Open beargiles opened 1 year ago
I forgot to add the few logs in /var/log/pki/pki-tomcat
localhost_access_log.2023-05-01.txt
35.92.229.238 - - [01/May/2023:09:56:41 -0700] "GET / HTTP/1.1" 302 -
35.92.229.238 - - [01/May/2023:09:56:41 -0700] "GET /pki HTTP/1.1" 302 -
35.92.229.238 - - [01/May/2023:09:56:55 -0700] "GET /pki/ HTTP/1.1" 200 3500
35.92.229.238 - - [01/May/2023:09:59:49 -0700] "GET /ca/admin/ca/getStatus HTTP/1.1" 200 119
35.92.229.238 - - [01/May/2023:09:59:49 -0700] "GET /ca/admin/ca/getStatus HTTP/1.1" 200 119
I've been unable to attach the ca/debug.log for some reason, and at 107 lines I would prefer to not inline it, but it only shows "INFO" and seems to succeed.
Finally the ca/signedAudit/ca_audit has
0.https-jsse-nio-8443-exec-1 - [01/May/2023:09:56:38 PDT] [14] [6] [AuditEvent=ACCESS_SESSION_ESTABLISH][ClientIP=--][ServerIP=--][SubjectID=--][Outcome=Success] access session establish success
0.https-jsse-nio-8443-exec-4 - [01/May/2023:09:56:57 PDT] [14] [6] [AuditEvent=ACCESS_SESSION_ESTABLISH][ClientIP=--][ServerIP=--][SubjectID=--][Outcome=Success] access session establish success
0.https-jsse-nio-8443-exec-5 - [01/May/2023:09:59:47 PDT] [14] [6] [AuditEvent=ACCESS_SESSION_ESTABLISH][ClientIP=--][ServerIP=--][SubjectID=--][Outcome=Success] access session establish success
0.https-jsse-nio-8443-exec-5 - [01/May/2023:09:59:50 PDT] [14] [6] [AuditEvent=ACCESS_SESSION_TERMINATED][ClientIP=--][ServerIP=--][SubjectID=--][Outcome=Success][Info=serverAlertSent: CLOSE_NOTIFY] access session terminated
0.https-jsse-nio-8443-exec-4 - [01/May/2023:09:59:50 PDT] [14] [6] [AuditEvent=ACCESS_SESSION_TERMINATED][ClientIP=--][ServerIP=--][SubjectID=--][Outcome=Success][Info=serverAlertSent: CLOSE_NOTIFY] access session terminated
Two oops!
The first is that I was trying to install ipaserver on Rocky Linux 9. That's not documented as a supported platform yet - but it's clearly very close to being usable.
The second is that I've been running my tests in several terminals and somehow overlooked that the most recent test (using Rocky Linux 8) hadn't set up the proper virtualenv. I don't think this would have affected the RL9 tests but I can double-check that.
Hmm... the run without the molecule virtualenv failed after about 10 minutes with an error message about timing out while waiting for sudo permissions. (!). It was in the context of connecting to the DBus.
I nuked the prior instance and am now creating a new one while using the molecule virtualenv. I'm stuck at the same place - this time for 25+ minutes and counting.
Hello, do you have errors in the ipaserver-install.log file? The long time (to fail) smells like a DNS issue or a memory issue. Which ansible-freeipa version are you using? Please provide more information about the parameters for the server deployment. Are you configuring the DNS server?
(Requested information to follow - I have to step away for a meeting but wanted to provide the most recent information, and some context, first.)
For context, this is a molecule test using a slightly modified ec2 driver from ansible-community/molecule-plugins. I wrote ansible scripts about 6 months ago that set up a baseline AMI but didn't have time to write the molecule tests so I could close the JIRA tickets. Now that I have the time I'm doing a slight refactoring so that I'll be using an Ansible collection with specialized roles instead of a single role with nearly two dozen specialized tasks. This is partly for clarity, partly for potential security auditing.
I let the latest tests run overnight and after nearly 2 hours(!) I saw the error below. That reminded me of issues with LDAP only listening to the IPv6 address - it caused failures since the CA stores its keys in it. I vaguely recall changing a configuration property so I'll check that today.
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: RuntimeError: Unable to retrieve CA chain: [Errno 111] Connection refused
...
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rocky/.ansible/tmp/ansible-tmp-1682997104.0147376-527154-168251518675078/AnsiballZ_ipaserver_setup_ca.py", line 107, in <module>
_ansiballz_main()
File "/home/rocky/.ansible/tmp/ansible-tmp-1682997104.0147376-527154-168251518675078/AnsiballZ_ipaserver_setup_ca.py", line 99, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File "/home/rocky/.ansible/tmp/ansible-tmp-1682997104.0147376-527154-168251518675078/AnsiballZ_ipaserver_setup_ca.py", line 48, in invoke_module
run_name='__main__', alter_sys=True)
File "/usr/lib64/python3.6/runpy.py", line 205, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/lib64/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/ansible_freeipa.ansible_freeipa.ipaserver_setup_ca_payload_nqhe55id/ansible_freeipa.ansible_freeipa.ipaserver_setup_ca_payload.zip/ansible_collections/freeipa/ansible_freeipa/plugins/modules/ipaserver_setup_ca.py", line 417, in <module>
File "/tmp/ansible_freeipa.ansible_freeipa.ipaserver_setup_ca_payload_nqhe55id/ansible_freeipa.ansible_freeipa.ipaserver_setup_ca_payload.zip/ansible_collections/freeipa/ansible_freeipa/plugins/modules/ipaserver_setup_ca.py", line 379, in main
File "/usr/lib/python3.6/site-packages/ipaserver/install/ca.py", line 355, in install_step_0
pki_config_override=options.pki_config_override,
File "/usr/lib/python3.6/site-packages/ipaserver/install/cainstance.py", line 501, in configure_instance
self.start_creation(runtime=runtime)
File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 635, in start_creation
run_step(full_msg, method)
File "/usr/lib/python3.6/site-packages/ipaserver/install/service.py", line 621, in run_step
method()
File "/usr/lib/python3.6/site-packages/ipaserver/install/cainstance.py", line 851, in __request_ra_certificate
chain = self.__get_ca_chain()
File "/usr/lib/python3.6/site-packages/ipaserver/install/cainstance.py", line 804, in __get_ca_chain
raise RuntimeError("Unable to retrieve CA chain: %s" % str(e))
RuntimeError: Unable to retrieve CA chain: [Errno 111] Connection refused
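A "Connection refused" like the one above can be narrowed down by probing the relevant ports over every address the hostname resolves to. The following is a minimal sketch, not part of ansible-freeipa or FreeIPA; the function name `probe` is made up for illustration, and which host and ports to check is an assumption to adapt to the deployment.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect to every address `host` resolves to.

    Returns a list of (address family name, address, outcome) tuples,
    where outcome is "open", "refused", or "error: <reason>".
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                outcome = "open"
            except ConnectionRefusedError:
                # Errno 111 on Linux: nothing is listening on this address/port.
                outcome = "refused"
            except OSError as exc:
                # Covers timeouts, unreachable networks, and similar failures.
                outcome = "error: %s" % exc
        results.append((family.name, sockaddr[0], outcome))
    return results
```

For example, `probe("localhost", 8443)` checks the HTTPS port that appears in the audit log thread names (`https-jsse-nio-8443`), and `probe("localhost", 389)` checks the standard LDAP port. If one address family reports "refused" while the other reports "open", that points at a listener or name resolution problem rather than at the CA itself.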
That reminds me of earlier issues with the LDAP server only listening on the IPv6 port. I t
The 'waiting for privilege escalation prompt' is definitely unrelated since it only took 12s and the ansible host requires a sudo password.
Checking what has changed at this point:
It is perfectly fine that it only listens on the IPv6 port. Please read the man page for ipv6 to see how the modern network stack works in Linux:
IPv4 connections can be handled with the v6 API by using the v4-mapped-on-v6 address type; thus a program needs to support only this API type to support both protocols. This is handled transparently by the address handling functions in the C library. IPv4 and IPv6 share the local port space. When you get an IPv4 connection or packet to an IPv6 socket, its source address will be mapped to v6.
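The behaviour described in the man page excerpt can be observed with a small experiment. This is only a sketch, assuming a host where IPv6 is enabled; the function name `demo_dual_stack` is made up for illustration, and it returns `None` instead of failing when IPv6 sockets are unavailable.

```python
import socket


def demo_dual_stack(timeout=5.0):
    """Listen on an IPv6 socket, connect to it over IPv4, and return the
    peer address the server sees (or None if IPv6 is unavailable)."""
    if not socket.has_ipv6:
        return None
    try:
        server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return None
    with server:
        try:
            # Clearing IPV6_V6ONLY lets the IPv6 socket accept IPv4 clients too.
            server.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            server.bind(("::", 0))  # port 0: let the kernel pick a free port
            server.listen(1)
            server.settimeout(timeout)
            port = server.getsockname()[1]
            # Connect as a plain IPv4 client to the IPv6 listener.
            with socket.create_connection(("127.0.0.1", port), timeout=timeout):
                conn, peer = server.accept()
                conn.close()
            return peer[0]
        except OSError:
            return None


if __name__ == "__main__":
    print(demo_dual_stack())
```

On a dual-stack host the server sees the IPv4 client under a v4-mapped address of the form `::ffff:a.b.c.d`, which is why a service that only shows an IPv6 listener can still be reachable over IPv4.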
I've already tried adding {{ ansible_hosts.all_ipv6_addresses }} to the list of IP addresses and it kicked it back since the only IPv6 address provided was the loopback. That doesn't mean the LDAP server won't be happy starting up - but since I can't provide that IP address in the settings then the CA may not know to try that address. Maybe.
It should be easy to modify the EC2 instance so it requests an IPv6 address and retry.
When I add '::ffff:{{ ansible_host.default_ipv6.address }}' I get
TASK [freeipa.ansible_freeipa.ipaserver : Install - Server preparation] ********
Tuesday 02 May 2023 12:34:26 -0600 (0:00:00.033) 0:02:22.010 ***********
Tuesday 02 May 2023 12:34:26 -0600 (0:00:00.033) 0:02:22.009 ***********
fatal: [molecule-test-freeipa]: FAILED! => changed=false
msg: 'Invalid IP Address ::ffff:10.42.73.190: cannot use IANA reserved IP address ::ffff:10.42.73.190'
which seems a little odd since it had no problem accepting the same IPv4 address.
The default IPv6 addresses are either the loopback (host) or link-local (fe80::) so I see a similar failure message, only this time because it's link-local scope.
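To see ahead of time which scope a candidate address falls into, the standard library `ipaddress` module is enough. This is a rough sketch for pre-checking values before handing them to the role; it is not the validation FreeIPA performs, and the `classify` helper and its labels are made up for illustration.

```python
import ipaddress


def classify(address):
    """Return a coarse scope label for an IPv4 or IPv6 address string."""
    ip = ipaddress.ip_address(address)
    # Check the v4-mapped form first; it is an IPv6 wrapper around an IPv4 address.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return "ipv4-mapped"
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "global"
    return "other"


if __name__ == "__main__":
    for candidate in (
        "::1",
        "fe80::1",
        "::ffff:10.42.73.190",
        "10.42.73.190",
        "2600:1f13:973:4700:eb7e:39d6:129e:f564",
    ):
        print(candidate, classify(candidate))
```

Only the last address in that list is labelled "global", which matches the observation that the loopback, link-local, and v4-mapped forms were all rejected.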
Finally I created a new subnet that auto-assigns an IPv6 address out of a range managed by AWS - so it's 'global' scope. However I still see a hang at 'Setup CA'. I'm heading into another meeting so I can let it run for a while to see if an error message ever shows up.
FWIW the values I'm sending to the ipaserver role are
ok: [molecule-test-freeipa] =>
msg: |-
ipadm_password: DMPassword1
ipaadmin_principal: admin
ipaadmin_password: ADMPassword1
ipaserver: ip-10-42-73-190.us-west-2.compute.internal
ipaserver_ip_addresses: ['10.42.73.190', '2600:1f13:973:4700:eb7e:39d6:129e:f564']
ipaserver_domain: example.com
ipaserver_realm: EXAMPLE.COM
ipaserver_hostname: ip-10-42-73-190.us-west-2.compute.internal
ipaserver_no_host_dns: true
ipaserver_subject_base: dc=example,dc=com
ipaserver_ca_subject: cn=Certificate Authority,dc=example,dc=com
ipaserver_setup_dns: true
ipaserver_allow_zone_overlap: true
ipaserver_auto_forwarders: true
ipaserver_setup_firewalld: false
and the ipaserver, ipaserver_hostname, and first of the ipaserver_ip_addresses all match.
.....
Separately I just noticed that this test is still using the default instance size for other tests - that's wildly too small. I've bumped the instance size to 'medium'.
The script successfully completed with the explicit addition of a global IPv6 address to ipaserver_ip_addresses.
It also succeeds if a global IPv6 address is available but not present in 'ipaserver_ip_addresses'.
Retrying it without a global IPv6 address available.
Grumble - I've backed out a ton of stuff and include_role still completes. Even 'Rocky Linux 9' works!
At this point I think the only thing remaining is reverting the size of the instance and enabling the 'mem check' flag. I knew it tests more than just the available memory but didn't think to enable it.
I'll mark this closed in a moment but wanted to ask a question before submitting a ticket for it. I know that there are some significant differences between testing individual roles and ansible collections. It looks like the existing tests all use docker - which is fine for testing the ansible code itself.
However docker-based tests have a significant drawback - some platforms require a little more work. E.g., some services need to return an IP address for further work (e.g., an HDFS NameNode provides information about DataNodes) and EC2 instances don't know anything about their public IP address(es). You have to take a few extra steps.
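One of those extra steps, sketched here under assumptions: an EC2 instance can ask the instance metadata service (IMDSv2) for its public IPv4 address. This only works from inside an instance that has a public address and has the metadata service enabled; the helper names are made up for illustration and are not part of any test in this thread.

```python
import urllib.request

# Link-local endpoint of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds=60):
    """Build the IMDSv2 request that obtains a short-lived session token."""
    return urllib.request.Request(
        IMDS + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a metadata GET request authenticated with the session token."""
    return urllib.request.Request(
        IMDS + "/meta-data/" + path,
        headers={"X-aws-ec2-metadata-token": token},
    )


def public_ipv4(timeout=2.0):
    """Return this instance's public IPv4 address as a string.

    Raises urllib.error.URLError or HTTPError when run outside EC2 or on
    an instance without a public address.
    """
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = metadata_request("public-ipv4", token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

The request builders are kept separate from `public_ipv4` so they can be inspected or tested without network access.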
Is it worth the effort to create an issue that provides my ec2-based test? It's not adding a lot - but it might be enough to save other people some effort when they're trying to deploy to EC2.
I'm consistently seeing this problem when running the ipaserver role for the first time. It takes several minutes to fail - timeout?
It appears to succeed on the second try. ("Appears" since I haven't tested it.)
I have a ton of additional documentation but I just remembered that I already have a task that resets the CA in addition to removing the ipaserver role. I suspect at least part of the problem is a race condition and a missing directory or file.