airshipit / treasuremap

Reference Airship manifests, CICD, and reference architecture.
http://openstack.org
Apache License 2.0

Secure connection to drydock is failing. #212

Closed by nagajagan 2 years ago

nagajagan commented 2 years ago

Describe the bug: The "Installing Drydock Boot Actions" step is failing.

Steps To Reproduce: Pin treasuremap at https://github.com/airshipit/treasuremap/commit/2227df4a8d60581974f49501265c0b8230fbf414 and follow the steps to bring up the genesis node.

Expected behavior: Drydock should complete deployment of the nodes.

Environment

Detailed logs within drydock:

        Installing Drydock Boot Actions.
        start: cmd-install/stage-late/drydock_01/cmd-in-target: curtin command in-target
        Running command ['mount', '--bind', '/dev', '/tmp/tmpt3f8gvqn/target/dev'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/proc', '/tmp/tmpt3f8gvqn/target/proc'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/run', '/tmp/tmpt3f8gvqn/target/run'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/sys', '/tmp/tmpt3f8gvqn/target/sys'] with allowed return codes [0] (capture=False)
        Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
        Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpt3f8gvqn/target', 'wget', '--no-proxy', '--no-check-certificate', '--header=X-Bootaction-Key: e27bba27178686a0112252ab215042a4a85a3aa76978be5b2d3cba845c770491', 'https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc19/units', '-O', '/tmp/bootaction-units.tar.gz'] with allowed return codes [0] (capture=False)
        --2022-04-07 14:47:04--  https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc19/units
        Resolving drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)... 10.109.82.10
        Connecting to drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)|10.109.82.10|:443... connected.
        WARNING: cannot verify drydock-nc.att-5gcore.bete.ericy.com's certificate, issued by ‘CN=Kubernetes Ingress Controller Fake Certificate,O=Acme Co’:
          Unable to locally verify the issuer's authority.
        WARNING: no certificate subject alternative name matches
          requested host name ‘drydock-nc.att-5gcore.bete.ericy.com’.
        HTTP request sent, awaiting response... 404 Not Found
        2022-04-07 14:47:04 ERROR 404: Not Found.

        Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
        TIMED subp(['udevadm', 'settle']): 0.010

jasvinder1107 commented 2 years ago

As the error itself shows, the ingress is serving the fake certificate, which essentially means the certs were not generated by promenade during installation. If the ingress is not given the valid internal certs generated by the commands below, its FQDN will serve the fake cert and the installation will not behave as expected.

    mkdir ${NEW_SITE}_certs
    sudo tools/airship promenade generate-certs \
      -o /target/${NEW_SITE}_certs /target/${NEW_SITE}_collected/*.yaml

    mkdir -p site/${NEW_SITE}/secrets/certificates
    sudo cp ${NEW_SITE}_certs/certificates.yaml \
      site/${NEW_SITE}/secrets/certificates/certificates.yaml
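One way to confirm which certificate the ingress is actually serving is to read the issuer from the TLS handshake. The sketch below assumes `openssl` is installed and the ingress is reachable from where it runs; the function names are made up for illustration and are not part of Airship.

```shell
#!/usr/bin/env bash

# Print the issuer line of the certificate served at <host>:443.
# Needs network access to the ingress; -servername sends SNI so the
# ingress can pick the certificate for that FQDN.
served_cert_issuer() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Classify an issuer string: prints "fake" for the ingress controller's
# default certificate (the one named in the wget warning), else "other".
classify_issuer() {
  case "$1" in
    *"Kubernetes Ingress Controller Fake Certificate"*) echo "fake" ;;
    *) echo "other" ;;
  esac
}

# Example (hypothetical invocation, not run here):
#   classify_issuer "$(served_cert_issuer drydock-nc.att-5gcore.bete.ericy.com)"
```

If this prints "fake", the ingress has no usable certificate for that hostname and is falling back to its default.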

nagajagan commented 2 years ago

The ingress-crt-site entry in site/xxxxx/secrets/certificates/ingress.yaml needs to have the following content, and that should solve the problem.

    -----BEGIN CERTIFICATE-----
    Ingress Certificates
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    Intermediate Certificate
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    Root certificate
    -----END CERTIFICATE-----
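A quick sanity check on the chain before committing it could look like the sketch below. It assumes the three PEM blocks have been saved to a standalone file (called `chain.pem` here, a hypothetical name) and that `openssl` is available; the helper names are illustrative only.

```shell
#!/usr/bin/env bash

# Count the certificates in a PEM bundle (one BEGIN marker per certificate).
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Print subject and issuer of every certificate in the bundle, in file order.
# For a well-ordered chain, each certificate's issuer should match the
# subject of the certificate that follows it.
list_chain() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Example (chain.pem is a hypothetical file holding the three blocks):
#   count_pem_certs chain.pem
#   list_chain chain.pem
```

For the layout described in this comment the count should be 3, ordered ingress certificate, intermediate, root.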

jasvinder1107 commented 2 years ago

Just to clarify for future readers: the full cert chain must be installed in ingress.yaml. If it is not installed properly, calls from the client to the ingress will fail with SSL code 21, which means the certificate could not be verified. Please check the public certs in the ingress definition for the corresponding services.

nagajagan commented 2 years ago

Including certificate chain in the ingress.yaml didn't solve the problem of drydock connectivity. It only solved the shipyard connectivity problem.

jasvinder1107 commented 2 years ago

Starting from 2.7, the DNS entry for drydock should resolve to ingress-nc, not ingress-uc. Please correct the DNS entry and that should fix this.
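To check what the name resolves to from a node, something along these lines may help. It is only a sketch: it assumes a glibc system where `getent ahostsv4` is available, and the expected address of ingress-nc has to be supplied by you (the thread does not state it). The function names are hypothetical.

```shell
#!/usr/bin/env bash

# Read `getent ahostsv4` output on stdin and print the first IPv4 address.
first_ipv4() {
  awk 'NR == 1 { print $1 }'
}

# Compare what the FQDN resolves to with the expected ingress address.
# Prints "match" or "mismatch: <resolved>".
check_dns() {
  local fqdn="$1" expected_ip="$2" resolved
  resolved="$(getent ahostsv4 "${fqdn}" | first_ipv4)"
  if [ "${resolved}" = "${expected_ip}" ]; then
    echo "match"
  else
    echo "mismatch: ${resolved}"
  fi
}

# Example (the second argument is a placeholder for your ingress-nc address):
#   check_dns drydock-nc.att-5gcore.bete.ericy.com <ingress-nc-ip>
```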

nagajagan commented 2 years ago

[Screenshot of the controller iDRAC console during PXE boot]

After pointing drydock-nc to ingress-nc, the screenshot above shows the issue we observe on the controller iDRAC consoles while PXE booting. It is not caused by a firewall. Which default routes do you suggest changing?

nagajagan commented 2 years ago

Logs from the MaaS GUI

Stdout: start: cmd-install/stage-late/drydock_02/cmd-in-target: curtin command in-target
        Running command ['mount', '--bind', '/dev', '/tmp/tmpdi62xy0q/target/dev'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/proc', '/tmp/tmpdi62xy0q/target/proc'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/run', '/tmp/tmpdi62xy0q/target/run'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/sys', '/tmp/tmpdi62xy0q/target/sys'] with allowed return codes [0] (capture=False)
        Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
        Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpdi62xy0q/target', 'wget', '--no-proxy', '--no-check-certificate', '--header=X-Bootaction-Key: ae631ad31b0bdbe53601f4da35375040bac0bc446a245858f2b33d759ae101df', 'https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files', '-O', '/tmp/bootaction-files.tar.gz'] with allowed return codes [0] (capture=False)
        --2022-05-10 17:10:52--  https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files
        Resolving drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)... 10.109.84.189
        Connecting to drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)|10.109.84.189|:443... connected.
        HTTP request sent, awaiting response... 500 Internal Server Error
        2022-05-10 17:12:29 ERROR 500: Internal Server Error.       

        Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
        TIMED subp(['udevadm', 'settle']): 0.010
        Running command ['umount', '/tmp/tmpdi62xy0q/target/sys'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/run'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/proc'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/dev'] with allowed return codes [0] (capture=False)
        finish: cmd-install/stage-late/drydock_02/cmd-in-target: FAIL: curtin command in-target

Stderr: ''

The same endpoint called with curl:

root@att5gc20:~# curl --header "X-Bootaction-Key: ae631ad31b0bdbe53601f4da35375040bac0bc446a245858f2b33d759ae101df" https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files
{"title": "Error when running bootaction pipeline segment utf8_decode: AttributeError - 'NoneType' object has no attribute 'decode'"}

We don't see any logging information within drydock pods to find the root cause of this issue.
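When reproducing this by hand, it can help to capture the HTTP status together with the error body so the two can be compared against the curtin log. The sketch below assumes `curl` and `sed` are available; the helper names are invented for this example, and the `sed` expression is a plain text match that only suits a single-line body like the one above.

```shell
#!/usr/bin/env bash

# Fetch a bootaction URL; print the response body, then the HTTP status
# on a final line. -k mirrors wget's --no-check-certificate.
fetch_bootaction() {
  local url="$1" key="$2"
  curl -sk --header "X-Bootaction-Key: ${key}" -w '\n%{http_code}\n' "${url}"
}

# Pull the "title" field out of a JSON error body read from stdin.
error_title() {
  sed -n 's/.*"title": *"\(.*\)".*/\1/p'
}

# Example (hypothetical invocation, not run here):
#   fetch_bootaction "https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files" "<bootaction-key>"
```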

nagajagan commented 2 years ago

The initial issue was fixed by adding proper routes in the environment.

https://github.com/airshipit/treasuremap/issues/212#issuecomment-1126219216 was addressed by using the right version of the promenade image, and the fix was tested by the reporter.

       promenade:
         location: https://opendev.org/airship/promenade
-        reference: 27f181a9d30294030d695b747b2e4560ffbd29be
+        reference: d161528ae8de0dcb0dd9d39bc370f85f2aa1c462
         subpath: charts/promenade
         type: git