aws / aws-nitro-enclaves-acm

AWS Certificate Manager for Nitro Enclaves allows the use of public and private SSL/TLS certificates with web applications and web servers running on Amazon EC2 instances with AWS Nitro Enclaves.
Apache License 2.0

1.1.0 broke Nginx support #53

Closed: urluba closed this issue 2 years ago

urluba commented 2 years ago

Hi,

Running Amazon Linux 2 on an m6a instance, I am unable to use version 1.1.0-1 of the RPM with Nginx.

systemctl status nginx -l shows that Nginx is unable to load the private key:
Dec 29 12:45:50 i-013dfb51ffc42f1ce nginx[5758]: nginx: [emerg] cannot load certificate key "engine:pkcs11:pkcs11:model=p11ne-token;manufacturer=Amazon;token=nginx-acm-internal;id=%01;object=acm-key;type=private?pin-value=123456": ENGINE_load_private_key() failed (SSL: error:80067065:pkcs11 engine:ctx_load_privkey:object not found error:26096080:engine routines:ENGINE_load_private_key:failed loading private key)

The stanza file is:

ssl_certificate_key "engine:pkcs11:pkcs11:model=p11ne-token;manufacturer=Amazon;token=nginx-acm-internal;id=%01;object=acm-key;type=private?pin-value=123456";
ssl_certificate "/run/nitro_enclaves/acm/nginx-cert-123456.pem";
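For context on the error above: with the engine:pkcs11: prefix, Nginx asks the OpenSSL pkcs11 engine to load the key identified by the rest of the string, which is an RFC 7512 PKCS#11 URI. The engine looks for an object in the token roughly matching its path attributes (token label, object label, id, type), so "object not found" means nothing in the token matched them. The following is a minimal sketch, assuming a simplified reading of RFC 7512; parse_pkcs11_uri is a hypothetical helper, not part of any library, and it ignores corner cases such as repeated attributes.

```python
from __future__ import annotations

from urllib.parse import unquote

# Prefix Nginx uses to select an OpenSSL engine (from the stanza above).
ENGINE_PREFIX = "engine:pkcs11:"


def parse_pkcs11_uri(uri: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a PKCS#11 URI (RFC 7512) into path and query attributes.

    Hypothetical helper; it ignores corner cases such as repeated attributes.
    """
    if not uri.startswith("pkcs11:"):
        raise ValueError(f"not a PKCS#11 URI: {uri!r}")
    path_part, _, query_part = uri[len("pkcs11:"):].partition("?")
    # Path attributes are separated by ';', query attributes by '&'.
    path = dict(item.split("=", 1) for item in path_part.split(";") if item)
    query = dict(item.split("=", 1) for item in query_part.split("&") if item)
    # Values are percent-decoded, so id=%01 becomes the single character "\x01".
    path = {key: unquote(value) for key, value in path.items()}
    query = {key: unquote(value) for key, value in query.items()}
    return path, query


key_ref = ("engine:pkcs11:pkcs11:model=p11ne-token;manufacturer=Amazon;"
           "token=nginx-acm-internal;id=%01;object=acm-key;type=private"
           "?pin-value=123456")
path, query = parse_pkcs11_uri(key_ref[len(ENGINE_PREFIX):])
print(path["token"], path["object"], path["type"])  # nginx-acm-internal acm-key private
```

The PIN travels as a query attribute (pin-value), so it is not part of the object match itself.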

The certificate located in /run/nitro_enclaves/acm/nginx-cert-... is OK.

And the config file (acm.yaml):

# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
---
enclave:
  cpu_count: 2
  memory_mib: 256
options:
  nginx_force_start: false
  nginx_reload_wait_ms: 20000
tokens:
  - label: nginx-acm-internal
    source:
      Acm:
        certificate_arn: "arn:aws:acm:earth-616:123456:certificate/123456"
    target:
      NginxStanza:
        path: /etc/nginx/acm/internal.conf
        user: nginx

The error occurs with both an upgrade and a clean install. Rolling back to RPM 1.0.2 solves the issue.

alcioa commented 2 years ago

@urluba Hi. The 1.1.0 RPM introduced Java Keystore support: we now also store the certificate chain associated with the provisioned private key, in order to meet the Java PKCS#11 provider's requirements for supporting SSL/TLS server certificates.

Your NGINX error suggests that provisioning the private key into the token failed.

From a PKCS#11 perspective, NGINX does not care in this context whether the certificate chain is in the enclave or on the instance. Now, however, we push the optional certificate chain along with the private key into the managed token and validate that chain. So one possible failure is that certificate validation fails during private key provisioning (i.e., the certificate chain is invalid). Unfortunately, I could not reproduce this with my ACM private certificate or with non-ACM certificate chains.

Could you please provide some context? Is this the standard use case described in the docs, with ACM certificates? Also, what does journalctl -u nitro-enclaves-acm.service show?

urluba commented 2 years ago

Hello @alcioa ,

Yes, the setup is based on this documentation. As I said, it works fully with version 1.0.2. The certificates I am using come from a private PKI, and the full chain is imported into ACM. I am not sure I am authorized to post it here, so feel free to ask me for technical details.

[Edit] I've also tried with a wildcard Amazon certificate generated by ACM and see the same behavior.

I've launched a clean instance using 1.0.2:

[root@i-123456 bin]# journalctl -u nitro-enclaves-acm.service
-- Logs begin at Mon 2022-01-03 05:55:33 UTC, end at Fri 2022-01-07 12:57:07 UTC. --
Jan 07 12:43:48 i-123456 systemd[1]: Starting Nitro Enclaves ACM Agent...
Jan 07 12:43:48 i-123456 systemd[1]: Started Nitro Enclaves ACM Agent.
Jan 07 12:43:49 i-123456 p11ne-agent[5203]: |INFO  | Setting up p11-kit config
Jan 07 12:43:49 i-123456 p11ne-agent[5203]: |INFO  | Restarting vsock proxy
Jan 07 12:43:50 i-123456 p11ne-agent[5203]: |INFO  | Syncing token nginx-acm-internal
Jan 07 12:43:52 i-123456 p11ne-agent[5203]: |INFO  | Reloading NGINX config
Jan 07 12:43:52 i-123456 p11ne-agent[5203]: |WARN  | Unable to reload NGINX: it isn't running and force starting is disabled
Jan 07 12:53:51 i-123456 p11ne-agent[5203]: |INFO  | Syncing token nginx-acm-internal
Jan 07 12:53:52 i-123456 p11ne-agent[5203]: |INFO  | Refreshing token nginx-acm-internal
Jan 07 12:53:52 i-123456 p11ne-agent[5203]: |INFO  | Reloading NGINX config
[root@i-123456 bin]# journalctl -u nginx
-- Logs begin at Mon 2022-01-03 05:55:33 UTC, end at Fri 2022-01-07 12:57:07 UTC. --
Jan 07 12:44:07 i-123456 systemd[1]: Starting NGINX Plus - high performance web server...
Jan 07 12:44:07 i-123456 su[5450]: (to nginx) root on none
[...]
Jan 07 12:49:53 i-123456 systemd[1]: Started NGINX Plus - high performance web server.
Jan 07 12:53:52 i-123456 su[5666]: (to nginx) root on none

Let's do the update:

[root@i-123456 bin]# yum update -y aws-nitro-enclaves-cli aws-nitro-enclaves-acm
[...]
Updated:
  aws-nitro-enclaves-acm.x86_64 0:1.1.0-1.amzn2

Complete!
[root@i-123456 bin]# systemctl restart nitro-enclaves-acm
[root@i-123456 bin]# journalctl -u nitro-enclaves-acm.service
-- Logs begin at Mon 2022-01-03 05:55:33 UTC, end at Fri 2022-01-07 13:03:09 UTC. --
Jan 07 12:43:48 i-123456 systemd[1]: Starting Nitro Enclaves ACM Agent...
Jan 07 12:43:48 i-123456 systemd[1]: Started Nitro Enclaves ACM Agent.
Jan 07 12:43:49 i-123456 p11ne-agent[5203]: |INFO  | Setting up p11-kit config
Jan 07 12:43:49 i-123456 p11ne-agent[5203]: |INFO  | Restarting vsock proxy
Jan 07 12:43:50 i-123456 p11ne-agent[5203]: |INFO  | Syncing token nginx-acm-internal
Jan 07 12:43:52 i-123456 p11ne-agent[5203]: |INFO  | Reloading NGINX config
Jan 07 12:43:52 i-123456 p11ne-agent[5203]: |WARN  | Unable to reload NGINX: it isn't running and force starting is disabled
Jan 07 12:53:51 i-123456 p11ne-agent[5203]: |INFO  | Syncing token nginx-acm-internal
Jan 07 12:53:52 i-123456 p11ne-agent[5203]: |INFO  | Refreshing token nginx-acm-internal
Jan 07 12:53:52 i-123456 p11ne-agent[5203]: |INFO  | Reloading NGINX config
Jan 07 13:03:05 i-123456 systemd[1]: Stopping Nitro Enclaves ACM Agent...
Jan 07 13:03:05 i-123456 p11ne-agent[5203]: |INFO  | Setting exit condition
Jan 07 13:03:05 i-123456 p11ne-agent[5203]: |INFO  | Killing enclave pid=5208
Jan 07 13:03:05 i-123456 p11ne-agent[5203]: |INFO  | Cleaning up p11kit config
Jan 07 13:03:06 i-123456 systemd[1]: Stopped Nitro Enclaves ACM Agent.
Jan 07 13:03:06 i-123456 systemd[1]: Starting Nitro Enclaves ACM Agent...
Jan 07 13:03:06 i-123456 systemd[1]: Started Nitro Enclaves ACM Agent.
Jan 07 13:03:06 i-123456 p11ne-agent[5988]: |INFO  | Setting up p11-kit config
Jan 07 13:03:06 i-123456 p11ne-agent[5988]: |INFO  | Restarting vsock proxy
Jan 07 13:03:07 i-123456 p11ne-agent[5988]: |INFO  | Syncing token nginx-acm-internal
Jan 07 13:03:09 i-123456 p11ne-agent[5988]: |INFO  | Reloading NGINX config
[root@i-123456 bin]# systemctl restart nginx
Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.
[root@i-123456 bin]# journalctl -u nginx
-- Logs begin at Mon 2022-01-03 05:55:33 UTC, end at Fri 2022-01-07 13:03:34 UTC. --
Jan 07 12:44:07 i-123456 systemd[1]: Starting NGINX Plus - high performance web server...
Jan 07 12:44:07 i-123456 su[5450]: (to nginx) root on none
[...]
Jan 07 12:49:53 i-123456 systemd[1]: Started NGINX Plus - high performance web server.
Jan 07 12:53:52 i-123456 su[5666]: (to nginx) root on none
Jan 07 13:03:34 i-123456 systemd[1]: Stopping NGINX Plus - high performance web server...
Jan 07 13:03:34 i-123456 systemd[1]: Stopped NGINX Plus - high performance web server.
Jan 07 13:03:34 i-123456 systemd[1]: Starting NGINX Plus - high performance web server...
Jan 07 13:03:34 i-123456 nginx[6138]: Failed to enumerate slots
Jan 07 13:03:34 i-123456 nginx[6138]: Failed to enumerate slots
Jan 07 13:03:34 i-123456 nginx[6138]: PKCS11_get_private_key returned NULL
Jan 07 13:03:34 i-123456 nginx[6138]: nginx: [emerg] cannot load certificate key "engine:pkcs11:pkcs11:model=p11ne-token;manufacturer=Amazon;token=nginx-acm-internal;id=%01;object=a
Jan 07 13:03:34 i-123456 systemd[1]: nginx.service: control process exited, code=exited status=1
Jan 07 13:03:34 i-123456 systemd[1]: Failed to start NGINX Plus - high performance web server.
Jan 07 13:03:34 i-123456 systemd[1]: Unit nginx.service entered failed state.
Jan 07 13:03:34 i-123456 systemd[1]: nginx.service failed.

urluba commented 2 years ago

I tried with openssl and got the same type of error.

With 1.0.2, I can use the private key to make a CSR:

openssl req -engine pkcs11 -new -key "pkcs11:model=p11ne-token;pin-value=123456" -keyform engine -out /tmp/req.csr
engine "pkcs11" set.
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
[...]

After the upgrade, the same command fails:

openssl req -engine pkcs11 -new -key "pkcs11:model=p11ne-token;pin-value=13456" -keyform engine -out /tmp/req.csr
Failed to enumerate slots
can't use that engine
139694939154336:error:260B806D:engine routines:ENGINE_TABLE_REGISTER:init failed:eng_table.c:175:
no engine specified
unable to load Private Key

Edit: I ran strace. Here is the end of the trace, where you can see that the read on the socket returns 0:

socket(AF_VSOCK, SOCK_STREAM, 0)        = 3
connect(3, {sa_family=AF_VSOCK, sa_data="\0\0\17'\0\0\21\0\0\0\0\0\0\0"}, 16) = 0
write(3, ""..., 1)                      = 1
write(3, ""..., 12)                     = 12
write(3, ""..., 5)                      = 5
write(3, ""..., 66)                     = 66
read(3, "", 1)                          = 0
close(3)                                = 0
write(2, ""..., 26Failed to enumerate slots
)                     = 26
munmap(0x7fd05cb21000, 3273024)         = 0
munmap(0x7fd05c919000, 2127400)         = 0
munmap(0x7fd05c6e0000, 2328992)         = 0
munmap(0x7fd05c4cd000, 2172040)         = 0
write(2, ""..., 22can't use that engine
)                     = 22
write(2, ""..., 98140532931254176:error:260B806D:engine routines:ENGINE_TABLE_REGISTER:init failed:eng_table.c:175:
)                     = 98
write(2, ""..., 20no engine specified
)                     = 20
write(2, ""..., 27unable to load Private Key
)                     = 27
munmap(0x7fd05ce41000, 2178928)         = 0
exit_group(1)                           = ?
+++ exited with 1 +++
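The connect() call in the trace can be decoded by hand. Assuming an x86_64 (little-endian) host, the 14 sa_data bytes that follow the address family map onto struct sockaddr_vm from <linux/vm_sockets.h> as svm_reserved1, svm_port and svm_cid, which gives port 9999 and CID 17. The 1-byte read() returning 0 is end-of-file: the connection was established and the writes succeeded, but the other end closed it without replying, after which the module reports "Failed to enumerate slots". This decoding is an interpretation of the trace, not something stated in the thread; decode_sockaddr_vm is a hypothetical helper.

```python
from __future__ import annotations

import struct

# sa_data exactly as strace printed it in the connect() call above.
# strace escapes bytes in octal: \17 is 15, \21 is 17, and "'" is byte 39.
SA_DATA = b"\0\0\17'\0\0\21\0\0\0\0\0\0\0"


def decode_sockaddr_vm(sa_data: bytes) -> tuple[int, int]:
    """Return (port, cid) from the bytes that follow svm_family.

    Field order follows struct sockaddr_vm in <linux/vm_sockets.h>:
    unsigned short svm_reserved1, unsigned int svm_port,
    unsigned int svm_cid, then zero padding. Assumes a little-endian host.
    """
    _reserved, port, cid = struct.unpack_from("<HII", sa_data)
    return port, cid


port, cid = decode_sockaddr_vm(SA_DATA)
print(f"port={port} cid={cid}")  # port=9999 cid=17
```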

Also p11-kit list-modules outputs:

alcioa commented 2 years ago

@urluba What you've shown is the symptom: NGINX does not start if the private key is not available in the token. Note that, with nginx_force_start: false, 1.0.2 also logs the same "Unable to reload NGINX" warning when the ACM-managed token is started for the first time.

So the underlying problem in 1.1.0 appears to be that your certificate chain fails validation in the token, which causes private key provisioning to fail.

NOTE: To take NGINX out of the picture, run systemctl start nitro-enclaves-acm.service, then sudo yum install -y gnutls-utils && p11tool --list-all <token>. If provisioning was successful, this should list the private key and certificate objects.

I tested with an ACM wildcard private certificate on my m6a instance using your acm.yaml config: starting nitro-enclaves-acm.service and then nginx.service works for me, and I can see the certificates provisioned. My chain has two certificates.

Does your certificate chain PEM file have the structure presented in the comment here? ACM also enforces this order, as per the RFC. Were there any special flags on the ACM certificate you used for testing earlier?

Does this happen only on m6a instances?

urluba commented 2 years ago

Hello @alcioa,

Indeed, my issue has nothing to do with Nginx. I tried the p11tool command line with a wildcard certificate from ACM.

With 1.0.2:

[root@i-123456 acm]# p11tool --list-all
warning: no token URL was provided for this operation; the available tokens are:
Token 0: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=System%20Trust
Token 1: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=Default%20Trust
Token 2: pkcs11:model=p11ne-token;manufacturer=Amazon;serial=EVT00;token=nginx-acm-internal

[root@i-123456 acm]# p11tool --list-all "pkcs11:model=p11ne-token;manufacturer=Amazon;serial=EVT00;token=nginx-acm-internal"
Object 0:
        URL: pkcs11:model=p11ne-token;manufacturer=Amazon;serial=EVT00;token=nginx-acm-internal;id=%01;object=acm-key;type=public
        Type: Public key
        Label: acm-key
        ID: 01

After the update:

[root@i-123456 acm]# p11tool --list-all
warning: no token URL was provided for this operation; the available tokens are:
Token 0: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=System%20Trust
Token 1: pkcs11:model=p11-kit-trust;manufacturer=PKCS%2311%20Kit;serial=1;token=Default%20Trust

[root@i-123456 acm]# p11tool --list-all "pkcs11:model=p11ne-token;manufacturer=Amazon;token=nginx-acm-internal;id=%01;object=acm-key;type=private?pin-value=2301331af2d9850d9eb9b7de621bd312"
No matching objects found
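When repeating this p11tool check across versions or instances, it can help to parse the listing rather than eyeball it. The sketch below groups the "Key: value" lines under each "Object N:" header, based only on the output format shown above; p11tool's human-readable output is not a stable interface, so treat the hypothetical list_objects helper as a convenience built on that assumption.

```python
from __future__ import annotations

import re

# Output captured from the 1.0.2 run above (p11tool --list-all <token URL>).
SAMPLE = """\
Object 0:
        URL: pkcs11:model=p11ne-token;manufacturer=Amazon;serial=EVT00;token=nginx-acm-internal;id=%01;object=acm-key;type=public
        Type: Public key
        Label: acm-key
        ID: 01
"""

OBJECT_HEADER = re.compile(r"^Object \d+:$")


def list_objects(output: str) -> list[dict[str, str]]:
    """Group 'Key: value' lines under each 'Object N:' header (hypothetical helper)."""
    objects: list[dict[str, str]] = []
    for line in output.splitlines():
        line = line.strip()
        if OBJECT_HEADER.match(line):
            objects.append({})
        elif objects and ":" in line:
            # Split on the first colon only, since URL values contain colons.
            key, value = line.split(":", 1)
            objects[-1][key] = value.strip()
    return objects


print([obj["Type"] for obj in list_objects(SAMPLE)])  # ['Public key']
print(list_objects("No matching objects found"))      # []
```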

Although the output shows nothing about my certificate, the PEM file is generated under /run/... and the order is correct:

  1. subject= /CN=*.myweb.com issuer= /C=US/O=Amazon/OU=Server CA 1B/CN=Amazon

  2. subject= /C=US/O=Amazon/OU=Server CA 1B/CN=Amazon issuer= /C=US/O=Amazon/CN=Amazon Root CA 1

  3. subject= /C=US/O=Amazon/CN=Amazon Root CA 1 issuer= /C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2

  4. subject= /C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2 issuer= /C=US/O=Starfield Technologies, Inc./OU=Starfield Class 2 Certification Authority
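The ordering rule under discussion is that each certificate's issuer equals the subject of the next certificate, and that a complete chain ends in a self-signed root (subject equal to issuer). Applying that rule to the four subject/issuer pairs above shows that the order is correct, but the last certificate is issued by a different CA, so the bundle stops short of a root. This is a minimal sketch that compares names only, which is a simplification: real validation also verifies each signature. check_order is a hypothetical helper.

```python
from __future__ import annotations

# (subject, issuer) pairs from the chain listed above, leaf first.
STARFIELD_G2 = ("/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc."
                "/CN=Starfield Services Root Certificate Authority - G2")
CHAIN = [
    ("/CN=*.myweb.com", "/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon"),
    ("/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon", "/C=US/O=Amazon/CN=Amazon Root CA 1"),
    ("/C=US/O=Amazon/CN=Amazon Root CA 1", STARFIELD_G2),
    (STARFIELD_G2,
     "/C=US/O=Starfield Technologies, Inc./OU=Starfield Class 2 Certification Authority"),
]


def check_order(chain: list[tuple[str, str]]) -> tuple[bool, bool]:
    """Return (each issuer matches the next subject, last cert is self-signed).

    Hypothetical helper: compares names only and does not verify signatures.
    """
    if not chain:
        raise ValueError("empty chain")
    ordered = all(issuer == next_subject
                  for (_, issuer), (next_subject, _) in zip(chain, chain[1:]))
    last_subject, last_issuer = chain[-1]
    return ordered, last_subject == last_issuer


print(check_order(CHAIN))  # (True, False): correct order, but no self-signed root
```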

I can try another instance type or another region if you think it's relevant. I am not aware of any special flags. I'll investigate with other certificates (ACM-generated or imported).

I'll keep you posted.

urluba commented 2 years ago

Hello again @alcioa,

I think we are close 😄 After some tests with different certificates, it appears that the problem with 1.1.0 occurs when I use the Amazon PKI...

If I use only imported certificates from our homemade PKI, everything works under 1.1.0.

If I use only certificates generated by ACM, none of them appear under 1.1.0.

If I mix ACM-generated certificates and imported ones, none of them appear under 1.1.0. Removing the Amazon ones from acm.yaml and restarting makes the imported ones appear.

I think some Amazon CAs are missing from the trust store, which prevents validation, as you suspected. I tried updating /etc/pki/ca-trust/source/anchors/ (sudo update-ca-trust enable; sudo update-ca-trust extract) without success. Could you tell me where the enclave looks for them?

alcioa commented 2 years ago

Hi @urluba

The certificate validation is done offline in the enclave based only on the /run/nitro_enclaves/acm/nginx-cert-*.pem cert chain file ingested when the nitro-enclaves-acm.service starts.

In my tests, the ACM-generated chain contains the server certificate plus my private ACM CA, which is why it works on my side.

So it seems that the 4th certificate in your chain is not the root CA: its subject and issuer differ (so it is likely not self-signed). That would explain the failure, since the backend crypto library considers this an incomplete chain of trusted certificates.

We will need to patch this to allow partial chain verification (the first intermediate certificate is considered trusted anyway), because in the current design the enclave should not make too many assumptions about what a user-managed certificate hierarchy looks like, as long as the certificates are valid.
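The difference between strict verification and the partial-chain verification described here can be modelled with the subject/issuer pairs listed earlier in the thread (names abbreviated below). In strict mode the chain must end in a self-signed certificate; in partial mode the first intermediate is treated as a trust anchor. This is a toy model, not the enclave's code, and the thread does not show the patch. In OpenSSL the comparable setting is the X509_V_FLAG_PARTIAL_CHAIN verification flag, but whether the fix uses it is an assumption. chain_accepted is a hypothetical helper.

```python
from __future__ import annotations

# Abbreviated (subject, issuer) names for the four-certificate Amazon chain.
AMAZON_CHAIN = [
    ("*.myweb.com", "Server CA 1B"),
    ("Server CA 1B", "Amazon Root CA 1"),
    ("Amazon Root CA 1", "Starfield Services Root CA - G2"),
    ("Starfield Services Root CA - G2", "Starfield Class 2 CA"),
]


def chain_accepted(chain: list[tuple[str, str]], allow_partial: bool) -> bool:
    """Toy acceptance rule based on names only (hypothetical, not the enclave's code).

    Strict mode: every issuer must match the next subject and the chain must
    end in a self-signed certificate. Partial mode: the first intermediate is
    treated as trusted, so only the leaf -> intermediate link is checked.
    """
    if not chain:
        return False
    if allow_partial and len(chain) >= 2:
        return chain[0][1] == chain[1][0]
    for (_, issuer), (next_subject, _) in zip(chain, chain[1:]):
        if issuer != next_subject:
            return False
    last_subject, last_issuer = chain[-1]
    return last_subject == last_issuer


root = ("Starfield Class 2 CA", "Starfield Class 2 CA")  # self-signed root
print(chain_accepted(AMAZON_CHAIN, allow_partial=False))           # False
print(chain_accepted(AMAZON_CHAIN, allow_partial=True))            # True
print(chain_accepted(AMAZON_CHAIN + [root], allow_partial=False))  # True
```

The last call corresponds to the workaround of appending the real root: strict verification then succeeds without any change to the verifier.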

If possible, edit your nginx-cert-*.pem so that the certificate chain also includes the root CA, then restart nitro-enclaves-acm.service. You should then see the token provisioned.
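The workaround of adding the root CA to the chain file can be scripted. This minimal sketch (hypothetical helper names) splits a PEM bundle into its CERTIFICATE blocks and appends one root certificate if it is not already present. Reading and writing the actual nginx-cert-*.pem file, and obtaining the root CA certificate itself, are left out. Because the result is rebuilt from the parsed blocks, any text outside the BEGIN/END markers is dropped.

```python
from __future__ import annotations

import re

# One PEM certificate block, from BEGIN to END marker (inclusive).
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?-----END CERTIFICATE-----", re.DOTALL)


def split_pem_bundle(text: str) -> list[str]:
    """Return the certificate blocks of a PEM bundle, in file order."""
    return PEM_CERT.findall(text)


def append_root(bundle: str, root_pem: str) -> str:
    """Append a single root certificate to a chain bundle unless it is already there.

    Hypothetical helper: the result is rebuilt from the parsed blocks, so any
    text outside BEGIN/END markers is dropped.
    """
    certs = split_pem_bundle(bundle)
    roots = split_pem_bundle(root_pem)
    if len(roots) != 1:
        raise ValueError("expected exactly one root certificate")
    if roots[0] in certs:
        return bundle
    return "\n".join(certs + roots) + "\n"


# Placeholder bodies, not real certificates.
leaf = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----"
root = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----"
updated = append_root(leaf + "\n", root)
print(len(split_pem_bundle(updated)))  # 2
```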

alcioa commented 2 years ago

Merged. It should be included in the next RPM release.

aleksy-zalenski commented 2 years ago

Hello,

Just FYI, I am seeing the exact same issue after updating to version 1.1.0 of nitro-enclaves-acm and using AWS CA certificates. On version 1.0.2, things work as expected.

It would be good to release the fix soon, as this breaks one of the most important features of this tool. Thanks!

alcioa commented 2 years ago

@urluba @aleksy-zalenski The aws-nitro-enclaves-acm-1.1.1 package has been released.

urluba commented 2 years ago

Hello @alcioa ,

I've updated a first stack with the new package and it seems to be working.

Thanks for the good work 😄