Netflix / lemur

Repository for the Lemur Certificate Manager
Apache License 2.0

Use ACME with existing account #3669

Open 21stcaveman opened 3 years ago

21stcaveman commented 3 years ago

I'm trying to set up Lemur to work with DigiCert via the ACME protocol. I'm running into issues which I'm guessing have to do with the ACME_REGR and ACME_PRIVATE_KEY variables. I am aware that ACME_PRIVATE_KEY needs to be in JWK format, and that ACME_REGR needs to look like: {"body": {}, "uri": "..."}

However, it is extremely unclear to me how to configure these properly. I have a URL, a KID and an HMAC_KEY from DigiCert. Do I have to add the KID to the end of the REGR URL, like {"body": {}, "uri": "https://acme.blah.com/acme/KID"}? Or should I create a JWK object containing both the KID and the HMAC_KEY? How can I create the JWK from these two? I have tried jwk.JWK.from_json('{"kid":"KID", "k":"HMAC_KEY", "kty":"oct"}'), with no luck. Cert creation fails with:

2021-07-09 10:56:44,771 ERROR: Unable to resolve pending cert: <PendingCertificate 12> [in /.../lemur/plugins/lemur_acme/plugin.py:184]
Traceback (most recent call last):
  File "/var/www/lemur/lemur/plugins/lemur_acme/plugin.py", line 153, in get_ordered_certificates
    order = acme_client.new_order(pending_cert.csr)
  ...
  File "/var/www/lemur/lib/python3.7/site-packages/josepy/jws.py", line 208, in sign
    assert isinstance(key, alg.kty)
AssertionError

Any pointers on how to do this?! The documentation does not really help!

21stcaveman commented 3 years ago

I have also tried:

ACME_REGR = '{"body": {}, "uri": "https://acme.blah.com/acme/KID"}'
ACME_PRIVATE_KEY = jwk.JWK.from_password('HMAC_KEY')

With no luck, same error.

hosseinsh commented 3 years ago

Hi @21stcaveman,

I am not super familiar with the DigiCert ACME integration, and the External Account Binding (EAB) feature is relatively new in Lemur.

If you already have the ACME private key, it should be as simple as setting the values in this format, as mentioned here:

ACME_REGR = {"body": {}, "uri": "https://..."}


ACME_PRIVATE_KEY = {"n": "REDACTED",
"e": "REDACTED",
"d": "REDACTED",
"p": "REDACTED",
"q": "REDACTED",
"dp": "REDACTED",
"dq": "REDACTED",
"qi": "REDACTED",
"kty": "RSA"}
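As a rough way to check a candidate ACME_PRIVATE_KEY before putting it into the config, the stdlib-only sketch below verifies that the JSON has the shape shown above: an RSA private key in JWK form with the members defined in RFC 7518, section 6.3. The helper name `check_acme_account_jwk` is hypothetical and not part of Lemur, and it checks structure only, not whether the numbers form a valid key. A symmetric `"kty": "oct"` JWK built from the HMAC key fails it, which fits the `assert isinstance(key, alg.kty)` error from the original question: the ACME account key signs requests with an RSA algorithm (RS256 by default in the `acme` client), so the EAB HMAC key cannot stand in for it.

```python
import json

# Members of an RSA private-key JWK (RFC 7518, section 6.3).
RSA_PRIVATE_MEMBERS = ("n", "e", "d", "p", "q", "dp", "dq", "qi")


def check_acme_account_jwk(jwk_json: str) -> list:
    """Return a list of problems with a JWK string; an empty list means it looks usable.

    Hypothetical helper: inspects the JSON structure only and does not
    verify that the values form a mathematically valid RSA key.
    """
    try:
        jwk = json.loads(jwk_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(jwk, dict):
        return ["JWK must be a JSON object"]
    problems = []
    if jwk.get("kty") != "RSA":
        problems.append(f"kty is {jwk.get('kty')!r}, expected 'RSA' for an RS256 account key")
    missing = [m for m in RSA_PRIVATE_MEMBERS if m not in jwk]
    if missing:
        problems.append("missing RSA private-key members: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    # The symmetric key tried in the original question is rejected on both counts.
    print(check_acme_account_jwk('{"kid": "KID", "k": "HMAC_KEY", "kty": "oct"}'))
```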

If you only have EAB_HMAC_KEY and EAB_KID, the idea is that Lemur will swap them for an ACME_PRIVATE_KEY the first time it calls the ACME server and persist it in the DB, as long as you have store_account enabled. The next time, it will use the ACME_PRIVATE_KEY.

[Screenshot: EAB]

Some ACME servers would like you to use certbot to register and obtain the ACME_PRIVATE_KEY, so that you can agree to their terms during the initial setup:

certbot register \
    --email "YOUR_EMAIL" \
    --no-eff-email \
    --server "https://HOSTNAME/directory" \
    --eab-kid "REDACTED" \
    --eab-hmac-key "REDACTED" \
    --config-dir ~/.certbot/config --logs-dir ~/.certbot/logs --work-dir ~/.certbot/work

Please let us know how it goes. We certainly need to improve the docs around external account binding.
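If the certbot route is used, the account key and registration URI end up on disk under the `--config-dir` passed to `certbot register`. The sketch below gathers them into the two strings Lemur expects. It assumes certbot's usual layout, `<config-dir>/accounts/<server>/<account-id>/private_key.json` and `regr.json`, where `private_key.json` holds the JWK and `regr.json` contains a `uri` field; the function name is hypothetical, and the layout should be checked against the certbot version in use.

```python
import json
from pathlib import Path


def lemur_settings_from_certbot(config_dir: str) -> dict:
    """Read a certbot account and return ACME_PRIVATE_KEY / ACME_REGR strings.

    Hypothetical helper. Assumes (verify against your certbot version) the layout
    <config_dir>/accounts/<server>/<account_id>/{private_key.json,regr.json}
    and that exactly one account was registered in this config dir.
    """
    key_files = sorted(Path(config_dir).expanduser().glob("accounts/**/private_key.json"))
    if len(key_files) != 1:
        raise RuntimeError(f"expected exactly one certbot account, found {len(key_files)}")
    account_dir = key_files[0].parent
    private_key = json.loads((account_dir / "private_key.json").read_text())
    regr = json.loads((account_dir / "regr.json").read_text())
    return {
        # The account key as a JWK, in the format shown earlier in this thread.
        "ACME_PRIVATE_KEY": json.dumps(private_key),
        # Only the account URI is kept; the body is left empty.
        "ACME_REGR": json.dumps({"body": {}, "uri": regr["uri"]}),
    }
```

For the command above this would be called as `lemur_settings_from_certbot("~/.certbot/config")`, with the two returned strings going into the ACME authority's acme_private_key and acme_regr fields.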

21stcaveman commented 3 years ago

Thank you for your response, Hossein.

I only have the KID and HMAC key, no private key. I can certainly try to get the key with certbot. However, I would prefer to use the EAB account method.

In your screenshot, I can see eab_kid, eab_hmac_key, acme_private_key and acme_regr fields. In my GUI, however, I do not have those fields when I try to create an ACME authority, which is why I was so confused about this process. I have been setting acme_regr and acme_hmac_key in the config file! I am using version 0.10.0; is there something I need to set in the config file for those fields to show up?

P.S. I stand corrected: I was running 0.9.0. After upgrading to 0.10.0, I can now see those fields in the GUI.

21stcaveman commented 3 years ago

I have re-created the authority after the upgrade, and that seems to have gone through well. When I try to create a certificate, however, it errors out with: Schema - {"rule":["Validity end must not land on a weekend."]}

Now, it won't let me change the validity. It seems that when an ACME plugin is selected, Lemur assumes by default that the provider will be Let's Encrypt! Next to the validity field, I see: 'Certificates for LetsEncrypt expire 90 days after creation. Enable auto-rotate to have Lemur automatically rotate the certificate and update your endpoints.'

Is that intended functionality? or perhaps a bug?

havron commented 3 years ago

Hi @21stcaveman

That weekend validity check runs here:

https://github.com/Netflix/lemur/blob/3783fbeaa1645bbee022827f4f53ffb12dd65a61/lemur/common/validators.py#L128

You can set LEMUR_ALLOW_WEEKEND_EXPIRATION to true in your config to allow it to pass. My understanding is that DigiCert has a default 1 year validity on certs. So when you tried creating one yesterday (July 9, 2021), Lemur noticed that the cert’s expiration date of July 9, 2022 is a Saturday and raised an error.
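A small sketch of the date arithmetic behind that error, mirroring the idea of the check rather than Lemur's exact code (the helper names are hypothetical): add one year to the issue date and test whether the result falls on a Saturday or Sunday. In lemur.conf.py the override itself is a one-line Python assignment, `LEMUR_ALLOW_WEEKEND_EXPIRATION = True`.

```python
from datetime import date


def lands_on_weekend(validity_end: date) -> bool:
    """True for Saturday (weekday 5) or Sunday (weekday 6); Monday is 0."""
    return validity_end.weekday() >= 5


def one_year_after(start: date) -> date:
    """Same calendar day one year later; Feb 29 falls back to Feb 28."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)


if __name__ == "__main__":
    end = one_year_after(date(2021, 7, 9))  # a Friday issue date
    print(end, lands_on_weekend(end))       # 2022-07-09 True
```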

Hope this helps!

21stcaveman commented 3 years ago

Thank you @havron, that definitely helped. However, when trying to issue my first certificate, it now fails with: acme.messages.Error: malformed :: the 'url' field in external account binding differs from outer JWS

I have filled in only the acme_url, certificate, eab_kid and eab_hmac_key fields, and have checked the store_account checkbox while creating the ACME authority. I assume that, as explained by @hosseinsh, Lemur is now trying to get the private key during the first cert generation, and it somehow fails to do so. Any pointers?

bobmshannon commented 3 years ago

> However, trying to issue my first certificate, it now fails with: acme.messages.Error: malformed :: the 'url' field in external account binding differs from outer JWS

This should be fixed by https://github.com/Netflix/lemur/pull/3657 and made available once a release is cut.
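For background on that error: in RFC 8555, section 7.3.4, the external account binding is itself a small JWS, MAC-signed with the CA-issued HMAC key, and its protected header must carry the same `url` as the outer newAccount request. The stdlib-only sketch below builds that inner JWS to show the structure. It is not Lemur's or the `acme` library's implementation, the helper names are hypothetical, and it assumes the HMAC key string is base64url-encoded, as certbot's `--eab-hmac-key` expects.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used by JOSE and ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Inverse of b64url(); restores the stripped '=' padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def build_eab_jws(eab_kid: str, eab_hmac_key: str,
                  account_public_jwk: dict, new_account_url: str) -> dict:
    """Return the flattened JWS for the newAccount 'externalAccountBinding' field.

    Hypothetical helper following RFC 8555, section 7.3.4:
      * protected header: alg (a MAC algorithm), kid (the EAB key ID) and url,
        which must equal the 'url' in the outer newAccount JWS header;
      * payload: the ACME account's public JWK;
      * signature: HMAC over "<protected>.<payload>" with the decoded EAB key.
    """
    protected = b64url(json.dumps(
        {"alg": "HS256", "kid": eab_kid, "url": new_account_url}).encode("utf-8"))
    payload = b64url(json.dumps(account_public_jwk).encode("utf-8"))
    signing_input = f"{protected}.{payload}".encode("ascii")
    mac = hmac.new(b64url_decode(eab_hmac_key), signing_input, hashlib.sha256).digest()
    return {"protected": protected, "payload": payload, "signature": b64url(mac)}
```

A server that compares the two `url` values would reject a binding whose `new_account_url` differs from the outer request's, which matches the wording of the error above.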

I've been testing the ACME plugin with DigiCert as well. The only remaining issue I've run into is that the domain ownership validation check (HTTP-01, DNS-01) seems to be skipped when issuing a certificate for a pre-validated domain, but the plugin attempts to solve the challenge (unsuccessfully) anyway. I believe this is the same issue as https://github.com/Netflix/lemur/issues/3157.

21stcaveman commented 3 years ago

@bobmshannon this is perfect, thank you. I guess I need to upgrade from 0.10.0 to master for now to be able to test, since #3657 has been merged to master. Good to know about #3157 as well; I look forward to seeing that fixed.

21stcaveman commented 3 years ago

OK, upgrading to master did resolve the issue.

I had to add AWS Route53 credentials, and I did it using os.environ in the config file! @hosseinsh, are there variables, like the ones for Dyn (ACME_DYN_USERNAME and ACME_DYN_PASSWORD), that I should use for a Route53 provider? Perhaps ACME_AWS_ACCESS_KEY_ID and ACME_AWS_SECRET_ACCESS_KEY?

Now, I am running into an issue that has to do with approvals. In our setup, every cert request has to be approved before the cert is issued. Obviously, that might take some time. Lemur kept creating cert requests while waiting for approval. Is there any way to check for a pending cert request before submitting a new one?

bobmshannon commented 3 years ago

> OK, upgrading to master did resolve the issue.

Nice!

> I had to add AWS Route53 credentials, and I did it using os.environ in the config file! @hosseinsh, are there variables, like the ones for Dyn (ACME_DYN_USERNAME and ACME_DYN_PASSWORD), that I should use for a Route53 provider? Perhaps ACME_AWS_ACCESS_KEY_ID and ACME_AWS_SECRET_ACCESS_KEY?

Just for my own understanding and testing purposes, you were able to get Lemur to issue a certificate using DigiCert's ACME API after upgrading to master and configuring valid Route53 credentials?

> Now, I am running into an issue that has to do with approvals. In our setup, every cert request has to be approved before the cert is issued. Obviously, that might take some time. Lemur kept creating cert requests while waiting for approval. Is there any way to check for a pending cert request before submitting a new one?

Do you still run into this issue if automatic certificate approval is enabled on DigiCert's end?

21stcaveman commented 3 years ago

> Just for my own understanding and testing purposes, you were able to get Lemur to issue a certificate using DigiCert's ACME API after upgrading to master and configuring valid Route53 credentials?

Yes and no. I was able to have it create the ACME challenge in Route53 (verified manually), and then create a certificate request (Basic EV) in DigiCert using the ACME protocol (just the order, not the cert itself, since it needs approval). I still need to know whether there is a better way to configure the Route53 credentials (IAM key and secret), but it seems to work with the Python environment variables.

> Do you still run into this issue if automatic certificate approval is enabled on DigiCert's end?

Not sure. I am not allowed to turn that on, since we don't have a dev environment with DigiCert (always test in prod, right?!), and we strictly require approval for any certificate orders. I tried to create a cert, and Lemur went ahead and created 7 orders in 3 minutes!! :)) I had one of our approvers approve one of the 7 orders. However, it seems that Lemur removes the pending cert once the request times out (still unapproved), so it never ends up picking the cert up.

hosseinsh commented 3 years ago

Hamid, Lemur relies on Boto for all AWS operations, and Boto expects AWS credentials as described here.

In Lemur, you would set up a DNS provider via UI, and set the AWS account etc. Lemur would learn about any DNS zones and auto-select that DNS provider accordingly.

Lemur is mostly designed around the notion of automated issuance, so we haven't considered the case of a human approving an order. The side effect is that Lemur will attempt to issue the certificate a few times and eventually give up when it cannot resolve the certificate.

Btw, for my own understanding: if you are relying on pre-approved domains, is there any advantage to using ACME with DigiCert compared to their CertCentral API? (Though the approval step would still be a blocker.)

With respect to skipping the challenge, do you know whether DigiCert returns challenges with status "valid"? If that is the case, Lemur can now skip challenges that have a valid status.

Otherwise, we might need to add a new option to the ACME plugin that skips challenge validation, basically similar to https://github.com/Netflix/lemur/blob/9b470acb05fdc962fa5e6576244807e96d57a63d/lemur/plugins/lemur_acme/challenge_types.py#L235-L241

except that we would move right away to:

 orderr = acme_client.finalize_order(orderr, deadline, fetch_alternative_chains=True)
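
The "skip what is already valid" idea can be sketched against plain dicts shaped like the ACME authorization objects of RFC 8555, section 7.1.4. This is an illustration, not Lemur's code, and the helper names are hypothetical: challenges are only answered while an authorization is not yet valid, and once every authorization on the order is valid the client can go straight to the `finalize_order(...)` call quoted above.

```python
def challenges_still_to_solve(authorization: dict, wanted_type: str = "dns-01") -> list:
    """Return the challenges of wanted_type that still need an answer.

    Hypothetical helper; `authorization` is the parsed JSON of an ACME
    authorization object (RFC 8555, section 7.1.4).
    """
    if authorization.get("status") == "valid":
        return []  # the CA already considers this identifier validated
    return [c for c in authorization.get("challenges", [])
            if c.get("type") == wanted_type and c.get("status") != "valid"]


def ready_to_finalize(authorizations: list) -> bool:
    """True once every authorization on the order is valid, i.e. finalize can be called."""
    return all(a.get("status") == "valid" for a in authorizations)
```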
21stcaveman commented 3 years ago

> Lemur relies on Boto for any AWS operations and Boto expects AWS credentials as described here.

Understood. I did try setting the environment variables in the supervisord .ini file, as well as in the /home/lemur/.aws/credentials file, and neither worked for me. As a workaround, setting the environment variables via os.environ in the Lemur config file worked.
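A sketch of that workaround as it might look in lemur.conf.py, which is ordinary Python. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variables boto3/botocore read from the process environment; the values are placeholders, and using `setdefault` (so credentials already exported to the process win) is a choice made here rather than what was described above. If the shared credentials file is not being picked up, one thing worth checking is whether the Lemur process's home directory really is /home/lemur; botocore also honours `AWS_SHARED_CREDENTIALS_FILE` to point at the file explicitly.

```python
# lemur.conf.py (excerpt): Lemur's config file is ordinary Python.
import os

# Standard variable names that boto3/botocore read from the process
# environment. The values are placeholders; setdefault() leaves any value
# already exported by the environment (e.g. by supervisord) untouched.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "REDACTED")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "REDACTED")
```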

> Btw, for my own understanding, If you are relying on pre-approved domains, is there any advantage to using ACME with DigiCert, compared to their CertCentral API? (though the approval step would still be a blocker).

I personally don't see any advantages, other than ACME being an open protocol. Using an open protocol means we can change our CA later with minimal effort, without being bound to a vendor's API and its plugin. As you said, even with the CC API, approval presents an issue, and the request times out. I initially set up the DigiCert plugin, and once I saw the timeout, I thought ACME might wait for approval and/or validation, since the request goes to the 'pending certificates' section.

> With respect to skipping the challenge, do you know, if DigiCert returns Status Valid challenges? if that is the case, Lemur can now skip challenges which have a valid status.

I do not. However, I can open a support case and ask since we are a customer. Can you elaborate on 'status valid' so I can relay it to them?

21stcaveman commented 3 years ago

Just wondering: do we get an 'order ID' or 'request ID' back from the ACME API after submitting a request? If we do, we can check the request status via this API and see whether it has been approved.

I have been using certbot for 7 years now, but am somewhat unfamiliar with the inner workings of the ACME protocol. Is it possible to have delayed verification/cert retrieval? The ACME client sets up the validation (DNS, HTTP, etc.) and submits a request; the ACME authority then validates it and issues the certificate. With certbot, the entire process is synchronous. Is there an async option for this (i.e., close the request process after submission, periodically check the order/request status, and import the cert once it has been approved/issued)?

Full disclosure: DigiCert does list enabling automatic approvals as a prerequisite for an ACME implementation in their docs, which suggests delayed cert retrieval is not an option. I'm just wondering if it can be implemented with a mix of the ACME and CC APIs.
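In ACME terms, the handle asked about above is the order URL, which the server returns in the `Location` header of the newOrder response (RFC 8555, section 7.4). Re-reading that URL yields an order object whose `status` is one of `pending`, `ready`, `processing`, `valid` or `invalid`. A sketch of deferred retrieval built on that follows. `fetch_order` is a hypothetical callable standing in for an authenticated POST-as-GET request, and whether a CA reports an order held for manual approval as `processing` is an assumption this sketch cannot confirm. Orders also carry an `expires` time, so the wait cannot be unbounded.

```python
import time
from typing import Callable

# Order states from RFC 8555, section 7.1.6, with the client's next move.
NEXT_STEP = {
    "pending": "wait: not all authorizations are valid yet",
    "ready": "finalize: send the CSR to the order's finalize URL",
    "processing": "wait: the CA has the CSR and is issuing",
    "valid": "download: fetch the order's certificate URL",
    "invalid": "give up: the order failed or expired",
}


def poll_order(fetch_order: Callable[[str], dict], order_url: str,
               interval: float = 30.0, max_attempts: int = 10,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Re-read an existing order until it leaves 'pending'/'processing'.

    fetch_order is hypothetical: any callable that performs the authenticated
    POST-as-GET on order_url and returns the order JSON as a dict.
    """
    status = None
    for _ in range(max_attempts):
        order = fetch_order(order_url)
        status = order["status"]
        if status not in ("pending", "processing"):
            return order  # ready, valid or invalid: see NEXT_STEP
        sleep(interval)
    raise TimeoutError(f"{order_url} still {status!r} after {max_attempts} checks")
```

Because the order URL is all that is needed to resume, a client could store it with the pending certificate and run `poll_order` from a periodic job instead of blocking the original request.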

hosseinsh commented 3 years ago

This reference might shed some light on the ACME flow. Basically, once you place an order, challenge resources are created. Each challenge resource has a status field, which one can check to see whether domain validation has been completed. DigiCert requires pre-approval for domains, so I am not sure whether they follow the ACME protocol correctly.

It would be easy to verify, by just setting a debugger stop here https://github.com/Netflix/lemur/blob/9b470acb05fdc962fa5e6576244807e96d57a63d/lemur/plugins/lemur_acme/challenge_types.py#L88

Nevertheless, it seems you have a dependency on the manual approval process. This might be something you need to solve for first, since Lemur is not designed with this step in mind. Otherwise, Lemur would not be able to auto-rotate certificates.

21stcaveman commented 3 years ago

@hosseinsh , just heard back from Digicert. Here is their response :

"Hello Hamid, Thank you for contacting DigiCert Support. We do use the ACME v2 challenge status. This is defined here"

Based on this, Lemur should be able to check the order status. Now, if this is implemented, is automatic approval still required? I'm working to get authorization to set up automatic approval, but just out of curiosity: wouldn't Lemur create an ACME request and then check the status of that challenge, instead of creating a new request? And if so, Lemur would not create new orders while the last one (for a given distinct subject) is unapproved. Correct? Or am I missing something?
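The behaviour described here, reusing the open order for a given set of names rather than placing a new one on every retry, could look roughly like the in-memory sketch below. It is hypothetical and not Lemur's code; `create_order` stands in for whatever places a new ACME order and returns its URL, and a real implementation would persist that URL (for example alongside the pending certificate) instead of keeping it in a dict.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class PendingOrders:
    """Remember one open order per set of DNS names (hypothetical, in-memory only)."""

    def __init__(self, create_order: Callable[[FrozenSet[str]], str]):
        self._create_order = create_order  # places an order, returns its URL
        self._open: Dict[FrozenSet[str], str] = {}

    @staticmethod
    def _key(names: Iterable[str]) -> FrozenSet[str]:
        # DNS names are case-insensitive and their order should not matter.
        return frozenset(n.lower() for n in names)

    def order_for(self, names: Iterable[str]) -> str:
        """Return the open order for these names, creating one only if none exists."""
        key = self._key(names)
        if key not in self._open:
            self._open[key] = self._create_order(key)
        return self._open[key]

    def close(self, names: Iterable[str]) -> None:
        """Forget the order once it is valid or invalid, so a later request starts fresh."""
        self._open.pop(self._key(names), None)
```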

hosseinsh commented 3 years ago

Hey Hamid, this is promising. This means for pre-validated domains Lemur's ACME plugin would just skip the validation step, and go right into finalizing the order.

Good to hear about the prospect of setting up automatic approval. This is essential in a world of short-lived credentials which need to get rotated in time.

Honestly, I am not sure how the approval process would come into play today. Lemur would likely see that domain validation is still valid and attempt to finalize the order, which would fail because of the pending approval. But you mentioned earlier that DigiCert requires automatic approval for ACME anyway, which means you won't have this problem.

21stcaveman commented 3 years ago

I think where I get confused is the order of things. Does approval happen before the ACME challenge, or after?

When creating an ACME cert with a DNS challenge, Lemur will create the TXT record and submit a challenge/order to DigiCert (or any other CA). Now, would the CA try to validate the DNS challenge before approval, or would it wait for approval first and then complete the challenge? I'm gonna have to ask them.

The difference would be this: if they complete the DNS challenge first, the challenge status will be valid, and Lemur wouldn't know about the approval state. Whereas if approval comes first, the challenge status stays pending until the order is approved, which in turn would mean Lemur waits for approval, with no automatic approval necessary. Right?

hosseinsh commented 3 years ago

I believe DigiCert's ACME API requires auto-approval to avoid such confusion. Also, since the domains are pre-validated, by definition the challenge should come back as valid, which means Lemur will attempt to finalize the order and fail to do so, due to the missing approval.

21stcaveman commented 3 years ago

> Also, since domains are pre-validated, by definition the challenge should come back as valid, which means Lemur will attempt to finalize the order and fail doing so, due to the missing approval.

Got it. My final question, then: since the domain is pre-validated, do I even need to set a DNS provider when creating a certificate? In my tests, if the Route53 DNS credentials are incorrect, certificate creation fails, whereas it shouldn't even come to that, correct?

P.S. Also, if I leave the DNS provider option on "Automatically select for me", cert creation fails with "No DNS providers found for domain: ..."

21stcaveman commented 3 years ago

I pulled the latest master; the DNS validation issues still stand. I have to configure a valid DNS provider and select it for ACME to work. (I have configured a dummy Route53 DNS provider in our dev environment, and it works. This means actual DNS validation does not happen, but Lemur still requires a DNS provider to be selected and to be able to create the TXT record before it even submits the order.)

I have also set up automatic approval, so that's out of the way. The certificate now does get issued via ACME, but it cannot be imported. Lemur fails with Exception("Unable to create certificate: {'chain': ['Invalid certificate in certificate chain.']}"). I have downloaded the issued certificate and noticed it is signed by a different CA than the one I configured for the ACME authority in Lemur. Would that be the cause? Do I need to download all the DigiCert CAs and add them to Lemur?

bobmshannon commented 3 years ago

I tested this further on my end as well.

It looks like the challenge status is not set when the new_order is created using the ACME client which might explain why #3666 didn't fix the issue.

{
        "authorizations": [
                "https://acme.digicert.com/v2/acme/authz/<<snipped>>"
        ],
        "expires": "2021-07-22T16:23:49-06:00",
        "finalize": "https://acme.digicert.com/v2/acme/finalize/<<snipped>>",
        "identifiers": [
                {
                        "type": "dns",
                        "value": "fakedomain.com"
                }
        ],
        "status": "pending"
}

It does seem, though, that the challenge status eventually changes to "valid" (within a few seconds) if the client keeps polling the authorizations:

{
        "challenges": [
                {
                        "status": "valid",
                        "token": "<<snipped>>",
                        "type": "dns-01",
                        "url": "https://acme.digicert.com/v2/acme/challenge/<<snipped>>",
                        "validated": "1970-01-01T00:00:00Z"
                },
                {
                        "status": "valid",
                        "token": "<<snipped>>",
                        "type": "http-01",
                        "url": "https://acme.digicert.com/v2/acme/challenge/<<snipped>>",
                        "validated": "1970-01-01T00:00:00Z"
                },
                {
                        "status": "valid",
                        "token": "<<snipped>>",
                        "type": "tls-alpn-01",
                        "url": "https://acme.digicert.com/v2/acme/challenge/<<snipped>>",
                        "validated": "1970-01-01T00:00:00Z"
                }
        ],
        "expires": "2021-07-30T17:24:39-06:00",
        "identifier": {
                "type": "dns",
                "value": "fakedomain.com"
        },
        "status": "valid",
        "wildcard": false
}

I believe this would also explain why, with the DNS-based ACME plugin, it only works when a DNS provider is configured: in that case, Lemur polls the authorizations again a short time later.

As another data point, I'm able to issue a certificate automatically using Certbot in standalone mode, so I think there is some nuance in how this needs to be handled on the client side for it to work.
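One way to act on that observation, sketched under stated assumptions: after placing the order, re-read each authorization a few times with a short delay, and only set up challenges (and therefore need a DNS provider) for the ones that are still not valid. `fetch_authz` is a hypothetical callable for an authenticated POST-as-GET on an authorization URL, and the attempt count and delay are arbitrary illustrative values, not anything DigiCert documents.

```python
import time
from typing import Callable, Iterable, List


def wait_for_prevalidated(fetch_authz: Callable[[str], dict], authz_urls: Iterable[str],
                          attempts: int = 5, delay: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Return the authorization URLs that are still not valid after a short grace period.

    An empty result means every identifier was pre-validated and the client
    can go straight to finalizing; otherwise only the returned URLs need
    their challenges solved.
    """
    pending = list(authz_urls)
    for attempt in range(attempts):
        # Drop every authorization the CA now reports as valid.
        pending = [url for url in pending if fetch_authz(url).get("status") != "valid"]
        if not pending or attempt == attempts - 1:
            break
        sleep(delay)
    return pending
```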