mattwestby closed this issue 3 weeks ago.
Thanks for this bug report @mattwestby. Can you add the supporting information about versions/OS etc. to the template above?
Hi guys, any update on this one? Thanks, Matt
@mattwestby: I don't think we've been able to reproduce this (@craddm was going to look into deploying from an Azure Windows VM, but I'm not sure how far he got with that). I might look at another way to upload the certificate, but I haven't had time yet.
Thanks @jemrobinson, is there a way I could manually create the cert just to get me past this sticking point for now?
If you're able to run the following Python code on your deployment machine (it needs the `simple_acme_dns` and `cryptography` Python packages), inserting `domain_name` and `admin_email_address` as appropriate and adding a DNS TXT record when indicated, this should generate a certificate called `<certificate name>.cert`, which you can upload to the SRE Key Vault as a certificate called `<certificate name>`.
```python
import time

from cryptography.hazmat.primitives.asymmetric.rsa import RSAPrivateKey
from cryptography.hazmat.primitives.serialization import NoEncryption, load_pem_private_key, pkcs12
from cryptography.x509 import load_pem_x509_certificate
from simple_acme_dns import ACMEClient

domain_name = "whatever your domain name is"
admin_email_address = "whatever email address you're using"

client = ACMEClient(
    domains=[domain_name],  # ACMEClient expects a list of domains
    email=admin_email_address,
    directory="https://acme-v02.api.letsencrypt.org/directory",
    nameservers=["8.8.8.8", "1.1.1.1"],
    new_account=True,
)

# Generate private key and CSR
# Note that we must set the key to RSA-2048 before generating the CSR
# The default is ecdsa-with-SHA256, which Azure Key Vault cannot read
private_key_bytes = client.generate_private_key(key_type="rsa2048")
client.generate_csr()

# Request the DNS-01 challenge tokens
verification_tokens = client.request_verification_tokens().items()
print("At this point you will need to manually add a TXT record to the DNS zone for your SRE")
for record_name, record_values in verification_tokens:
    # Strip the zone suffix so the name can be entered as a record within the zone
    print(f"record_name {record_name.replace(f'.{domain_name}', '')}; record_value {record_values[0]}")

# Wait for DNS propagation to complete
while not client.check_dns_propagation(authoritative=False, round_robin=True, verbose=False):
    print("DNS propagation is ongoing")
    time.sleep(30)

# Request a signed certificate
certificate_bytes = client.request_certificate()

# Check that the private key is RSA, as Azure Key Vault requires
private_key = load_pem_private_key(private_key_bytes, None)
if not isinstance(private_key, RSAPrivateKey):
    msg = f"Private key is of type {type(private_key)} not RSAPrivateKey."
    raise TypeError(msg)

# Split the returned PEM chain into the leaf certificate and its CA certificates,
# skipping any empty chunk left by trailing blank lines
all_certs = [
    load_pem_x509_certificate(data)
    for data in certificate_bytes.split(b"\n\n")
    if data.strip()
]
certificate = next(cert for cert in all_certs if domain_name in str(cert.subject))
ca_certs = [cert for cert in all_certs if cert != certificate]

# Bundle key, certificate and chain into an unencrypted PKCS#12 (PFX) file
certificate_secret_name = domain_name.replace(".", "-")
pfx_bytes = pkcs12.serialize_key_and_certificates(
    certificate_secret_name.encode("utf-8"),
    private_key,
    certificate,
    ca_certs,
    NoEncryption(),
)
with open(f"{certificate_secret_name}.cert", "wb") as f_cert:
    f_cert.write(pfx_bytes)
```
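The script names the certificate by replacing each `.` in the domain with `-`, which is consistent with the name in the error log of the original report (`testsre5-shm5-nottingham-ac-uk`). If you want to double-check the name before uploading, here is a minimal standalone sketch of that mapping. The helper name `certificate_name_for` is hypothetical, and the validation is a simplified version of Key Vault's object-name rule (letters, digits and hyphens, up to 127 characters), so treat it as a sanity check rather than the service's exact validation.

```python
import re

# Simplified Key Vault object-name check (an assumption, not the service's exact rule):
# ASCII letters, digits and hyphens only, 1-127 characters.
KEY_VAULT_OBJECT_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def certificate_name_for(domain_name: str) -> str:
    """Map a domain to the certificate name used by the script above (hypothetical helper)."""
    name = domain_name.replace(".", "-")
    if not KEY_VAULT_OBJECT_NAME.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid Key Vault object name")
    return name


print(certificate_name_for("testsre5.shm5.nottingham.ac.uk"))  # prints testsre5-shm5-nottingham-ac-uk
```

A domain containing characters such as `_` raises a `ValueError` here instead of producing a name the upload would reject.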
@mattwestby, were you able to reproduce this in a new deployment? Can you check whether the cert actually exists?
My best guess is that your deployment has somehow ended up in a state where Pulumi believes the cert has been created (it is in the Pulumi stack, so when you run `deploy`, Pulumi will not try to create it) but the cert has not actually been put into storage.
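One way to test this hypothesis is to compare Pulumi's view of the stack with what is actually in the Key Vault. `pulumi stack export --file state.json` writes the stack's state as JSON, and the sketch below lists the resource URNs that contain a search string. It assumes the export layout `{"deployment": {"resources": [{"urn": ...}, ...]}}`; the helper name and the search string are hypothetical, since the name of the certificate resource in the SRE stack isn't given in this thread.

```python
import json
from pathlib import Path


def find_stack_resources(state_path: str, fragment: str) -> list[str]:
    """Return URNs of resources in an exported Pulumi stack whose URN contains `fragment`."""
    state = json.loads(Path(state_path).read_text())
    # Assumed layout of `pulumi stack export`: {"deployment": {"resources": [{"urn": ...}]}}
    # An empty stack may have no resources list at all, so fall back to [].
    resources = state.get("deployment", {}).get("resources") or []
    needle = fragment.lower()
    return [res["urn"] for res in resources if needle in res.get("urn", "").lower()]
```

If the certificate's URN shows up here but `az keyvault certificate show --vault-name <vault name> --name <certificate name>` reports that it cannot be found, the deployment is in the state described above.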
If that is the case, I think the fix to your broken deployment is either,
and we may want to make the certificate generation code more robust. However, I'm not sure there is a code change that would fix your existing deployment, as this seems to be a rare occurrence that is mostly out of our control.
@mattwestby, can you reproduce this? We haven't been able to.
For future reference @JimMadge, my best guess as to why this happened is that the certificate was created on the deployment machine and that machine then tried to upload it to the Key Vault using its Managed Identity, rather than the appropriate Azure CLI credentials.
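If that guess is right, a manual upload can sidestep the problem by authenticating explicitly with the Azure CLI login. Credential chains such as `DefaultAzureCredential` from `azure-identity` try a managed identity before the Azure CLI, so on an Azure VM they can pick up the VM's identity instead of your `az login` account; whether the SRE code uses such a chain isn't stated in this thread. This is a minimal sketch, assuming `azure-identity` and `azure-keyvault-certificates` are installed and that you have already run `az login`. The helper names are hypothetical.

```python
from pathlib import Path


def make_certificate_client(vault_url: str):
    """Build a Key Vault CertificateClient that authenticates only via the Azure CLI login."""
    # Imported here so that upload_pfx can be reused with any client object
    # without the Azure SDK being importable.
    from azure.identity import AzureCliCredential
    from azure.keyvault.certificates import CertificateClient

    # AzureCliCredential uses the account from `az login` and never falls back
    # to a managed identity attached to the machine.
    return CertificateClient(vault_url=vault_url, credential=AzureCliCredential())


def upload_pfx(client, certificate_name: str, pfx_path: str):
    """Import a PKCS#12 bundle (private key plus certificate chain) as a Key Vault certificate."""
    pfx_bytes = Path(pfx_path).read_bytes()
    return client.import_certificate(certificate_name=certificate_name, certificate_bytes=pfx_bytes)
```

For example, `upload_pfx(make_certificate_client("https://<key vault name>.vault.azure.net/"), "testsre5-shm5-nottingham-ac-uk", "testsre5-shm5-nottingham-ac-uk.cert")`, substituting your own vault name.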
:white_check_mark: Checklist
:computer: System information
:package: Packages
List of packages
```none
Paste list of packages here
```
:no_entry_sign: Describe the problem
When attempting to deploy SRE v5.0.0, the SSL certificate used by the Application Gateway doesn't get created in the SRE Key Vault. When re-deploying, the logs just say `ERROR - Failed to retrieve certificate` and the deployment doesn't try to recreate it.
:deciduous_tree: Log messages
Relevant log messages
```none
ERROR - Failed to retrieve certificate testsre5-shm5-nottingham-ac-uk.
```
:recycle: To reproduce