JimMadge closed this 1 month ago
Coverage report: lines missing coverage, by file

| File | Lines missing |
| --- | --- |
| data_safe_haven/commands/sre.py | |
| data_safe_haven/external/api/azure_sdk.py | 439-443, 783-794 |
| data_safe_haven/infrastructure/programs/declarative_sre.py | 56-58 |
| data_safe_haven/infrastructure/programs/sre/networking.py | 48-50, 1841 |
This report was generated by python-coverage-comment-action
I'm getting a failure to create the SSL cert.
+ pulumi-python:dynamic:Resource sre_data_kvc_https_certificate
**creating failed** error: Exception calling application: Failed to
create SSL certificate resrudh-kernow-develop-turingsafehaven-ac-uk for
resrudh.kernow.develop.turingsafehaven.ac.uk. Failed to create DNS TXT record _acme-challenge
in zone resrudh.kernow.develop.turingsafehaven.ac.uk.
Azure SDK says it failed to create the TXT record. Feels like an odd thing to crop up here as this is all within the SRE subscription. The DNS resource already has a set of records deployed with Pulumi.
Possibly related to #2209.
Any thoughts, @jemrobinson?
Is this perhaps using the SHM provider instead of the (default) SRE one?
The `Failed to create DNS TXT record` message comes from `AzureSdk::ensure_dns_txt_record`.
The `Failed to create SSL certificate` message comes from `SSLCertificateProvider::create`, which calls `AzureSdk::ensure_dns_txt_record`.
The entire relevant section of code is:

    client = ACMEClient(
        domains=[props["domain_name"]],
        email=props["admin_email_address"],
        directory="https://acme-v02.api.letsencrypt.org/directory",
        nameservers=["8.8.8.8", "1.1.1.1"],
        new_account=True,
    )
    private_key_bytes = client.generate_private_key(key_type="rsa2048")
    client.generate_csr()
    azure_sdk = AzureSdk(props["subscription_name"], disable_logging=True)
    verification_tokens = client.request_verification_tokens().items()
    for record_name, record_values in verification_tokens:
        record_set = azure_sdk.ensure_dns_txt_record(...)

This makes me think that perhaps the `ACMEClient` is not being created correctly and/or the `generate_private_key()` or `generate_csr()` functions aren't doing what we expect. Can you retry with some manual logging interventions (e.g. set `disable_logging=False` in the `AzureSdk` call and also add some `logger.info` lines to help diagnose)?
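As a rough illustration of the kind of logging suggested here, the sketch below wraps the TXT record loop with `logging` calls. It is not the project's code: the logger name, the `create_txt_records` helper and the `ensure_record` callable (a stand-in for the real DNS call) are all hypothetical.

```python
import logging
from typing import Any, Callable, Mapping, Sequence

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("ssl_certificate_debug")


def create_txt_records(
    subscription_name: str,
    zone_name: str,
    verification_tokens: Mapping[str, Sequence[str]],
    ensure_record: Callable[[str, Sequence[str]], Any],
) -> list:
    """Log each step, then delegate to `ensure_record` (stand-in for the real DNS call)."""
    # Record which subscription and zone are in use before touching DNS.
    logger.info("Using subscription '%s' for zone '%s'", subscription_name, zone_name)
    results = []
    for record_name, record_values in verification_tokens.items():
        logger.info(
            "Creating TXT record '%s' with %d value(s)", record_name, len(record_values)
        )
        try:
            results.append(ensure_record(record_name, record_values))
        except Exception:
            # logger.exception keeps the traceback of the underlying failure.
            logger.exception("TXT record '%s' could not be created", record_name)
            raise
    return results
```

Logging the subscription name before the first DNS call should make it easier to tell a wrong-subscription problem apart from a problem in the ACME steps.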
N.B. this uses the production Let's Encrypt server, so for debugging it's worth manually changing to the staging server.
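One way to make that switch less manual is sketched below. The two directory URLs are the Let's Encrypt production and staging ACME v2 endpoints; the helper functions and the `ACME_USE_STAGING` environment variable are assumptions made for this example and do not exist in the project.

```python
import os
from typing import Mapping, Optional

LETSENCRYPT_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"
LETSENCRYPT_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def use_staging_from_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the (hypothetical) ACME_USE_STAGING toggle from the environment."""
    env = os.environ if env is None else env
    return env.get("ACME_USE_STAGING", "").strip().lower() in {"1", "true", "yes"}


def acme_directory_url(use_staging: bool) -> str:
    """Return the ACME directory URL to use for certificate requests."""
    return LETSENCRYPT_STAGING if use_staging else LETSENCRYPT_PRODUCTION
```

The result of `acme_directory_url(use_staging_from_env())` could then be passed as the `directory` argument while debugging. Staging certificates are not trusted by browsers, so this is only suitable for test deployments.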
Yeah that's what I've been looking at. I'll do a bit more digging.
I would be surprised if it were a wrong subscription problem as the SRE subscription gets passed to the dynamic provider as an argument, then the Azure SDK client is created using that argument.
I can query the DNS zone using AZ CLI so the permissions should be correct.
This could be the same SSL certificate problem as #2209, i.e. unrelated to these subscription changes.
On a positive note, creating the NS records in the SHM DNS zone works :+1:.
Getting somewhere now:
**creating failed** error: Exception calling application: Failed to
create SSL certificate porthperan-kernow-develop-turingsafehaven-ac-uk for
porthperan.kernow.develop.turingsafehaven.ac.uk. Failed to create DNS TXT record
_acme-challenge in zone porthperan.kernow.develop.turingsafehaven.ac.uk.
(ResourceGroupNotFound) Resource group 'shm-kernow-sre-porthperan-rg' could not be found.
The `AzureSdk` class is using the wrong subscription 🤔.
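The three-part message above has the shape you would get if each layer caught the failure below it and appended that failure's text to its own message. The sketch below reproduces that shape with stand-in code; `SketchError` and both functions are hypothetical, and whether the project builds its messages exactly this way is an assumption.

```python
class SketchError(Exception):
    """Stand-in for the project's own exception type (hypothetical)."""


def ensure_dns_txt_record(zone_name: str, record_name: str) -> None:
    """Simulate the DNS layer failing because the resource group is not visible."""
    try:
        # Simulated SDK failure; the real call would go to Azure.
        raise RuntimeError("(ResourceGroupNotFound) Resource group could not be found.")
    except Exception as exc:
        msg = f"Failed to create DNS TXT record {record_name} in zone {zone_name}. {exc}"
        raise SketchError(msg) from exc


def create_certificate(cert_name: str, domain_name: str) -> None:
    """Simulate the certificate layer wrapping the DNS layer's failure."""
    try:
        ensure_dns_txt_record(domain_name, "_acme-challenge")
    except Exception as exc:
        msg = f"Failed to create SSL certificate {cert_name} for {domain_name}. {exc}"
        raise SketchError(msg) from exc
```

Calling `create_certificate` raises a `SketchError` whose message contains all three parts in order, with the root cause last. When reading such messages, the final sentence is usually the most informative one.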
Deployment succeeds at a0ac911. SRE provisioning manager throws a similar error.
:closed_umbrella: Related issues
Closes #2201