azukaar / Cosmos-Server

☁️ The Most Secure and Easy Selfhosted Home Server. Take control of your data and privacy without sacrificing security and stability (Authentication, anti-DDOS, anti-bot)
https://cosmos-cloud.io

[BUG]: Unable to successfully receive Let's Encrypt certificate using Namesilo registrar #245

Closed BCITMike closed 2 months ago

BCITMike commented 2 months ago

What happened?

Namesilo has a known limitation: it may take up to 15 minutes for their nameservers to be updated. I have increased "NAMESILO_POLLING_INTERVAL" to "15m" (up from the values used in previous failed attempts) and "NAMESILO_PROPAGATION_TIMEOUT" to "45m". After clicking "Force HTTPS Certificate Renewal on Next Save", I can refresh the Namesilo domain page and see the _acme-challenge TXT record. I set a timer for 9 minutes and then checked Namesilo and Cosmos (so around 10 minutes, give or take, after clicking "Save"). Cosmos reports that the Namesilo nameserver returned NXDOMAIN, and the _acme-challenge TXT record has been removed from the Namesilo DNS page.
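For readers unfamiliar with the two settings, one common reading is that the propagation check repeats once per polling interval until it succeeds or the propagation timeout elapses. The Python sketch below illustrates that reading only. It is not LEGO's or Cosmos's code, the names `wait_for_propagation` and `check` are hypothetical, and running the first check immediately is an assumption.

```python
import time
from typing import Callable


def wait_for_propagation(
    check: Callable[[], bool],
    polling_interval: float,
    propagation_timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() every polling_interval seconds until it returns True
    or propagation_timeout seconds have elapsed. Returns True on success."""
    deadline = clock() + propagation_timeout
    while True:
        if check():  # assumption: the first check runs immediately
            return True
        if clock() + polling_interval > deadline:
            return False  # the next check would fall past the timeout
        sleep(polling_interval)
```

Under this reading, an interval of 900 seconds and a timeout of 2700 seconds give at most four checks, at roughly 0, 15, 30 and 45 minutes.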

So either it is not using the values I entered, or it doesn't do any syntax checking on the values. The Description field for "NAMESILO_PROPAGATION_TIMEOUT" says "Maximum waiting time for DNS propagation, it is better to set larger than 15m", which implies that the allowed units include "m" and not just seconds.

What should have happened?

The TXT record should not have been checked and removed before the first "NAMESILO_POLLING_INTERVAL" had elapsed, or there should have been an error saying that "15m" is not a valid value and that seconds are expected.
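The kind of syntax checking asked for here could look like the strict parser sketched below in Python. It is only an illustration and does not show how LEGO or Cosmos actually read these settings. The function name `parse_duration_seconds` and the accepted suffixes (`s`, `m`, `h`) are assumptions.

```python
import re

# Hypothetical helper, not part of LEGO or Cosmos.
_UNITS = {"s": 1, "m": 60, "h": 3600}
_PATTERN = re.compile(r"^(\d+)([smh]?)$")


def parse_duration_seconds(value: str) -> int:
    """Convert '900', '900s', '15m' or '1h' to seconds; reject anything else."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(
            f"invalid duration {value!r}: expected e.g. '900' or '15m'"
        )
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)  # a bare number means seconds
```

With a parser like this, "15m" and "900" would be treated identically, and a value such as "15 minutes" would fail loudly instead of being silently ignored.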

How to reproduce the bug?

  1. Go to 'Configuration'

Actually, as I started to enter this bug report, I decided to try entering the values in seconds, so 900 and 2700 for the two parameters. After about 10 minutes I went to check the Cosmos GUI and now get "ERR_CONNECTION_REFUSED", although the server still responds to pings. Refreshing, and closing and reopening the browser, hasn't allowed me to access the GUI again.

Relevant log output

Unfortunately, I lost access to the webGUI before I could save any logs. I don't know the exact time, but it's been over 15 minutes now (~18 minutes) and the _acme-challenge TXT record is still there, with worldwide propagation at about 90%. It looks like it did not check the record after 15 minutes and remove it.

If it hasn't come back on its own by the morning, I'll reboot the server.

Other details

Based on it successfully adding the TXT record and not complaining about "15m" and "45m" as the variable values, I wouldn't have expected putting in the equivalent values in seconds to take down the Cosmos webGUI altogether. If anything, I'd understand the opposite.

System details

azukaar commented 2 months ago

This is an issue with LEGO (the Let's Encrypt client), not with Cosmos itself, so I will close the ticket as there's nothing I can do about it.

BCITMike commented 2 months ago

FYI, it worked without wildcard. But then the webGUI for cosmos still wasn't available half an hour later with "Too many requests".

While this is a LEGO issue, it should matter if you end up using third party software that is buggy and leaves your product unusable.

azukaar commented 2 months ago

Those are issues with the underlying Let's Encrypt ACME protocol, to which there is no alternative to date. For you it is particularly bad because of your registrar, Namesilo, which would be slow with any third-party software (other providers sometimes have this issue, but raising the timeout to something like 30-60 seconds at most usually does the trick for slow providers).

It is something that I'd love to improve moving forward by having a Cosmos-specific domain provider that instantly provides HTTPS certificates, but of course that requires a bit more resources (because it's an online service) that I do not currently have. That is why it is important for Cosmos to become a sustainable product.

PS: The non-wildcard challenge works fast because it does not change any DNS entries
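As background for the PS: Let's Encrypt issues wildcard certificates only through the DNS-01 challenge, which is why a wildcard request has to create the _acme-challenge TXT record and wait for it to propagate. A minimal Python sketch of that decision, with the hypothetical name `requires_dns_challenge` (this is not Cosmos's or LEGO's code):

```python
from typing import Iterable


def requires_dns_challenge(hostnames: Iterable[str]) -> bool:
    """Return True if any requested hostname is a wildcard such as '*.example.com'."""
    return any(name.strip().startswith("*.") for name in hostnames)
```

A request for `example.com` alone would not need a DNS change under this check, while a request that includes `*.example.com` would.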