Open smalenfant opened 2 years ago
Noticed that this happened after a 6.1.0 upgrade from 5.1.2 and also after a Traffic Vault Riak -> PostgreSQL migration.
Does that mean after doing both of those things, or after doing either of them?
Seems related to #7128. Looks like there are issues with the Traffic Vault PostgreSQL backend.
@smalenfant - can you add to the description which version(s) of TC this bug exists in?
@mitchell852 I added it. I'm pretty sure this should also affect master/7.1.x, as I don't see changes regarding the backend.
Just an update here. When I issue a renew, this will always happen on the first try (500 error on the UI). I just go back and renew again and it "renews successfully".
First "renew": 2023/01/23 14:01:45 [INFO] [*.ds.cdn1.coxlab.net] acme: Validations succeeded; requesting certificates
Error posting acme certificate to Traffic Vault: could not begin Traffic Vault PostgreSQL transaction: context deadline exceeded: context deadline exceeded
Second "renew": 2023/01/23 14:03:42 [INFO] [*.ds.cdn1.coxlab.net] Server responded with a certificate.
I worked around this problem by increasing the timeouts in cdn.conf from 60 seconds to 120 seconds. The renew API is not asynchronous and is prone to timing out, depending on how long the renewal takes to complete.
I'm not exactly sure which one of the 8-10 timeouts defined there fixed it. Possibly the idle_timeout.
Noticed that this happened after a 6.1.0 upgrade from 5.1.2 and also after a Traffic Vault Riak -> PostgreSQL migration.
This Bug Report affects these Traffic Control components:
Current behavior:
When requesting certificate renewal, the ACME/Let's Encrypt process goes through but fails to write the new certificate to the database.
/var/log/message:
Traffic Ops Log:
Expected behavior:
Certificate to be written to DB.
Steps to reproduce:
See above.