EnMasseProject / enmasse

EnMasse - Self-service messaging on Kubernetes and OpenShift
https://enmasseproject.github.io
Apache License 2.0

Operator fails to renew certificates properly #5056

Open ctron opened 4 years ago

ctron commented 4 years ago

After running the shared infrastructure for a while, the operator seems to fail to renew the certificates:

{"level":"info","ts":1595571780.565553,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:00Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571785.5656567,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571785.5828426,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:05Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571790.5829506,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571790.6026292,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:10Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571795.6027315,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571795.620408,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:15Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571800.62056,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571800.6436908,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:20Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571805.6438673,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571805.6631546,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:25Z is after 2020-07-23T10:07:55Z"}
{"level":"info","ts":1595571810.6634145,"logger":"amqpcommand","msg":"Command Client - connecting amqps://10.130.0.39:5671"}
{"level":"info","ts":1595571810.6888678,"logger":"amqpcommand","msg":"Command Client amqps://10.130.0.39:5671 - restarting - backoff 5s(s), x509: certificate has expired or is not yet valid: current time 2020-07-24T06:23:30Z is after 2020-07-23T10:07:55Z"}

It seems that this also prevents the operator from properly deleting messaging endpoints.

ctron commented 4 years ago

It looks as if this is caused by the fact that the CA is only renewed when the reconcile loop runs. However, that loop is not timer-based by default, so if nothing else triggers a reconcile, the certificate can expire without being renewed.

I guess possible solutions to this could be:

ctron commented 4 years ago

I guess we have the same problem with the IoT bits, as they rely on the same pattern now.