EnMasseProject / enmasse

EnMasse - Self-service messaging on Kubernetes and OpenShift
https://enmasseproject.github.io
Apache License 2.0

broker pod does not start due to issue "no valid keystore" #4358

Closed k-wall closed 4 years ago

k-wall commented 4 years ago

Describe the bug

On a multi-node cluster, the broker pod sometimes fails to start with the error "no valid keystore". The following is written to the log of the broker-plugin init container:

Error certificate is not yet valid getting chain.
unable to write 'random state'
keytool error: java.lang.Exception: Source keystore file exists, but is empty: /tmp/enmasse-keystore.p12
Certificate was added to keystore
Certificate was added to keystore
unable to write 'random state'
Importing keystore /tmp/external-keystore.p12 to /opt/amq/custom/certs/external-keystore.jks...
Entry for alias io.enmasse successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
broker-plugin is complete

To Reproduce

Not known

Expected behavior

Broker should start.


k-wall commented 4 years ago

I noticed that the broker pod did come up cleanly after an oc delete pod. My working theory is that clock skew between the nodes causes the openssl command to fail with: Error certificate is not yet valid getting chain.

As @lulf suggested, the broker init script ought to be using set -e. If this were done and the skew were slight, the init container would fail and be retried, so the crash looping would clear the problem up without intervention. If the skew were larger, or the root cause different, at least we'd fail early.
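To illustrate the set -e point, here is a minimal sketch, not the actual EnMasse init script. INIT_BODY is a hypothetical stand-in: `false` plays the role of the failing openssl command, and the final echo mimics the "broker-plugin is complete" line from the log above. The same body is run once without and once with set -e.

```shell
#!/bin/sh
# Hypothetical init script body (an assumption for illustration only):
# the first step fails the way the openssl command did in the log above,
# and the last line mimics the script's final message.
INIT_BODY='
echo "Error certificate is not yet valid getting chain." >&2
false   # stands in for the failing openssl command
echo "broker-plugin is complete"
'

# Without set -e the failure is ignored: the script carries on and exits 0.
lenient_status=0
lenient_out=$(sh -c "$INIT_BODY" 2>/dev/null) || lenient_status=$?

# With set -e the script stops at the first failing command and exits non-zero.
strict_status=0
strict_out=$(sh -c "set -e; $INIT_BODY" 2>/dev/null) || strict_status=$?

echo "without set -e: exit=$lenient_status output='$lenient_out'"
echo "with set -e:    exit=$strict_status output='$strict_out'"
```

In the run without set -e, the script still reaches its final message and exits 0, which matches the log in the report, where "broker-plugin is complete" appears despite the earlier errors. With set -e, the non-zero exit would let Kubernetes see the init container as failed and retry it.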