A workaround for this issue is to disable endpoint verification:
```yaml
cql:
  tls:
    require_endpoint_verification: false
```
The Java driver log showed SSL errors (hostname verification failed), with the result that each node could only connect to its local Cassandra node. I suspect this is the cause, further supported by the fact that the problem cannot be reproduced without TLS, or with TLS but without `require_endpoint_verification`.
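To illustrate what endpoint verification implies at the TLS layer, here is a minimal JSSE-level sketch (illustration only, not ecChronos's actual SSL setup; the `EndpointVerificationSketch` class and the `requireEndpointVerification` flag are made up for this example):

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustration of what "require_endpoint_verification" toggles at the JSSE level.
public final class EndpointVerificationSketch {

    // Builds a client-side SSLEngine for a given peer; when endpoint verification is
    // required, the TLS handshake will fail unless the peer certificate's SAN/CN
    // matches peerHost.
    static SSLEngine engineFor(String peerHost, int peerPort, boolean requireEndpointVerification)
            throws Exception {
        SSLContext context = SSLContext.getDefault();
        SSLEngine engine = context.createSSLEngine(peerHost, peerPort);
        engine.setUseClientMode(true);
        if (requireEndpointVerification) {
            SSLParameters parameters = engine.getSSLParameters();
            // "HTTPS" enables standard host name verification against the certificate.
            parameters.setEndpointIdentificationAlgorithm("HTTPS");
            engine.setSSLParameters(parameters);
        }
        return engine;
    }
}
```

With the identification algorithm set, connections to any host whose name is not covered by the presented certificate fail the handshake, which would explain why each instance could only reach its local node.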
I also could not reproduce this issue on my local CCM with certificates enabled.
If you see this issue again without these SSL issues, please do not hesitate to reopen the ticket.
Running Cassandra in a Kubernetes cluster where the Cassandra process and the ecChronos process run in the same Pod but in two different containers. When creating the cluster, one Pod (Cassandra node + ecChronos instance scheduling repairs on that node) is started at a time, so the first ecChronos instance is started "together with" the first C* node.
When deploying with TLS enabled on the CQL interface (from security.yml):
The first ecChronos instance, connected to the first C* node, will not work properly for any keyspaces/tables created after ecChronos has been started (in practice, all tables of interest):
But the second node that comes up (C* and ecChronos) works as expected!
The issue seems to be that the Cassandra driver for some reason does not recognize schema changes!? The logs below show how the driver reacts to a keyspace/table (`ks.tb1`) being created after the C* cluster is up (all nodes joined).

Startup logs from the first (non-working) ecChronos node:
Startup logs from the second (working) ecChronos node:
The second node will trigger the ecChronos `DefaultRepairConfigurationProvider#onTableAdded` callback to handle the new table(s). This does not happen on the first node for some reason, and if ecChronos on the first node does not know about any "new tables", it cannot manage those tables.
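For reference, a minimal sketch of the driver-side mechanism that such a callback depends on, assuming the DataStax Java driver 3.x `SchemaChangeListener` API (the `LoggingSchemaListener` class and the contact point are made up for this example; ecChronos's actual wiring goes through `DefaultRepairConfigurationProvider`):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SchemaChangeListenerBase;
import com.datastax.driver.core.TableMetadata;

// Minimal schema change listener; onTableAdded fires only when the driver's
// control connection picks up the schema event and refreshes its metadata.
public class LoggingSchemaListener extends SchemaChangeListenerBase {

    @Override
    public void onTableAdded(TableMetadata table) {
        System.out.println("Table added: "
                + table.getKeyspace().getName() + "." + table.getName());
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .build();
        cluster.register(new LoggingSchemaListener());
        cluster.connect();
    }
}
```

If the driver never delivers the schema event (as the first node's logs suggest), no listener logic will ever see the new table, which matches the behaviour described above.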
The only way to work around this issue (on the first node) is to restart it (restart the container or Pod in the k8s case).