Closed zerosoul13 closed 3 years ago
The biggest one is 80 servers split across two datacenters (40 servers each, with cross-DC replication).
We regularly see the message "Schema version mismatch detected", even in non-BigGraphite clusters, I think. We have found that it doesn't really have any impact. Do you see one?
You may be able to get more data by running nodetool describecluster.
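If it helps with correlating events, here is a minimal sketch of how the schema versions reported by nodetool describecluster could be checked automatically. It assumes the usual "Schema versions:" layout, where each line maps a version UUID (or UNREACHABLE) to a bracketed list of node addresses. The sample text, the addresses, and the function names are made up for illustration and do not come from a real cluster.

```python
import re

# Illustrative sample only: made-up UUIDs and addresses, laid out the way
# `nodetool describecluster` typically prints its "Schema versions" section.
SAMPLE_OUTPUT = """\
Cluster Information:
    Name: example-cluster
    Schema versions:
        11111111-1111-1111-1111-111111111111: [10.0.0.1, 10.0.0.2]
        22222222-2222-2222-2222-222222222222: [10.0.0.3]
        UNREACHABLE: [10.0.0.4]
"""

# Matches "<uuid>: [ip, ip]" or "UNREACHABLE: [ip, ip]" lines.
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(text):
    """Return a dict mapping schema version (or 'UNREACHABLE') to node addresses."""
    versions = {}
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def has_schema_mismatch(versions):
    """True when reachable nodes report more than one schema version."""
    reachable = [v for v in versions if v != "UNREACHABLE"]
    return len(reachable) > 1


if __name__ == "__main__":
    parsed = parse_schema_versions(SAMPLE_OUTPUT)
    for version, nodes in parsed.items():
        print(version, nodes)
    print("mismatch:", has_schema_mismatch(parsed))
```

On a real node, the text could come from something like subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True).stdout. Logging the result periodically might show which nodes diverge and for how long before a restart is needed.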
In our case, I recall not being able to do much with BigGraphite data until a Cassandra rolling restart was done. Digging through chat history, I found the following error message:
biggraphite.drivers._utils.Error: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'write_type': 'SIMPLE'}
Do you have a replication factor of at least 2?
I've checked with our DBA team and we use RF=2.
Hello team,
Our DBA team has run into an issue a couple of times where Cassandra reports "Schema version mismatch detected". When this happens, the only option we have is to do a rolling restart of the cluster.
I've seen mentions of you running a BigGraphite cluster with a good number of Cassandra nodes (our setup has 16 nodes), so I wanted to ask whether you have come across this issue before. There aren't many more details because the issue has only happened a couple of times and we haven't been able to do a full event correlation to clearly point to any other possible issues.
Any comments are appreciated.