codership / mysql-wsrep

wsrep API patch for MySQL server

wsrep_cluster_conf_id does not show a correct value (negative number, longint overflow!) #401

Closed shinguz closed 2 months ago

shinguz commented 2 years ago

8.0.27-26.9

    mysql> show global status like 'wsrep_cluster_conf_id';
    +-----------------------+----------------------+
    | Variable_name         | Value                |
    +-----------------------+----------------------+
    | wsrep_cluster_conf_id | 18446744073709551615 |
    +-----------------------+----------------------+

In the MySQL error log we see the correct value: conf_id = 8

This is sad/critical because cluster_conf_id is the only reliable source where we can see nodes bouncing here and there...
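To illustrate why the bogus value hurts monitoring, here is a minimal C++ sketch that counts membership changes from periodically sampled wsrep_cluster_conf_id values. The names `ConfIdSummary`, `summarize` and `kBogusConfId` are hypothetical, and the sample values are invented for illustration; only 18446744073709551615 comes from the output above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The value reported in this issue; it equals 2^64 - 1.
constexpr std::uint64_t kBogusConfId = UINT64_MAX;

// Hypothetical result type for the sketch.
struct ConfIdSummary {
    int changes = 0;   // times the usable value differed from the previous usable sample
    int unusable = 0;  // samples that carried the bogus value
};

// Hypothetical helper: walk the samples in order and summarize them.
ConfIdSummary summarize(const std::vector<std::uint64_t>& samples) {
    ConfIdSummary out;
    bool have_last = false;
    std::uint64_t last = 0;
    for (std::uint64_t s : samples) {
        if (s == kBogusConfId) {
            // Nothing can be learned from this sample.
            ++out.unusable;
            continue;
        }
        if (have_last && s != last) {
            ++out.changes;
        }
        last = s;
        have_last = true;
    }
    return out;
}

// Usage example with invented samples.
void example() {
    ConfIdSummary s = summarize({8, 8, 9, kBogusConfId, 11});
    assert(s.changes == 2);   // 8 -> 9 and 9 -> 11
    assert(s.unusable == 1);  // one sample was the bogus value
}
```

Note that the sketch counts changes between samples, not individual configuration changes: while the status variable reports the bogus value, any bounce that happens in between is invisible to this kind of check.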

shinguz commented 2 years ago

This looks like -1 or an int underflow

wsrep-lib/wsrep-API/v26/wsrep_api.h:#define WSREP_SEQNO_UNDEFINED (-1)
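The reported value is exactly 2^64 - 1, which is what -1 becomes when a signed 64-bit integer is converted to an unsigned one. A minimal C++ sketch of that arithmetic follows. The helper names `as_unsigned_text` and `from_status_text` are hypothetical and not part of the wsrep sources, and the sketch makes no claim about where in the server the conversion actually happens.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mirrors the definition quoted from wsrep_api.h.
constexpr long long kSeqnoUndefined = -1;

// Hypothetical helper: text of a signed 64-bit value when it is printed as unsigned.
std::string as_unsigned_text(long long value) {
    // Conversion to an unsigned type is modulo 2^64, so -1 maps to 2^64 - 1.
    return std::to_string(static_cast<std::uint64_t>(value));
}

// Hypothetical helper: recover the signed value from the status text.
long long from_status_text(const std::string& text) {
    const std::uint64_t raw = std::stoull(text);
    if (raw > static_cast<std::uint64_t>(INT64_MAX)) {
        // Portable two's complement reinterpretation: the result is raw - 2^64.
        return -static_cast<long long>(~raw) - 1;
    }
    return static_cast<long long>(raw);
}

// Small self-check tying the two helpers together.
void check_round_trip() {
    assert(from_status_text(as_unsigned_text(kSeqnoUndefined)) == kSeqnoUndefined);
}
```

Under these assumptions, `as_unsigned_text(-1)` yields the string seen in the status output, and `from_status_text` maps it back to -1 while leaving small values such as 8 unchanged.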

Searching the code, the bug must be somewhere in here:

    sql/wsrep_sst.cc: wsrep_seqno_t ret_wsrep_seqno = WSREP_SEQNO_UNDEFINED;
    sql/wsrep_sst.cc: wsrep_seqno_t ret_local_wsrep_seqno = WSREP_SEQNO_UNDEFINED;
    sql/wsrep_sst.cc: wsrep_seqno_t ret_seqno= WSREP_SEQNO_UNDEFINED; // seqno of complete SST
    sql/wsrep_mysqld.cc:long long wsrep_cluster_conf_id = WSREP_SEQNO_UNDEFINED;
    sql/wsrep_mysqld.cc:wsrep_seqno_t local_seqno = WSREP_SEQNO_UNDEFINED;
    wsrep-lib/wsrep-API/v26/wsrep_api.h: undefined GTID: WSREP_UUID_UNDEFINED:WSREP_SEQNO_UNDEFINED.
    wsrep-lib/wsrep-API/v26/examples/node/store.c: struct record const record = { WSREP_SEQNO_UNDEFINED, i };
    wsrep-lib/wsrep-API/v26/examples/node/store.c: bool const initialization = WSREP_SEQNO_UNDEFINED == store->gtid.seqno &&
    wsrep-lib/wsrep-API/v26/examples/node/wsrep.c: .state_id = {{{ 0, }}, WSREP_SEQNO_UNDEFINED },
    wsrep-lib/wsrep-API/v26/examples/listener.c: wsrep_gtid_t state_id = { WSREP_UUID_UNDEFINED, WSREP_SEQNO_UNDEFINED };
    wsrep-lib/include/wsrep/provider.hpp: or WSREP_SEQNO_UNDEFINED if the victim was not ordered

And it already happens right after the bootstrap, before the first join...

shinguz commented 2 years ago
    conf_id    = 0,
    conf_id    = 1,

And I think earlier it started with 1 and not 0!

shinguz commented 2 years ago

Interestingly, we found out in this week's Galera training that it does NOT always happen, only sometimes. On my Ubuntu 18.04 with MariaDB 10.6 Galera Cluster and on MySQL 8.0 Galera Cluster I cannot see it right now. We also did not see it yesterday on Oracle Linux 8 with MariaDB 10.6 Galera Cluster. But I have already seen it many times in different set-ups, so I have to think about a reproducible test case... Please let me know if you know what is wrong so I do not waste my time.

sciascid commented 2 months ago

A fix will be available with the next release