PrzemekMalkowski opened this issue 9 years ago
i50320
OK, what seems to be happening here is that the joiner can't survive too-frequent configuration changes, "too frequent" meaning that it disconnects from and reconnects to the group before being able to flush the slave queue. In this case the JOINING state is cleared by (multiple) non-primary views, which ruin the protective heuristics: effectively the node forgets that it was receiving a state transfer (see the sketch below).
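Roughly, in invented types (this is not the actual gcs/gcomm code, just the shape of the bug; all names here are made up for illustration):

```cpp
// Sketch of the failure mode described above. A non-primary view knocks
// the joiner back to OPEN, erasing the fact that a state transfer was
// in flight, so the protective heuristics no longer apply.
enum NodeState { OPEN, JOINING, JOINED, SYNCED };

struct Joiner
{
    NodeState state        = OPEN;
    bool      st_in_flight = false;  // "am I receiving a state transfer?"

    // A primary view that requires a state transfer puts the node in
    // JOINING and arms the protective heuristics.
    void on_primary_view(bool state_transfer_needed)
    {
        if (state == OPEN && state_transfer_needed)
        {
            state        = JOINING;
            st_in_flight = true;
        }
    }

    // The bug path: frequent reconnects deliver non-primary views before
    // the slave queue is flushed; each one clears JOINING, so the node
    // forgets that a state transfer was in progress.
    void on_non_primary_view()
    {
        state        = OPEN;
        st_in_flight = false;
    }
};
```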
How "too frequent" is in this case? Note the first PC in the joiner log:
150127 0:18:09 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 61,
members = 2/3 (joined/total),
act_id = 557850488,
last_appl. = 557847332,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = b5a8d6fc-dbf5-11e3-ac98-a73ea7e811fe
This is the bottom layer, before the slave queue. Note the act_id value. Only 48 seconds later does the node process a PC at the top layer:
150127 0:18:57 [Note] WSREP: State transfer required:
Group state: b5a8d6fc-dbf5-11e3-ac98-a73ea7e811fe:557849318
Local state: b5a8d6fc-dbf5-11e3-ac98-a73ea7e811fe:557847539
Note the group state seqno: it is lower than the act_id value above (557849318 < 557850488). That means the node is not processing this PC event but some earlier one, which is not even in the log. Apparently it happened so far back that it was not even considered related.
And then comes another disconnect from the PC and another reconnect:
150127 0:18:58 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 63,
members = 2/3 (joined/total),
act_id = 557852431,
last_appl. = 557847332,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = b5a8d6fc-dbf5-11e3-ac98-a73ea7e811fe
And only after that is the state transfer request sent:
150127 0:19:15 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.16.72.134:4568
150127 0:19:15 [Note] WSREP: Node 0 (node01_dpadb1) requested state transfer from '*any*'. Selected 2 (node02_dpadb2)(SYNCED) as donor.
150127 0:19:15 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 557853512)
150127 0:19:15 [Note] WSREP: Requesting state transfer: success, donor: 2
And the first PC in the log gets processed only at:
150127 0:19:16 [ERROR] WSREP: Local state seqno (557852431) is greater than group seqno (557850488): states diverged. Aborting to avoid potential data loss. Remove '/data/mysql/data//grastate.dat' file and restart if you wish to continue. (FATAL)
at galera/src/replicator_str.cpp:state_transfer_required():33
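For illustration only, here is a minimal sketch of the guard that fires here, reconstructed from the error text alone; the real check in galera/src/replicator_str.cpp is more involved, and the signature and names below are assumptions:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstdint>

typedef std::int64_t seqno_t;

// Decide whether a state transfer is needed to join the group, and abort
// if the local state claims to be ahead of the group being joined.
static bool state_transfer_required(seqno_t local_seqno, seqno_t group_seqno)
{
    if (local_seqno > group_seqno)
    {
        // Because the node is still processing a stale PC event,
        // group_seqno here is the old 557850488 while the node has already
        // recorded 557852431 locally -- hence the abort.
        std::fprintf(stderr,
                     "Local state seqno (%lld) is greater than group seqno "
                     "(%lld): states diverged. Aborting to avoid potential "
                     "data loss.\n",
                     (long long)local_seqno, (long long)group_seqno);
        std::abort();
    }
    return local_seqno < group_seqno;  // behind the group: IST can catch up
}

int main()
{
    state_transfer_required(557852431, 557850488);  // seqnos from this log
    return 0;
}
```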
This is clearly a bug. But in this case the node was also overloaded and probably suboptimally configured:
150127 0:18:18 [Warning] WSREP: last inactive check more than PT1.5S ago (PT5.35902S), skipping check
is a clear sign of that. One thing that can be done about it is increasing the gcomm timeouts to prevent excessive cluster partitioning; see the configuration sketch below.
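For reference, a sketch of what that can look like in my.cnf. The evs.* option names are real Galera provider parameters, but the values are assumptions picked only to illustrate the direction of the change, not tuned recommendations:

```
# my.cnf sketch: relax EVS failure detection so that an overloaded node is
# not suspected, expelled and re-admitted so aggressively.
# Values are illustrative only.
wsrep_provider_options="evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"
```

Raising evs.suspect_timeout and evs.inactive_timeout gives a slow node more headroom before the rest of the group declares it inactive and installs a new configuration; the trade-off is slower detection of genuinely failed nodes.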
One node suffered network issues and disconnected from and reconnected to the cluster several times. After it finally managed to get an IST transfer from another node, it failed with the "states diverged" FATAL error quoted above.
The version where it happened is PXC 5.5.37-rel35.0-25.10.756 with Galera 2.10 (r175). The cluster has three nodes: two data nodes plus one arbitrator.
JOINER err log:
(Many lines are skipped, but I tried to paste as many as necessary to give some idea of the network issues.)
DONOR err log:
GARBD log: