codership / galera

Synchronous multi-master replication library
GNU General Public License v2.0

wsrep_sync_wait has no effect after COM_CHANGE_USER #613

Open renecannao opened 2 years ago

renecannao commented 2 years ago

Description

wsrep_sync_wait controls "strict cluster-wide causality checks", and it is a session variable. It seems that after COM_CHANGE_USER is executed, the value of wsrep_sync_wait has no effect whatsoever, no matter if:

How to reproduce

Script details

The script performs the following:

  1. it opens 1 connection to each Galera node
  2. it uses node1 for writes, and node2 and node3 for reads
  3. it creates a test table with just one column
  4. in a loop of 200 cycles:
     a. delete all rows from the test table
     b. insert rows into the test table on node1
     c. count rows on node2 and node3
  5. it executes COM_CHANGE_USER on the connections to node2 and node3
  6. it repeats point 4
  7. it explicitly sets wsrep_sync_wait=0 and then wsrep_sync_wait=7 on both node2 and node3 (setting 0 is not needed; we are just changing it twice in case it makes any difference, but it doesn't)
  8. it repeats point 4
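
The steps above can be sketched roughly as follows. This is not the attached PHP script. It is a minimal Python sketch in which the `Node` interface, its `execute` and `change_user` methods, the table name `test.t1`, and the inserted values are all hypothetical stand-ins; how COM_CHANGE_USER is actually issued depends on the client library in use.

```python
from typing import Dict, List, Protocol, Tuple


class Node(Protocol):
    """Hypothetical connection wrapper; not the API of any real driver."""

    def execute(self, sql: str) -> List[tuple]:
        """Run one statement and return its rows (empty list for non-SELECT)."""
        ...

    def change_user(self) -> None:
        """Send COM_CHANGE_USER on this connection."""
        ...


def run_loop(label: str, writer: Node, readers: Dict[str, Node],
             cycles: int, rows: int) -> List[Tuple[str, str, int]]:
    """Step 4: delete, insert on the writer, then count on every reader."""
    results = []
    for _ in range(cycles):
        writer.execute("DELETE FROM test.t1")                      # 4.a
        for i in range(1, rows + 1):                               # 4.b
            writer.execute(f"INSERT INTO test.t1 VALUES ({i})")
        for name, node in readers.items():                         # 4.c
            count = node.execute("SELECT COUNT(*) FROM test.t1")[0][0]
            results.append((label, name, count))
    return results


def reproduce(writer: Node, readers: Dict[str, Node],
              cycles: int = 200, rows: int = 20) -> List[Tuple[str, str, int]]:
    """Steps 3 to 8; returns (loop label, node name, rows counted) tuples."""
    # Step 3: single-column table (the PRIMARY KEY is an assumption).
    writer.execute("CREATE TABLE IF NOT EXISTS test.t1 (id INT PRIMARY KEY)")
    out = run_loop("LOOP1", writer, readers, cycles, rows)         # step 4
    for node in readers.values():                                  # step 5
        node.change_user()
    out += run_loop("LOOP2", writer, readers, cycles, rows)        # step 6
    for node in readers.values():                                  # step 7
        node.execute("SET SESSION wsrep_sync_wait=0")
        node.execute("SET SESSION wsrep_sync_wait=7")
    out += run_loop("LOOP3", writer, readers, cycles, rows)        # step 8
    return out
```

With working causality checks, every tuple returned by `reproduce` would carry a count equal to `rows`; any smaller count in LOOP2 or LOOP3 corresponds to the behavior reported here.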

Expected behavior

On points 4.c, 6.c and 8.c: the number of rows read should always be equal to the number of rows inserted in points 4.b, 6.b and 8.b respectively.

Actual behavior

On point 4.c: the number of rows read is always equal to the number of rows inserted in point 4.b.
On points 6.c and 8.c: the number of rows read is not always equal to the number of rows inserted in points 6.b and 8.b.

Example output:

$ php -f reproduce3.php | sort | uniq -c | sort -k 2,3
    200 LOOP1: node2  count: 20 - sum: 210 - wsrep_sync_wait: 7
    200 LOOP1: node3  count: 20 - sum: 210 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 10 - sum: 45 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 14 - sum: 91 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 15 - sum: 105 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 16 - sum: 120 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 19 - sum: 171 - wsrep_sync_wait: 7
      1 LOOP2: node2  count: 4 - sum: 6 - wsrep_sync_wait: 7
      2 LOOP2: node2  count: 17 - sum: 136 - wsrep_sync_wait: 7
      2 LOOP2: node2  count: 18 - sum: 153 - wsrep_sync_wait: 7
     67 LOOP2: node2  count: 20 - sum: 190 - wsrep_sync_wait: 7
    123 LOOP2: node2  count: 20 - sum: 210 - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 0 - sum:  - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 1 - sum: 0 - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 18 - sum: 153 - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 19 - sum: 171 - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 7 - sum: 21 - wsrep_sync_wait: 7
      1 LOOP2: node3  count: 8 - sum: 28 - wsrep_sync_wait: 7
      2 LOOP2: node3  count: 15 - sum: 105 - wsrep_sync_wait: 7
      2 LOOP2: node3  count: 17 - sum: 136 - wsrep_sync_wait: 7
     71 LOOP2: node3  count: 20 - sum: 190 - wsrep_sync_wait: 7
    119 LOOP2: node3  count: 20 - sum: 210 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 11 - sum: 55 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 12 - sum: 66 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 14 - sum: 91 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 15 - sum: 105 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 17 - sum: 136 - wsrep_sync_wait: 7
      1 LOOP3: node2  count: 19 - sum: 171 - wsrep_sync_wait: 7
      2 LOOP3: node2  count: 16 - sum: 120 - wsrep_sync_wait: 7
     57 LOOP3: node2  count: 20 - sum: 190 - wsrep_sync_wait: 7
    135 LOOP3: node2  count: 20 - sum: 210 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 11 - sum: 55 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 13 - sum: 78 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 14 - sum: 91 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 15 - sum: 105 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 16 - sum: 120 - wsrep_sync_wait: 7
      1 LOOP3: node3  count: 9 - sum: 36 - wsrep_sync_wait: 7
      2 LOOP3: node3  count: 17 - sum: 136 - wsrep_sync_wait: 7
      3 LOOP3: node3  count: 19 - sum: 171 - wsrep_sync_wait: 7
     64 LOOP3: node3  count: 20 - sum: 190 - wsrep_sync_wait: 7
    125 LOOP3: node3  count: 20 - sum: 210 - wsrep_sync_wait: 7
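
To make the output above easier to read, the following sketch tallies, per loop and node, how many reads differ from the expected result. Treating "count: 20, sum: 210" as the expected result is an assumption inferred from LOOP1, where every read returned it; the function name `summarize` and the regular expression are illustrative and not part of the original script.

```python
import re
from collections import defaultdict

# Matches lines such as:
#     67 LOOP2: node2  count: 20 - sum: 190 - wsrep_sync_wait: 7
# The sum may be empty when no rows were read.
LINE_RE = re.compile(
    r"^\s*(?P<n>\d+)\s+(?P<loop>LOOP\d+):\s+(?P<node>\S+)\s+"
    r"count:\s*(?P<count>\d+)\s*-\s*sum:\s*(?P<sum>\d*)\s*-"
)


def summarize(lines, expected_count=20, expected_sum=210):
    """Return {(loop, node): (total_reads, mismatching_reads)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that is not a result line
        occurrences = int(m["n"])  # the prefix added by `uniq -c`
        matches = (int(m["count"]) == expected_count
                   and m["sum"] == str(expected_sum))
        entry = totals[(m["loop"], m["node"])]
        entry[0] += occurrences
        if not matches:
            entry[1] += occurrences
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    import sys
    for (loop, node), (total, bad) in sorted(summarize(sys.stdin).items()):
        print(f"{loop} {node}: {bad} of {total} reads differ from the expected result")
```

Applied to the full output above, this reports no mismatching reads in LOOP1, 77 and 81 out of 200 for node2 and node3 in LOOP2, and 65 and 75 out of 200 in LOOP3.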

The test was reproduced on various versions of Galera, including 10.6.5-MariaDB-log and the official 8.0.26-26.8 (from Codership).

wsrep_sync_wait---com_change_user.php.txt

tc-hsteffen commented 2 years ago

We are experiencing this issue. It would be great if it could be fixed soon. Thank you!