github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.43k stars 1.26k forks source link

gh-ost swap table is hanging cluster #974

Open shiwangini93 opened 3 years ago

shiwangini93 commented 3 years ago

gh-ost swap table caused writeset replication issues.

We have bi-directional (master -master) replication for mysql. I ran below statement to alter table:

     /usr/bin/gh-ost  --alter="add column status int null;"  --database=test  --table=noise  --user=test_user -- 
    password=xxxxxxxx  --chunk-size=2000 --max-load=Threads_connected=200 -exact-rowcount --verbose --execute 

It copied the data successfully to the switcher table. However, by the time of switchover it got stuck at this point:

   2021-05-24 07:15:34 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in 
   place (though I don't strictly have to)
   2021-05-24 07:15:34 INFO Checking session lock: gh-ost.1680148.lock
   2021-05-24 07:15:34 INFO Connection holding lock on original table still exists
   2021-05-24 07:15:34 INFO Will now proceed to drop magic table and unlock tables
   2021-05-24 07:15:34 INFO Dropping magic cut-over table

When, I checked for locking at database level. I didn't find any. However, when I ran show processlist; - I saw replication was getting stuck:

    | 1680148 | shiwangini  | 127.0.0.1:58328   | shiwangini_test    | Query       |   10613 | checking permissions                                          
    | drop /* gh-ost */ table if exists `shiwangini_test`.`_agents_del`                                    |         0 |             0 |
    | 1680150 | shiwangini  | 127.0.0.1:58332   | shiwangini_test    | Query       |   10614 | wsrep: initiating pre-commit for 
    write set (80151983)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680151 | shiwangini  | 127.0.0.1:58334   | shiwangini_test    | Query       |   10614 | Waiting for table metadata lock                               
    | rename /* gh-ost */ table `shiwangini_test`.`agents` to `shiwangini_test`.`_agents_del`, `shiwangini |         0 |             
    0 |
    | 1680152 | shiwangini  | 127.0.0.1:58336   | shiwangini_test    | Query       |   10613 | wsrep: initiating pre-commit for 
    write set (80151991)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680153 | shiwangini  | 127.0.0.1:58338   | shiwangini_test    | Query       |   10085 | wsrep: waiting to replay write set 
    (-1)                       | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680156 | shiwangini  | 127.0.0.1:58340   | shiwangini_test    | Query       |   10610 | wsrep: initiating pre-commit for 
    write set (80152004)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680158 | shiwangini  | 127.0.0.1:58342   | shiwangini_test    | Query       |   10605 | wsrep: initiating pre-commit for 
    write set (80152009)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680159 | shiwangini  | 127.0.0.1:58344   | shiwangini_test    | Query       |   10600 | wsrep: initiating pre-commit for 
    write set (80152010)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680161 | shiwangini  | 127.0.0.1:58348   | shiwangini_test    | Query       |   10595 | wsrep: initiating pre-commit for 
    write set (80152011)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
   | 1680162 | shiwangini  | 127.0.0.1:58350   | shiwangini_test    | Query       |   10590 | wsrep: initiating pre-commit for 
   write set (80152012)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
   | 1680165 | shiwangini  | 127.0.0.1:58352   | shiwangini_test    | Query       |   10585 | wsrep: initiating pre-commit for write 
   set (80152013)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
  | 1680166 | shiwangini  | 127.0.0.1:58354   | shiwangini_test    | Query       |   10580 | wsrep: initiating pre-commit for write 
  set (80152014)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
   | 1680167 | shiwangini  | 127.0.0.1:58358   | shiwangini_test    | Query       |   10575 | wsrep: initiating pre-commit for write 
   set (80152015)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680168 | shiwangini  | 127.0.0.1:58360   | shiwangini_test    | Query       |   10570 | wsrep: initiating pre-commit for 
    write set (80152016)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
    | 1680170 | shiwangini  | 127.0.0.1:58362   | shiwangini_test    | Query       |   10565 | wsrep: initiating pre-commit for 
    write set (80152017)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
     | 1680171 | shiwangini  | 127.0.0.1:58364   | shiwangini_test    | Query       |   10560 | wsrep: initiating pre-commit for 
     write set (80152018)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |
      | 1680174 | shiwangini  | 127.0.0.1:58368   | shiwangini_test    | Query       |   10535 | wsrep: initiating pre-commit for 
      write set (80152019)         | insert /* gh-ost */ into `shiwangini_test`.`_agents_ghc`
            (id, hint, value)
        values
            (NULLIF |         0 |             0 |

I cancelled the gh-ost execution and even after cancelling it - the cluster was keep hanging. Finally, we had to restart the whole node.

shlomi-noach commented 3 years ago

We have bi-directional (master -master) replication for mysql

Based on wsrep message in SHOW PROCESSLIST you do not have a bi-directional replication, you have synchronous InnoDN Cluster replication. It is not a supported architecture.

shiwangini93 commented 3 years ago

Thanks for the clarification. If possible could you please share reference doc where it's mentioned (which cluster it support). I referred this doc: https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md

shlomi-noach commented 3 years ago

I'm not sure how come this is not referenced in that document in the first place. My memory may not serve me right. did someone add some support for innodb cluster and we removed the limitation from the doc? @timvaillancourt maybe you know?

shlomi-noach commented 3 years ago

Please see https://github.com/github/gh-ost/issues/224#issuecomment-410521022 and https://github.com/github/gh-ost/pull/780/

I don't have any insight into either. I'm not experienced with innodb cluster / galera.

shiwangini93 commented 3 years ago

Thanks for this reference.