github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License

cut-over locks not released when gh-ost pauses mid-cut-over #1407

Closed timvaillancourt closed 1 month ago

timvaillancourt commented 5 months ago

During the cut-over operation, gh-ost issues a lock tables statement on the tables before they are renamed. After the rename, it issues an unlock tables statement to release the locks.

Today, if gh-ost freezes (the process keeps running but is unresponsive, for example due to a host problem) between the lock tables and the unlock tables, the locks are not released. We have not been able to explain what caused the host running gh-ost to freeze, but this happened in production, and the locks were held until the MySQL wait_timeout (which kills idle connections) expired.

In theory, this can be reproduced by:

  1. Adding a pause after the lock tables step in the cut-over (hand-wavy)
  2. Freezing the gh-ost process with kill -TSTP [pid] or kill -STOP [pid]
  3. Observing that the table locks are not released until wait_timeout expires (default 30 minutes)

To address this, I plan to shorten the wait_timeout of the applier MySQL session during cut-over only, since that is the only time a short idle timeout is advantageous. After the cut-over, the session's wait_timeout will be restored to the server default.

This work began in #1401 and was completed in #1406.
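
To make the plan above concrete, here is a minimal Go sketch of the idea. It is not the code from #1401 or #1406. The names execFunc and withShortWaitTimeout are hypothetical, and the sketch assumes every statement runs on the same pinned connection that holds the table locks (for example a *sql.Conn rather than a pooled *sql.DB). The two SET SESSION statements are standard MySQL syntax.

```go
package main

import "fmt"

// execFunc runs one statement on the applier session. It is a hypothetical
// helper type for this sketch, not a gh-ost type; in practice it would wrap
// the single connection that also issues the lock tables statement.
type execFunc func(query string) error

// withShortWaitTimeout shortens the session's wait_timeout, runs cutOver,
// and then restores the session value to the server's global setting.
//
// If the process freezes inside cutOver, the restore never runs, but the
// server closes the idle connection after `seconds`, which releases the locks.
func withShortWaitTimeout(exec execFunc, seconds int, cutOver func() error) error {
	if err := exec(fmt.Sprintf("SET SESSION wait_timeout = %d", seconds)); err != nil {
		return fmt.Errorf("shortening wait_timeout: %w", err)
	}

	cutOverErr := cutOver()

	// Restore even when the cut-over failed, so the session does not keep
	// the short idle timeout for later work. A cut-over error takes priority.
	if err := exec("SET SESSION wait_timeout = @@GLOBAL.wait_timeout"); err != nil && cutOverErr == nil {
		return fmt.Errorf("restoring wait_timeout: %w", err)
	}
	return cutOverErr
}
```

A caller would pass a closure that executes the query on the applier connection, plus a cutOver function that performs the lock, rename, and unlock steps. The timeout should be longer than the longest expected gap between statements in a healthy cut-over; otherwise the server may drop the connection mid-operation.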