Closed timvaillancourt closed 3 months ago
@shlomi-noach I'm curious about your feedback on one consequence of this change.

Following this PR, it is possible for the session holding the `lock tables` to time out (and unlock tables) before the "magic table" is dropped here. If I understand this right, the lock-session hitting `wait_timeout` will cause the `rename tables` to succeed. That all sounds better than before, but the "magic table" will be left behind. I don't fully understand the impact of this.

> `-initially-drop-old-table`
> Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists

Would `gh-ost` just fix this scenario for users with `-initially-drop-old-table`? Any other race-condition risks you can see the lock-release causing 🤔? 🙇
> If I understand this right, the lock-session hitting `wait_timeout` will cause the `rename tables` to succeed.

No, actually. The `RENAME` will not succeed, because the magic table is still in place. The `RENAME` statement attempts to rename original-table into magic-table. But since magic-table is there, the `RENAME` will fail.

The next cut-over attempt will first, before placing any locks, attempt to `DropAtomicCutOverSentryTableIfExists()` before re-creating the magic table.

This should be safe.
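The reasoning above can be sketched as a small in-memory simulation. This is only a toy model, not gh-ost's code: the `catalog` type, `renameAll`, `attemptCutOver`, and the table names `tbl`, `_tbl_gho`, and `_tbl_del` are all hypothetical stand-ins, and no locking or real MySQL session is involved. It assumes only what the comment states: the rename fails while the magic table exists, and the next attempt drops the magic table before re-creating it.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative table names (assumptions for this sketch only).
const (
	origTable   = "tbl"
	ghostTable  = "_tbl_gho"
	sentryTable = "_tbl_del" // the "magic table"
)

var errTableExists = errors.New("target table already exists")

// catalog is a toy stand-in for a schema: table name -> label for its contents.
type catalog map[string]string

// renameAll mimics a multi-table RENAME: either every rename applies or none does.
func (c catalog) renameAll(pairs [][2]string) error {
	work := catalog{}
	for name, data := range c {
		work[name] = data
	}
	for _, p := range pairs {
		from, to := p[0], p[1]
		data, ok := work[from]
		if !ok {
			return fmt.Errorf("table %s does not exist", from)
		}
		if _, taken := work[to]; taken {
			return fmt.Errorf("%w: %s", errTableExists, to)
		}
		delete(work, from)
		work[to] = data
	}
	// Commit the working copy back into c only after all renames succeeded.
	for name := range c {
		delete(c, name)
	}
	for name, data := range work {
		c[name] = data
	}
	return nil
}

// attemptCutOver models one cut-over attempt. lockSessionTimedOut simulates the
// lock session hitting wait_timeout before it gets to drop the magic table.
func attemptCutOver(c catalog, lockSessionTimedOut bool) error {
	delete(c, sentryTable)    // drop a leftover magic table, if any
	c[sentryTable] = "sentry" // re-create it (locks would be placed after this)
	if !lockSessionTimedOut {
		delete(c, sentryTable) // healthy path: magic table dropped before unlocking
	}
	return c.renameAll([][2]string{{origTable, sentryTable}, {ghostTable, origTable}})
}

func main() {
	c := catalog{origTable: "original rows", ghostTable: "migrated rows"}
	fmt.Println("attempt 1:", attemptCutOver(c, true))
	fmt.Println("attempt 2:", attemptCutOver(c, false))
	fmt.Println("tbl now holds:", c[origTable])
}
```

In this model the first attempt fails and leaves all three tables in place, and the retry cleans up the leftover magic table and completes the swap.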
@shlomi-noach that makes sense (eventually)! Thanks for the validations and explanations
@shlomi-noach / @meiji163: I believe I've addressed the PR suggestions and this is ready for another review 🙇
Merging. @shlomi-noach let me know if I missed something and I'll make a follow-up PR 👍
Related issue: #1407
Description

This PR refines https://github.com/github/gh-ost/pull/1401 by overriding the session `wait_timeout` only where it is needed: at cut-over time, where an idle connection could lead to a potentially long table lock if the `gh-ost` process (or the host running it) "freezes"/"stalls" at the cut-over stage.

The change (at cut-over only):

- `wait_timeout` is fetched _(via an existing `select` that fetched `time_zone`)_
- `wait_timeout` is set to be 3 x the lock-wait timeout
- `gh-ost` stalls with a still-active connection here
- `wait_timeout` is restored to what it was set to pre-cut-over

The `--mysql-wait-timeout` flag added in #1401 is removed because it is no longer needed. No release has been cut since #1401, so this isn't necessarily a breaking change.

`script/cibuild` returns with no formatting errors, build errors or unit test errors.
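The set-then-restore sequence in the description could look roughly like the sketch below. It is not the code from this PR: the `session` interface, `recordingSession`, `cutOverWaitTimeout`, and `withCutOverWaitTimeout` are hypothetical names, the original `wait_timeout` is assumed to have been fetched earlier and is passed in as a plain number, and restoring the value even when the cut-over fails is an assumption of this sketch.

```go
package main

import "fmt"

// session is a hypothetical, minimal view of a MySQL connection.
type session interface {
	Exec(query string) error
}

// recordingSession stands in for a real connection and records each statement.
type recordingSession struct {
	queries []string
}

func (r *recordingSession) Exec(query string) error {
	r.queries = append(r.queries, query)
	return nil
}

// cutOverWaitTimeout returns 3 x the lock-wait timeout, in seconds.
func cutOverWaitTimeout(lockWaitTimeoutSeconds int64) int64 {
	return 3 * lockWaitTimeoutSeconds
}

// withCutOverWaitTimeout lowers the session wait_timeout, runs cutOver, then
// restores the value seen before the cut-over, whether or not cutOver failed.
func withCutOverWaitTimeout(s session, originalWaitTimeout, lockWaitTimeoutSeconds int64, cutOver func() error) error {
	set := fmt.Sprintf("set session wait_timeout = %d", cutOverWaitTimeout(lockWaitTimeoutSeconds))
	if err := s.Exec(set); err != nil {
		return err
	}
	cutOverErr := cutOver()
	restoreErr := s.Exec(fmt.Sprintf("set session wait_timeout = %d", originalWaitTimeout))
	if cutOverErr != nil {
		return cutOverErr
	}
	return restoreErr
}

func main() {
	s := &recordingSession{}
	// 28800 and 3 are example values: a pre-cut-over wait_timeout and a lock-wait timeout.
	err := withCutOverWaitTimeout(s, 28800, 3, func() error { return nil })
	fmt.Println("error:", err)
	for _, q := range s.queries {
		fmt.Println(q)
	}
}
```

Keeping the override scoped to one helper means the shortened timeout applies only for the duration of the cut-over, which matches the intent stated in the description.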