Open Hexcles opened 8 months ago
Peering into https://github.com/github/gh-ost/pull/1201, I found it perplexing that it says this in the docs:
> This is not a problem for most scenarios, but it could be a problem for users that start the DDL during a period with long running transactions.
This is really critical information, in my opinion, because if gh-ost is supposed to be inherently safe, it seems to jeopardize this safety by potentially creating table outages with no controllable timeout here. We primarily introduced gh-ost because of long-running transactions that were hard to pin down and a lack of safety with LHM in these scenarios. While we've mostly cleaned these up, I still think anything that could remotely incur a table outage should have defined characteristics for how long the table will be out.
Currently gh-ost only sets `lock_wait_timeout` when doing normal cutover: https://github.com/search?q=repo%3Agithub%2Fgh-ost%20lock_wait_timeout&type=code

When Instant DDL is used, there doesn't seem to be a way to set the lock timeout. We can either reuse the same `cut-over-lock-timeout-seconds` flag or introduce a new one specifically for Instant DDL.
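
To make the proposal concrete, here is a rough sketch of the idea, not gh-ost's actual code: set a session-level `lock_wait_timeout` right before the instant `ALTER`, so a blocked metadata lock gives up after a bounded time. The helper name `instantDDLStatements` and its parameters are hypothetical, and the sketch assumes both statements are executed on the same connection, since the timeout is session-scoped.

```go
package main

import (
	"fmt"
	"strings"
)

// instantDDLStatements is a hypothetical helper (not part of gh-ost).
// It returns the statements to run, in order, on a single connection so
// that the session-scoped lock_wait_timeout applies to the ALTER.
func instantDDLStatements(lockTimeoutSeconds int, table, alterClause string) ([]string, error) {
	// MySQL rejects lock_wait_timeout values below 1 second.
	if lockTimeoutSeconds < 1 {
		return nil, fmt.Errorf("lock timeout must be at least 1 second, got %d", lockTimeoutSeconds)
	}
	// Quote the table name, doubling any embedded backticks.
	quoted := "`" + strings.ReplaceAll(table, "`", "``") + "`"
	return []string{
		// Bounds how long the ALTER waits for a metadata lock.
		fmt.Sprintf("SET SESSION lock_wait_timeout = %d", lockTimeoutSeconds),
		// Fails immediately if the change cannot be done instantly.
		fmt.Sprintf("ALTER TABLE %s %s, ALGORITHM=INSTANT", quoted, alterClause),
	}, nil
}
```

With this shape, the timeout value could come from the existing cut-over lock timeout setting or from a new dedicated flag; either way, a long-running transaction would cause the `ALTER` to fail with a lock wait timeout error instead of queueing indefinitely and blocking other queries behind it.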