czardoz / openarkkit

Automatically exported from code.google.com/p/openarkkit
0 stars 1 forks source link

oak-online-alter-table patch to recover from innodb locks #19

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Quoting mail sent to Shlomi by Lachlan M.:

We have some very large tables that we needed to upgrade the
AUTO_INCREMENT id field from UNSIGNED INT to UNSIGNED BIGINT -- during
the process of trying to perform this upgrade we hit some frustrating
issues...

If the online alter tool encountered an InnoDB deadlock or
lock_wait_timeout situation, it would treat the error as a "hard
error" and fail the whole table rebuild, rather than retry some
predetermined number of times...

This was painful for us, since we were in an emergency situation where
we got caught out and had to upgrade the table ASAP in order to get
service working for customers again. The tables were very large, so
some of them actually failed after 10-12 hours...

I'd guess this could be a problem for anyone really -- trying to do
such things with large somewhat active tables and failing after many
hours will be an issue anyone would want to avoid.

My patch adds a new parameter called --max-lock-retries (-r) which
defaults to 10. It will catch and retry any query that passes through
the act_query() function that fails as a result of a DEADLOCK or LOCK
WAIT TIMEOUT.

In addition, I made a secondary optional parameter to the act_query
function that lets you specify whether you want it to retry forever
(rather than up to the max-lock-retries). Then I moved the
lock_tables_write() function to simply specify that parameter and
removed the exception catching and retry logic from that function.

I've attached my patched version of oak-online-alter-table script,
which was based on the latest script from the openark-kit-180-1 RPM.

Original issue reported on code.google.com by shlomi.n...@gmail.com on 21 Sep 2011 at 8:09

Attachments:

GoogleCodeExporter commented 8 years ago
Attached is the patch diff

Original comment by shlomi.n...@gmail.com on 21 Sep 2011 at 8:17

Attachments:

GoogleCodeExporter commented 8 years ago
Note - There is a very small bug in the patch.

    parser.add_option("-r", "--max-lock-retries", type="int", dest="max_lock_retries", default="10", help="Maximum times to retry on deadlock or lock_wait_timeout. (default: 50)")

The help text shows the default is 50, however the default is really 10. The 
line should read:

    parser.add_option("-r", "--max-lock-retries", type="int", dest="max_lock_retries", default="10", help="Maximum times to retry on deadlock or lock_wait_timeout. (default: 10)")

Original comment by lachlan....@gmail.com on 17 Oct 2011 at 5:55

GoogleCodeExporter commented 8 years ago
This issue was closed by revision r192.

Original comment by shlomi.n...@gmail.com on 13 Jan 2013 at 11:18