czardoz / openarkkit

Automatically exported from code.google.com/p/openarkkit
0 stars 1 forks source link

Possible race condition in oak-online-alter-table #9

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I created a simple table with an autoincrement key, and wrote a procedure to 
insert into it in a loop.  I started this procedure, and then ran 
oak-online-alter-table several times.  It succeeded several times, but then 
sometimes my stored procedure died with the following error, which makes it 
look like there is a race condition in the process of altering the table:

5141> call doinsert(50000);
ERROR 1146 (42S02): Table 'test.__oak_b' doesn't exist

Here is the full output of the tool:

$ python oak-online-alter-table.py --host=127.0.0.1 --database=test --table=b 
--alter="engine=innodb" --user=root --port=5141
-- Connecting to MySQL
-- Table test.b is of engine innodb
-- Checking for UNIQUE columns on test.b, by which to chunk
-- Possible UNIQUE KEY column names in test.b:
-- - a
-- Table test.__oak_b has been created
-- Table test.__oak_b has been altered
-- Checking for UNIQUE columns on test.__oak_b, by which to chunk
-- Possible UNIQUE KEY column names in test.__oak_b:
-- - a
-- Checking for UNIQUE columns on test.b, by which to chunk
-- - Found following possible unique keys:
-- - a (int)
-- Chosen unique key is 'a'
-- Shared columns: a
-- Created AD trigger
-- Created AU trigger
-- Created AI trigger
-- Attempting to lock tables

-- Tables locked WRITE
-- a (min, max) values: ([1L], [2416L])
-- Tables unlocked
-- Copying range (1), (1000), progress: 0%
-- Copying range (1000), (2000), progress: 41%
-- Copying range (2000), (2416), progress: 82%
-- Copying range 100% complete. Number of rows: 2416
-- Deleting range (1), (1000), progress: 0%
-- Deleting range (1000), (2000), progress: 41%
-- Deleting range (2000), (2416), progress: 82%
-- Deleting range 100% complete. Number of rows: 0
-- Table test.b has been renamed to test.__arc_b,
-- and table test.__oak_b has been renamed to test.b
-- Table test.__arc_b was found and dropped
-- ALTER TABLE completed

Do you know what is happening, and can it be solved?

Original issue reported on code.google.com by baron.schwartz on 22 Feb 2011 at 1:22

GoogleCodeExporter commented 8 years ago
My initial thoughts are that this is caused by the triggers still working while 
the table names alter. Table alteration *should* auto-commit any working 
trigger.
Additional input I would like to have:
- MySQL version
- on what INSERT value (if possible to tell) the procedure failed, and whether 
that value is to be found in the new "b" table.

I'll try repeating the experiment.

Original comment by shlomi.n...@gmail.com on 22 Feb 2011 at 6:23

GoogleCodeExporter commented 8 years ago
test case:

USE test;

DELIMITER ;

DROP TABLE IF EXISTS a;
CREATE TABLE a (
id INT UNSIGNED PRIMARY KEY
);

DELIMITER //

CREATE PROCEDURE doinsert(count INT)
BEGIN
  DECLARE counter INT UNSIGNED DEFAULT 0;
  WHILE counter < count DO
    INSERT INTO a (id) VALUES (counter);
    SET counter = counter+1;
  END WHILE;
END //

DELIMITER ;

Original comment by shlomi.n...@gmail.com on 22 Feb 2011 at 7:55

GoogleCodeExporter commented 8 years ago
Have managed to get same results using MySQL 5.1.51, table + SP as described 
above.

Original comment by shlomi.n...@gmail.com on 22 Feb 2011 at 2:10

GoogleCodeExporter commented 8 years ago
This becomes weirder.
I've added DELAYED logging into some MyISAM "log" table on each trigger 
invocation.
Apparently the stored procedure many times (> 50%) fails on:
ERROR 1146 (42S02): Table 'test.log' doesn't exist
even though I do nothing to rename the log table.

I do happen to TRUNCATE it *just before* issuing the procedure.
If I replace the "TRUNCATE log" with "DELETE FROM log", there is no failure on 
the log table.

With all invocation of the original problem, and for all occurances of 
described bug, there was no difference in the row data of original table & 
ghost table (I verified by avoiding dropping the original-renamed-to-arc table, 
and comparing max values with ghost-renamed-to-original table).
So my initial thought that this is a trigger failure appears ti be mistaken.

I now have an ugly suspicion, which I'm not sure how to prove, that a stored 
procedure does not like the fact a table has been truncated/renamed while (or 
just before?!? weird) the procedure is invoked.

I will try to build a test case which is entriely unrelated to 
oak-online-alter-table so as to support this claim.

Original comment by shlomi.n...@gmail.com on 25 Feb 2011 at 6:26

GoogleCodeExporter commented 8 years ago
It might be a bug in MySQL's locking.

Original comment by baron.schwartz on 28 Feb 2011 at 11:48

GoogleCodeExporter commented 8 years ago
Still unsure about how to prove/disprove this is a bug with MySQL (as I suspect 
it is).
Are you at all waiting on this or does work on mk-osc make this obsolete?

Original comment by shlomi.n...@gmail.com on 16 Mar 2011 at 6:31