DatabaseCleaner / database_cleaner-active_record

Strategies for cleaning databases using ActiveRecord. Can be used to ensure a clean state for testing.
MIT License
64 stars 63 forks source link

ActiveRecord Mysql2 SelectiveTruncation method is unreliable #23

Open graaff opened 9 years ago

graaff commented 9 years ago

The SelectiveTruncation strategy (where database_cleaner looks at "table_rows" in the information_schema) is always applied with a combination of ActiveRecord and Mysql2 adapter. Unfortunately this strategy is unreliable because table_rows is not guaranteed to be accurate. I'm using MariaDB 10.0.16 and I currently have a cucumber javascript scenario where this happens fairly predictably (can't share that in it current state, sorry).

The MariaDB documentation indicates that this information is not reliable with InnoDB tables: https://mariadb.com/kb/en/mariadb/information-schema-tables-table/

The same thing is documented for MySQL 5.6: http://dev.mysql.com/doc/refman/5.6/en/tables-table.html

Looking at information_schema for that particular test database I do notice that AUTO_INCREMENT > 1 for the table that has data and TABLE_ROWS = 0. However, I also see an AUTO_INCREMENT > 1 for tables that (no longer) have data in them.

Perhaps it might be enough to use ... AND (table_rows > 0 OR auto_increment > 1) to clean only those tables that still have or at least had data in them. Otherwise it might be best to just turn this selectivetrunction off, or at least provide an option for that.

etagwerker commented 9 years ago

@graaff It's a good point.

We have a method to check for row count. If that doesn't work we use AUTO_INCREMENT. See: https://github.com/DatabaseCleaner/database_cleaner/blob/master/lib/database_cleaner/active_record/truncation.rb#L70

In the scenario that you're describing, row_count is returning 0?

graaff commented 9 years ago

I see now that I forgot to mention that I am using the :deletion strategy. There the SelectiveTruncation method uses the information_schema to determine row_count. The row_count method in truncation.rb looks like it should work since it uses plain SQL, rather than internal MySQL information.

carsonreinke commented 9 years ago

Also, it should be noted that MySQL states that table_rows: "For InnoDB tables, the row count is only a rough estimate used in SQL optimization." [http://dev.mysql.com/doc/refman/5.5/en/tables-table.html]

etagwerker commented 9 years ago

@graaff Would you mind submitting a PR for this? It'd be great if we can use the same way to calculate row_count for both deletion and truncation strategies. I don't understand why they're different right now.

lupine commented 6 years ago

I've just experienced this issue while porting GitLab's spec suite from the truncation strategy to the deletion strategy: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/16516#note_55605767

If there's still interest in a PR to fix this upstream, I'm happy to work on it so we can remove the above monkey-patch!

mauro-oto commented 6 years ago

@lupine awesome, yes, we're interested. Thanks!