ebean-orm / ebean-migration

DB Migration runner (similar to Flyway) which can be used standalone or with Ebean (run migrations on EbeanServer start)
Apache License 2.0
9 stars 5 forks source link

DbMigration does not properly handle concurrent runs #107

Closed rPraml closed 2 years ago

rPraml commented 2 years ago

Hello @rbygrave

we would need some advice here:

We have a setup now, with multiple tomcats. Each of them runs the same code and accesses the same database (a common setup, which ebean should be aware of, with synced caches of course)

We noticed, that sometimes db-migration failed on a new instance with an "Table db-migration already exists". This happens, when both tomcats are started parallel and they try to run DbMigration at the same time.

We identified the non thread safe code: "if table does not exist, then create it and insert a row" as one problem. I don't know, how to make that "atomic" apart from checking "createTable" again after a failure. But this does not ensure, that the row is already committed, which is locked in the next step. For H2, row locking will only work for non empty tables. If the table is empty, obtainLockWithWait will not lock.

We could fix this for DB2 (and SqlServer), but it currently does not work for MariaDb and H2 (ok H2:mem-db is not really important, but other platforms should work)

Please see the test case, and the fixes. It fails most of the time for H2 and MariaDb. (TODO: Test other platforms like oracle also)

cheers Roland & Noemi

rbygrave commented 2 years ago

A friend just reported I think the same issue last night. I'll look shortly, I think I know where the issue is ...

rbygrave commented 2 years ago

Yes this is very close to correct. I think we additionally need to also check that the select for update locks at least 1 row and retry that if it didn't lock any rows. I think we need this in order to handle the time gap between creating the table and committing the inserted row - it's a short time gap but it isn't zero.