EMCECS / ecs-sync

ecs-sync is a bulk copy utility that can move data between various systems in parallel.
Apache License 2.0

Duplicates in source-list-file cause confusing errors in UI #18

Closed: twincitiesguy closed this issue 6 years ago

twincitiesguy commented 7 years ago

During a migration, a user saw errors and skipped objects in the UI, but found no errors in either the error report or the database. This caused confusion and reduced confidence that all data had been transferred.

In this particular case, the source list file contained duplicate entries. The second occurrence of each duplicated object was skipped because that object had already been copied earlier in the same job. The UI counts every line of the list file (including duplicates), whereas the DB only tracks unique objects. This explains the discrepancy between the UI and the database, and it is expected.
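For anyone who wants to check a list file for duplicates before starting a job, here is a minimal sketch. It is not part of ecs-sync, and the class and method names are made up for illustration. It assumes one source identifier per line and compares whole lines after trimming whitespace; if your list file carries extra columns per line, you would need to extract the identifier first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of ecs-sync.
final class SourceListDuplicates {

    // Returns every identifier that appears more than once,
    // mapped to the number of times it appears (in first-seen order).
    public static Map<String, Integer> findDuplicates(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String id = line.trim();   // assumption: the whole line is the identifier
            if (id.isEmpty()) {
                continue;              // ignore blank lines
            }
            counts.merge(id, 1, Integer::sum);
        }
        // Keep only the identifiers that occur at least twice.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the contents of a source list file.
        List<String> lines = Arrays.asList("dir/a.txt", "dir/b.txt", "dir/a.txt");
        findDuplicates(lines).forEach((id, count) ->
                System.out.println(id + " appears " + count + " times"));
    }
}
```

Each duplicate reported this way corresponds to lines the UI will count as skipped, while the DB holds a single record for the object.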

However, errors are not expected in this case. It turns out there is a race condition when two threads try to insert a DB record for the same object at exactly the same time. In that situation, one thread fails while the other succeeds. The winning thread copies the data and records its result in the database; the losing thread reports an error in the UI but records nothing. That's why no errors appear in the DB. The net result is that those objects were copied successfully.

This issue tracks fixing the race condition so that these errors no longer occur. Duplicate entries in the source list will still cause skipped objects, but that is expected behavior and will be documented in the troubleshooting guide.

twincitiesguy commented 6 years ago

Added locking around the source identifier in the database service, so that no two threads can process the same source ID at the same time.
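The actual change is not shown in this thread, so the following is only a rough sketch of the general idea, assuming a striped-lock approach: a fixed array of lock objects, with each source ID hashed to one of them. `SourceIdLocks`, `withLock`, and the in-memory set standing in for the status table are all hypothetical names, not the ecs-sync code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Hypothetical helper for illustration only; not the ecs-sync implementation.
final class SourceIdLocks {
    private final Object[] stripes;

    public SourceIdLocks(int stripeCount) {
        if (stripeCount < 1) {
            throw new IllegalArgumentException("stripeCount must be >= 1");
        }
        stripes = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Object();
        }
    }

    // Runs the action while holding the lock for this source id.
    // Equal ids always map to the same stripe, so they never run concurrently;
    // unrelated ids may occasionally share a stripe and wait on each other.
    public <T> T withLock(String sourceId, Supplier<T> action) {
        Object lock = stripes[Math.floorMod(sourceId.hashCode(), stripes.length)];
        synchronized (lock) {
            return action.get();
        }
    }

    public static void main(String[] args) {
        SourceIdLocks locks = new SourceIdLocks(64);
        Set<String> db = new HashSet<>(); // stand-in for the status table
        // Simulate many tasks hitting the same duplicated list entry.
        IntStream.range(0, 100).parallel().forEach(i ->
                locks.withLock("bucket/object-1", () -> {
                    // Check-and-insert happens atomically under the lock.
                    boolean inserted = db.add("bucket/object-1");
                    if (inserted) {
                        System.out.println("record inserted by task " + i);
                    }
                    return inserted;
                }));
        System.out.println("records in db: " + db.size());
    }
}
```

Striping keeps memory bounded regardless of how many objects a job touches, at the cost of occasional contention between unrelated IDs that hash to the same stripe. With the check-and-insert serialized per ID, the losing thread sees the existing record and can skip the object instead of failing.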

twincitiesguy commented 6 years ago

Fixed in 3.2.4