Closed: kdund closed this issue 4 years ago
I think this is a better approach than the one I first had in mind. Short summary: each concurrent job writes to a temporary file, which is then renamed onto the target path, so only a complete file ever appears there and no race condition can leave a half-written cache. (The only remaining fault source is the last job to finish writing a faulty file.) https://stackoverflow.com/questions/12003805/threadsafe-and-fault-tolerant-file-writes
Trying to reproduce the race condition is (perhaps predictably) challenging, so I will start running with this fix and no "burn-in". Changing the "open" to an atomic write (df60ea6f6fe44a617d019f3e90a9ab421794f96d) will ensure that no two jobs write into the same cache file at the same time. The file might still be written twice if two jobs both realise they need a cache file that does not exist yet, but in the end the slower of them simply replaces the entire file.
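For reference, here is a minimal sketch of the temp-file-plus-rename pattern described above, using only the standard library. It is not the code from the commit: the function name `write_cache_atomically` and the `.tmp-cache-` prefix are hypothetical, and it assumes the cache is written as bytes and that the file system honours atomic renames.

```python
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` (bytes) to `path` so readers never see a partial file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one
    # file system; each job gets its own uniquely named temp file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-cache-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # Swap the finished file into place; on POSIX this rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If two jobs call this for the same path, each writes its own temp file and the later rename wins, which matches the "slower job overwrites the entire file" behaviour. As far as I can tell, the atomicwrites package wraps the same idea behind `with atomic_write(path, overwrite=True) as f:`.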
Sounds good! Thanks for pointing to the atomicwrites package; it seems a lot better than writing a custom temporary-file-plus-rename solution.
Proposed solution in #30
closed with #30
When starting multiple batch jobs on Midway, it is necessary to do a "burn-in" run beforehand; otherwise multiple jobs may attempt to write to the same cache file and corrupt it. The corrupted cache file must then be deleted and the jobs re-run.