Closed lbeltrame closed 5 years ago
It looks like these are all spawned at the same time, so they'll run in separate tmp directories and whichever one finishes last will be the one that sticks. I think this is mostly resistant to race conditions, though they could still happen. We don't have a great way to handle cases like this, where a bunch of different threads need to create the same file that doesn't exist yet, since the threads don't talk to each other at all.
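For illustration, here is a minimal sketch of the "last finisher wins" pattern described above. It is not bcbio's transaction code: `publish_atomically` is a hypothetical helper, and it assumes Python 3.3+ for `os.replace`. The temporary file is created next to the destination so that the final rename stays on one filesystem, which is what makes the swap atomic on POSIX systems.

```python
import os
import tempfile


def publish_atomically(final_path, data):
    """Write data to a temp file beside final_path, then rename it into place.

    Hypothetical helper for illustration only. Readers see either the old
    file or the complete new one, never a partially written file.
    """
    final_dir = os.path.dirname(os.path.abspath(final_path))
    # Create the temp file in the destination directory so the rename
    # below does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=final_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # Replaces any existing file; the last caller to get here wins.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

With this approach several workers can publish the same file concurrently without coordinating. A copy from scratch to shared storage is a different case, since it crosses filesystems and is not atomic.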
Whoops, I meant to cancel the comment, and instead it sent something unintelligible. Sorry about that.
No worries-- thanks. Let us know if this actually creates a race condition and we can look into seeing how we can avoid this kind of thing.
There are actually races. I get errors like:
AssertionError: distributed.transaction.file_transaction: File copy error: file or directory on temporary storage (/scratch/tmpKof9oR/Agilent_OneSeq_Backbone_Covered.bed.gz) size 2870089 bytes does not equal size of file or directory after transfer to shared storage (/data/bioinformatics/einar/proj/work/bedprep/Agilent_OneSeq_Backbone_Covered.bed.gz) size 163840 bytes
Or
IOError: [Errno 17] File exists: '/data/bioinformatics/einar/proj/work/bedprep/Agilent_OneSeq_Backbone_Covered.bed.gz'
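Errno 17 is `EEXIST`: something tried to create a path that another process had already created. As a rough sketch of how a worker could treat that as "someone else got there first" rather than a failure, assuming exclusive creation is the intended behaviour (`claim_output` is a hypothetical name, not a bcbio function):

```python
import errno
import os


def claim_output(path):
    """Return True if this worker created path, False if it already existed.

    Hypothetical helper for illustration only.
    """
    try:
        # O_EXCL makes creation fail with EEXIST if the path is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            # Another worker created the file first; not an error here.
            return False
        raise
    os.close(fd)
    return True
```

Only the worker that gets `True` would go on to produce the file; the others would wait for it or reuse it. Whether exclusive creation behaves reliably on a given shared filesystem is something to verify separately.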
Luca -- sorry about the issues and thanks for the detailed reports. In which part of the pipeline run are you seeing these? We added in prep_samples, which is meant to resolve this by pre-preparing BED files for all of the samples. It sounds like we might be missing some inputs from that, which triggers the problem downstream:
Thanks much for the help debugging this.
Closing as this is stale, and looks like it might have been fixed. Please reopen if not.
While looking at the logs, I spotted this
These are all the same file (I'm using one set in the globals section). This would be harmless, but it causes a race when doing transactions, because the bcbiotmp file might get removed from under the feet of other running processes, or show other similar symptoms. I'm not sure if it's easily fixable, since every sample can have a separate BED file for variant_regions or SV regions.
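One possible direction, sketched under assumptions: group the samples by the BED file they point to, so that each distinct file is prepared once even though samples may use different files. The sample dictionaries, the `name` field, and `group_samples_by_bed` are hypothetical and do not reflect bcbio's internal data structures.

```python
import os
from collections import defaultdict


def group_samples_by_bed(samples, key="variant_regions"):
    """Map each distinct BED path to the names of the samples that use it.

    Hypothetical helper: `samples` is assumed to be a list of dicts with a
    "name" entry and an optional BED path stored under `key`.
    """
    groups = defaultdict(list)
    for sample in samples:
        bed = sample.get(key)
        if bed:
            # Normalise so the same file given in two forms lands in one group.
            groups[os.path.abspath(bed)].append(sample["name"])
    return dict(groups)
```

Each key of the returned dict could then be handed to a single preparation step, with the result shared by every sample in that group.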