bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Tabix index for the same region is run multiple times #2306

Closed lbeltrame closed 5 years ago

lbeltrame commented 6 years ago

While looking at the logs, I spotted this:

[2018-03-05T10:18Z] hibari: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] suzume: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] suzume: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] rory: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] kyokai: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] teru: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] kuroki: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] kyoko: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] lelei: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] sena: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz
[2018-03-05T10:18Z] sonobe: tabix index Agilent_OneSeq_Backbone_Covered.bed.gz

These are all the same file (I'm using one set in the globals section). This would be harmless, but it causes a race when doing transactions, because the bcbiotmp file might get removed out from under other running processes, or other similar symptoms.

I'm not sure if it's easily fixable, since every sample can have a separate BED file for variant_regions or SV regions.
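One way to avoid scheduling the same index job repeatedly would be to collect the distinct BED paths across samples before dispatching any work. This is only a sketch of the idea, not bcbio's actual code; `unique_index_targets` and the sample-dict keys are hypothetical stand-ins:

```python
def unique_index_targets(samples):
    """Collect distinct BED paths across samples, preserving first-seen
    order, so each file would be indexed exactly once (hypothetical helper,
    not part of bcbio)."""
    seen = set()
    targets = []
    for sample in samples:
        for key in ("variant_regions", "sv_regions"):
            path = sample.get(key)
            if path and path not in seen:
                seen.add(path)
                targets.append(path)
    return targets

# Three samples sharing one BED file (as in the logs above), one with an
# extra SV regions file:
samples = [
    {"variant_regions": "/data/Agilent_OneSeq_Backbone_Covered.bed.gz"},
    {"variant_regions": "/data/Agilent_OneSeq_Backbone_Covered.bed.gz"},
    {"variant_regions": "/data/Agilent_OneSeq_Backbone_Covered.bed.gz",
     "sv_regions": "/data/sv_targets.bed.gz"},
]
print(unique_index_targets(samples))
# ['/data/Agilent_OneSeq_Backbone_Covered.bed.gz', '/data/sv_targets.bed.gz']
```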

roryk commented 6 years ago

It looks like these are all spawned at the same time, so they'll run in separate tmp directories, and whichever one finishes last will be the one that sticks. I think this is probably resistant to race conditions, though they definitely could happen. We don't have a great way to handle cases like this where a bunch of different threads need to create the same file that doesn't exist yet, since the threads don't talk to each other at all.
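The "last writer wins" scheme described above is safe only if the final move into shared storage is atomic. A minimal sketch of that publish step, assuming temp and final directories live on the same filesystem so `os.replace` is atomic (this is an illustration, not bcbio's `file_transaction` implementation):

```python
import os
import tempfile

def publish_atomically(build_fn, final_path):
    """Build a file in a private temp dir, then move it into place with
    os.replace, which is atomic when src and dst are on one filesystem.
    Concurrent callers each succeed; the last finisher's copy remains,
    and readers never observe a partially written file."""
    out_dir = os.path.dirname(final_path) or "."
    with tempfile.TemporaryDirectory(dir=out_dir) as tmp_dir:
        tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
        build_fn(tmp_path)                # e.g. run tabix against tmp_path
        os.replace(tmp_path, final_path)  # atomic rename into place

def fake_index(path):
    # Stand-in for the real indexing step.
    with open(path, "w") as fh:
        fh.write("index-data\n")

work = tempfile.mkdtemp()
final = os.path.join(work, "example.bed.gz.tbi")
publish_atomically(fake_index, final)
with open(final) as fh:
    print(fh.read().strip())  # index-data
```

Note that this guarantee does not hold across separate filesystems (e.g. scratch vs. shared storage), which matches the failure mode reported below.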

lbeltrame commented 6 years ago

Whoops, I meant to cancel the comment, and instead it sent something unintelligible. Sorry about that.

roryk commented 6 years ago

No worries-- thanks. Let us know if this actually creates a race condition and we can look into seeing how we can avoid this kind of thing.

lbeltrame commented 6 years ago

There are actually races; I'm seeing errors like:

AssertionError: distributed.transaction.file_transaction: File copy error: file or directory on temporary storage (/scratch/tmpKof9oR/Agilent_OneSeq_Backbone_Covered.bed.gz) size 2870089 bytes does not equal size of file or directory after transfer to shared storage (/data/bioinformatics/einar/proj/work/bedprep/Agilent_OneSeq_Backbone_Covered.bed.gz) size 163840 bytes

Or

IOError: [Errno 17] File exists: '/data/bioinformatics/einar/proj/work/bedprep/Agilent_OneSeq_Backbone_Covered.bed.gz'
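The `Errno 17` failure suggests the copy into shared storage refuses to overwrite a file another worker already published. One defensive pattern (a sketch only; bcbio's actual transaction code differs, and `move_tolerating_existing` is a hypothetical name) is to treat "already exists" as success when the workers are all producing the same file:

```python
import os
import shutil
import tempfile

def move_tolerating_existing(src, dst):
    """Copy src to dst, claiming dst exclusively with O_EXCL so exactly
    one worker writes it; a loser of the race accepts the winner's copy
    instead of raising EEXIST (hypothetical helper)."""
    try:
        # O_EXCL makes creation fail if another process got there first.
        fd = os.open(dst, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return dst  # another worker already published it; accept theirs
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    return dst

work = tempfile.mkdtemp()
src_a = os.path.join(work, "a.bed.gz")
src_b = os.path.join(work, "b.bed.gz")
with open(src_a, "w") as fh:
    fh.write("first\n")
with open(src_b, "w") as fh:
    fh.write("second\n")

dst = os.path.join(work, "published.bed.gz")
move_tolerating_existing(src_a, dst)
move_tolerating_existing(src_b, dst)  # loses the race, no error raised
with open(dst) as fh:
    print(fh.read().strip())  # first
```

This only makes sense when every worker produces identical content, which is the situation in this issue (one shared BED file indexed many times).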

chapmanb commented 6 years ago

Luca -- sorry about the issues and thanks for the detailed reports. In which part of the pipeline run are you seeing these? We added prep_samples, which is meant to resolve this by pre-preparing BED files for all of the samples. It sounds like we might be missing some inputs there, which then triggers the problem downstream:

https://github.com/chapmanb/bcbio-nextgen/blob/f8369d0dd8b557fff4c9d258f135937e6c513ca0/bcbio/pipeline/sample.py#L195

Thanks much for the help debugging this.

roryk commented 5 years ago

Closing as this is stale, and looks like it might have been fixed. Please reopen if not.