Clinical-Genomics / demultiplexing

To keep scripts associated with execution of the Illumina demultiplexing pipeline

Check for CopyComplete.txt to prevent duplicate reads in analysis #120

Closed: barrystokman closed this 3 years ago

barrystokman commented 3 years ago

If we start demultiplexing only once the file CopyComplete.txt is present, demultiplexing cannot start before syncing is fully complete.

This prevents duplicate reads caused by temporary bcl files that are still present. It also prevents demultiplexing from failing on missing or corrupted bcl files caused by a lag in syncing from sequencer -> NAS -> thalamus. Two birds with one stone.

RTAComplete.txt indicates that all the bcl files for a particular run have been created on the sequencer. CopyComplete.txt indicates that all data has been synced from the sequencer to the NAS and that the tmp bcl files have been removed.
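The gating rule described above can be sketched in Python. This is a minimal illustration, not this repository's actual script: the helper name `is_ready_for_demux` is hypothetical, and the sketch assumes that both marker files sit at the top level of the run directory and that the sync to thalamus delivers CopyComplete.txt no earlier than the data it vouches for.

```python
import tempfile
from pathlib import Path

# Marker files written into the run folder (assumed to be at its top level).
RTA_COMPLETE = "RTAComplete.txt"    # all bcl files created on the sequencer
COPY_COMPLETE = "CopyComplete.txt"  # data synced to the NAS, tmp bcl files removed


def is_ready_for_demux(run_dir) -> bool:
    """Hypothetical helper: True only when both marker files are present.

    RTAComplete.txt alone is not enough: the copy to the NAS may still be
    in progress, leaving temporary or incomplete bcl files behind.
    """
    run_dir = Path(run_dir)
    return (run_dir / RTA_COMPLETE).is_file() and (run_dir / COPY_COMPLETE).is_file()


if __name__ == "__main__":
    # Simulate a run folder in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        run = Path(tmp)
        (run / RTA_COMPLETE).touch()
        print(is_ready_for_demux(run))  # False: sync not finished yet
        (run / COPY_COMPLETE).touch()
        print(is_ready_for_demux(run))  # True: safe to start demultiplexing
```

Keeping the RTAComplete.txt check alongside CopyComplete.txt is a conservative choice in this sketch; the PR itself only states that CopyComplete.txt is the new trigger.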

Why did we only check for RTAComplete.txt?

1) CopyComplete.txt was added as a feature at a later stage (only several years ago /s).
2) Using RTAComplete.txt as the trigger to start demultiplexing had never been a problem before.

Presumably the NAS upgrades changed something in the transfer from sequencer -> NAS -> thalamus.

How to prepare for test:

How to test:

Expected test outcome:

Review:

This version is a:

barrystokman commented 3 years ago

Bumped: [screenshot]

Deployed: [screenshot]