Clinical-Genomics / demultiplexing

To keep scripts associated with execution of the Illumina demultiplexing pipeline

Exclude Undetermined files from rsync #87

Closed ingkebil closed 5 years ago

ingkebil commented 5 years ago

This PR removes syncing of Undetermined files from thalamus to hasta.
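For reference, the rsync call that the delivery script logs further down in this thread carries `--exclude=copycomplete.txt --exclude='Undetermined*'`. Below is a minimal local sketch of what those two patterns do. It is not the delivery script itself: the temporary directories, the run name `RUN_A` and the fastq file names are made up for illustration, and the demo is skipped when rsync is not installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway source and destination trees; every name below is hypothetical.
src="$(mktemp -d)"
dest="$(mktemp -d)"
run="$src/RUN_A/Unaligned-Y151I10I10Y151"
mkdir -p "$run"
touch "$run/sample_S1_L001_R1_001.fastq.gz" \
      "$run/Undetermined_S0_L001_R1_001.fastq.gz" \
      "$src/RUN_A/copycomplete.txt"

if command -v rsync >/dev/null 2>&1; then
    # Same exclude flags as in the logged command, but between local directories.
    rsync -rt --exclude=copycomplete.txt --exclude='Undetermined*' "$src/RUN_A" "$dest/"
    # List what actually arrived on the destination side.
    synced="$(cd "$dest" && find . -type f | sort)"
    echo "$synced"
else
    echo "rsync is not installed; skipping the demo" >&2
fi
```

Because the pattern contains no slash, rsync matches it against the file name only, so Undetermined files are skipped at any depth below the run directory.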

How to test:

On hasta:

  1. remove the existing Undetermined files: rm /home/proj/stage/demultiplexed-runs/190208_A00689_0009_AHHMGWDSXX/Unaligned-Y151I10I10Y151/Undetermined*

On thalamus:

  1. install locally: cd ~/git/yourname/; git clone --branch exclude https://github.com/clinical-genomics/demultiplexing
  2. remove the copycomplete.txt files: rm ~/STAGE/novaseq/demux/*/copycomplete.txt
  3. run the following command: bash /home/hiseq.clinical/git/yourname/demultiplexing/scripts/2-hiseq-deliver.bash /home/hiseq.clinical/STAGE/novaseq/demux/ hasta.scilifelab.se /home/proj/stage/demultiplexed-runs/

Expected outcome: after a few hours, check hasta.scilifelab.se:/home/proj/stage/demultiplexed-runs/. The following runs should be synced:

Neither of these runs should contain Undetermined files in Unaligned-*.
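One way to check this outcome is to search the `Unaligned-*` directories for leftover files. A minimal sketch follows, assuming a hypothetical helper named `list_undetermined`. It is demonstrated here on a throwaway local tree with made-up names, not on the real run directories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every Undetermined* file that sits somewhere
# below an Unaligned-* directory under the given root.
list_undetermined() {
    local root="$1"
    find "$root" -path '*/Unaligned-*' -name 'Undetermined*' -type f
}

# Local demonstration tree; the run and file names are made up.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/RUN_A/Unaligned-Y151I10I10Y151"
touch "$demo_root/RUN_A/Unaligned-Y151I10I10Y151/sample_S1_L001_R1_001.fastq.gz"

leftovers="$(list_undetermined "$demo_root")"
if [ -z "$leftovers" ]; then
    echo "OK: no Undetermined files found"
else
    echo "Found Undetermined files:"
    echo "$leftovers"
fi
```

On hasta the same function could be pointed at /home/proj/stage/demultiplexed-runs/ once the sync has finished; an empty result would match the expectation above.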

Review:

This is a patch version bump because there is no change in functionality.

ingkebil commented 5 years ago

@emiliaol We could just skip syncing the Undetermined fastq files from thalamus, which would hopefully give us a minor speedup. Any reason we shouldn't be doing that? :)

ingkebil commented 5 years ago

[hiseq.clinical@thalamus demultiplexing]$ rm ~/STAGE/novaseq/demux/*/copycomplete.txt
[hiseq.clinical@thalamus demultiplexing]$ bash /home/hiseq.clinical/git/kenny/demultiplexing/scripts/2-hiseq-deliver.bash /home/hiseq.clinical/STAGE/novaseq/demux/ hasta.scilifelab.se /home/proj/stage/demultiplexed-runs/
[20190925150129] 190129_A00689_0008_BHHGYWDSXX not finished
[20190925150129] rsync -rt --progress --exclude=copycomplete.txt --exclude='Undetermined*' /home/hiseq.clinical/STAGE/novaseq/demux//190208_A00689_0009_AHHMGWDSXX hasta.scilifelab.se:/home/proj/stage/demultiplexed-runs/
sending incremental file list
190208_A00689_0009_AHHMGWDSXX/
190208_A00689_0009_AHHMGWDSXX/Unaligned-Y151I10I10Y151/

sent 26748 bytes  received 352 bytes  18066.67 bytes/sec
total size is 1550869155000  speedup is 57227644.10
[20190925150130] scp /home/hiseq.clinical/STAGE/novaseq/demux//190208_A00689_0009_AHHMGWDSXX/copycomplete.txt hasta.scilifelab.se:/home/proj/stage/demultiplexed-runs//190208_A00689_0009_AHHMGWDSXX/
copycomplete.txt                                                                                            100%   15     0.0KB/s   00:00    
[20190925150131] ssh hasta.scilifelab.se 'rm /home/proj/stage/demultiplexed-runs//190208_A00689_0009_AHHMGWDSXX/delivery.txt'
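A note on reading the rsync summary in the log above: the "speedup" figure is the total size of the files in the transfer divided by the bytes actually sent plus received. Here only about 27 KB of file-list traffic went over the wire for roughly 1.5 TB of data, which suggests the remaining files were already up to date on hasta. A small sketch reproducing the number from the logged values (the variable names are only for illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values copied from the rsync summary lines in the log.
total_size=1550869155000
sent=26748
received=352

# speedup = total size / (bytes sent + bytes received)
speedup="$(LC_ALL=C awk -v t="$total_size" -v s="$sent" -v r="$received" \
    'BEGIN { printf "%.2f", t / (s + r) }')"
echo "speedup is $speedup"
```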

:checkered_flag:

ingkebil commented 5 years ago

@emiliaol Any reason we shouldn't exclude Undetermined files from a NovaSeq run? If there is no reason, let's deploy this! :)

ingkebil commented 5 years ago

@emiliaol approves as well