NorwegianVeterinaryInstitute / DemultiplexRawSequenceData

A workflow automation script: demultiplex the library sequence, run quality checks, deliver to archiving and processing afterwards
GNU General Public License v3.0

demultiplex_script.py

Demultiplex a MiSeq or NextSeq run, perform QC using FastQC and MultiQC, and deliver the files either to VIGASP for analysis or to NIRD for archiving.

Replace with the relevant run ID. Example: "190912_M06578_0001_000000000-CNNTP". The run ID breaks down like this: yymmdd_MACHINE-SERIAL-NUMBER_AUTOINCREASING-NUMBER-OF-RUN_000000000-FlowcellID-used-for-this-run, where yymmdd is the run date (the output of date +%y%m%d).

Note: don't bother with enforcing ISO dates for the directory name. It is an Illumina standard and they do not care.
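
As a minimal sketch of the run ID breakdown described above, the following splits a run ID into its four underscore-separated fields. The names parse_run_id, RunId and RUN_ID_PATTERN are hypothetical and not taken from demultiplex_script.py, and the pattern is inferred only from the single example run ID, so it may need loosening for other instruments.

```python
import re
from datetime import datetime
from typing import NamedTuple


class RunId(NamedTuple):
    """The four underscore-separated fields of a run ID (hypothetical helper type)."""
    run_date: str        # ISO form of the yymmdd date field
    machine: str         # machine serial number, e.g. M06578
    run_number: int      # auto-increasing number of the run
    flowcell_field: str  # last field, e.g. 000000000-CNNTP


# Assumption: pattern derived from the example "190912_M06578_0001_000000000-CNNTP".
RUN_ID_PATTERN = re.compile(
    r"^(?P<date>\d{6})_(?P<machine>[A-Za-z0-9]+)_(?P<run>\d+)_(?P<flowcell>[A-Za-z0-9-]+)$"
)


def parse_run_id(run_id: str) -> RunId:
    """Split a run ID into its fields; raise ValueError if it does not match."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not a recognised run ID: {run_id!r}")
    # strptime also rejects impossible dates such as 191340.
    run_date = datetime.strptime(match["date"], "%y%m%d").date()
    return RunId(
        run_date=run_date.isoformat(),
        machine=match["machine"],
        run_number=int(match["run"]),
        flowcell_field=match["flowcell"],
    )
```

Calling parse_run_id("190912_M06578_0001_000000000-CNNTP") yields the machine serial M06578 and run number 1. The date is converted to ISO form only inside the parsed result; the directory name itself is left in the Illumina format.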

Software requirements

Python > v3.9
bcl2fastq ( from https://emea.support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software/downloads.html )
FastQC    ( https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ )
MultiQC   ( pip3 install multiqc )
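
The requirements above could be checked before a run with something like the sketch below. It assumes the tools are on PATH under their usual command names (bcl2fastq, fastqc, multiqc) and reads "Python > v3.9" as "3.9 or newer"; both are assumptions, and the function names are hypothetical rather than part of demultiplex_script.py.

```python
import shutil
import sys

# Assumption: the usual executable names of the required tools.
REQUIRED_TOOLS = ("bcl2fastq", "fastqc", "multiqc")
# Assumption: "Python > v3.9" is read as "3.9 or newer"; tighten if needed.
MIN_PYTHON = (3, 9)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent_enough(version=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= minimum
```

An empty list from missing_tools() together with a true python_is_recent_enough() suggests the environment is ready; otherwise the returned names show what still needs installing.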

Directory structure on seqtech00

├── bin                                                         binaries and symlinks of binaries live here
├── clarity                                                     exported Illumina Clarity directory
│   ├── gls_events  
│   ├── logs                                                    clarity logs go here
│   ├── miseq                                                   miseq-clarity stopover directory
│   │   └── M06578                                              per serial number
│   │       └── samplesheets                                    samplesheets for this serial number go here
│   └── nextseq                                                 nextseq-clarity stopover directory
│       └── NB552450                                            per serial number
│           └── samplesheets                                    samplesheets for this serial number go here
├── demultiplex                                                 demultiplexed data directory
├── for_transfer                                                data ready to be transferred over to NIRD or VIGASP
├── logs                                                        all demultiplexing logs go here
├── rawdata                                                     raw data directory, sequencers write here
│   ├── bad_runs                                                runs which are bad, or rejected
│   └── control_runs                                            water/other control runs
└── samplesheets                                                cumulative backups of all samplesheets
    ├── M06578 -> /data/clarity/miseq/M06578/samplesheets/        symlink to sample sheets for convenience
    └── NB552450 -> /data/clarity/nextseq/NB552450/samplesheets/  symlink to sample sheets for convenience
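
A short sketch for checking that a host has the layout shown above. The relative paths are copied from the tree; the root is an assumption (the symlink targets suggest /data), and missing_dirs and EXPECTED_DIRS are hypothetical names, not part of demultiplex_script.py.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_DIRS = (
    "bin",
    "clarity/gls_events",
    "clarity/logs",
    "clarity/miseq/M06578/samplesheets",
    "clarity/nextseq/NB552450/samplesheets",
    "demultiplex",
    "for_transfer",
    "logs",
    "rawdata/bad_runs",
    "rawdata/control_runs",
    "samplesheets",
)


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

For example, missing_dirs("/data") returns an empty list when every directory is in place. The per-machine symlinks under samplesheets are not checked here.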

Procedure

Example:

Z:\190912_M06578_0001_000000000-CNNTP
        ├── SampleSheets.csv
Z:\SampleSheets
        ├── 190912_M06578_0001_000000000-CNNTP.csv

as the relevant user.