NorwegianVeterinaryInstitute/DemultiplexRawSequenceData

demultiplex_script.py

Demultiplex a MiSeq or NextSeq run, perform QC using FastQC and MultiQC, and deliver the files either to VIGASP for analysis or to NIRD for archiving.
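A minimal sketch of the demultiplexing and QC stages expressed as external commands, assuming bcl2fastq, fastqc and multiqc are on PATH. The function names, the output directory names ("fastqc", "multiqc") and the thread count are hypothetical and not taken from demultiplex_script.py; delivery to VIGASP or NIRD is left out.

```python
import subprocess
from pathlib import Path


def demux_command(run_dir: Path, out_dir: Path, threads: int = 4) -> list[str]:
    # bcl2fastq reads the BCL files in the run folder and writes FASTQ files to out_dir
    return ["bcl2fastq",
            "--runfolder-dir", str(run_dir),
            "--output-dir", str(out_dir),
            "--sample-sheet", str(run_dir / "SampleSheet.csv"),
            "--processing-threads", str(threads)]


def fastqc_command(fastq_files: list[Path], qc_dir: Path, threads: int = 4) -> list[str]:
    # FastQC writes one HTML report and one zip archive per input file into qc_dir
    return ["fastqc", "--outdir", str(qc_dir), "--threads", str(threads),
            *[str(f) for f in fastq_files]]


def multiqc_command(qc_dir: Path, report_dir: Path) -> list[str]:
    # MultiQC scans qc_dir for tool output and merges it into a single report
    return ["multiqc", "--outdir", str(report_dir), str(qc_dir)]


def run_pipeline(run_dir: Path, out_dir: Path) -> None:
    """Demultiplex one run, then run QC on the resulting FASTQ files."""
    subprocess.run(demux_command(run_dir, out_dir), check=True)
    qc_dir = out_dir / "fastqc"                 # hypothetical directory name
    qc_dir.mkdir(parents=True, exist_ok=True)   # FastQC expects its output dir to exist
    fastqs = sorted(out_dir.rglob("*.fastq.gz"))
    subprocess.run(fastqc_command(fastqs, qc_dir), check=True)
    subprocess.run(multiqc_command(qc_dir, out_dir / "multiqc"), check=True)
```

Keeping the command builders separate from run_pipeline makes the exact command lines easy to log or inspect before anything is executed; check=True stops the pipeline at the first failing tool.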

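Replace <RunID> with the relevant run ID, for example "190912_M06578_0001_000000000-CNNTP". A run ID breaks down as yymmdd_MACHINE-SERIAL-NUMBER_AUTOINCREMENTING-RUN-NUMBER_000000000-FLOWCELL-ID, where yymmdd is the run date (as produced by date +%y%m%d) and the last field identifies the flowcell used for this run. In the example, 190912 is the date, M06578 the instrument serial number, 0001 the run number and 000000000-CNNTP the flowcell field.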
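A minimal sketch of splitting a run ID into its fields. The regular expression is an assumption derived from the example above, not from Illumina documentation, and parse_run_id and RunId are hypothetical names. The "000000000-" prefix of the flowcell field is dropped so only the flowcell ID remains.

```python
import re
from datetime import datetime
from typing import NamedTuple


class RunId(NamedTuple):
    date: datetime
    serial: str
    run_number: int
    flowcell: str


# Assumed pattern: yymmdd _ serial _ run number _ flowcell field
RUN_ID_PATTERN = re.compile(r"^(\d{6})_([A-Z0-9]+)_(\d+)_(.+)$")


def parse_run_id(run_id: str) -> RunId:
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not a recognised run ID: {run_id!r}")
    date, serial, number, flowcell_field = match.groups()
    # Keep only the part after the last "-", e.g. "000000000-CNNTP" -> "CNNTP"
    flowcell = flowcell_field.split("-")[-1]
    return RunId(datetime.strptime(date, "%y%m%d"), serial, int(number), flowcell)
```

For example, parse_run_id("190912_M06578_0001_000000000-CNNTP").flowcell gives "CNNTP", and its date field is 12 September 2019.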

Note: don't bother enforcing ISO 8601 dates in the directory name. The yymmdd format is an Illumina standard, and Illumina does not follow ISO 8601 here.

Software requirements

Python > v3.9
bcl2fastq (https://emea.support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software/downloads.html)
FastQC    (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
MultiQC   (pip3 install multiqc)
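A quick pre-flight check that the required tools can be found, assuming they are installed under their usual executable names (bcl2fastq, fastqc, multiqc). On seqtech00 they may only be reachable through /data/bin, which can be passed as search_path. missing_tools is a hypothetical helper, not part of demultiplex_script.py.

```python
import shutil

# Usual executable names; the actual names or symlinks on seqtech00 may differ
REQUIRED_TOOLS = ("bcl2fastq", "fastqc", "multiqc")


def missing_tools(tools=REQUIRED_TOOLS, search_path=None) -> list[str]:
    """Return the tools that cannot be found on search_path.

    search_path is an os.pathsep-separated string of directories;
    None means the current $PATH is searched.
    """
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]
```

Calling missing_tools(search_path="/data/bin") returns an empty list when all three tools are present in that directory.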

Directory structure on seqtech00

├── bin                                                             binaries and symlinks of binaries live here
├── clarity                                                         exported Illumina Clarity directory
│   ├── gls_events
│   ├── logs                                                        Clarity logs go here
│   ├── miseq                                                       MiSeq-Clarity stopover directory
│   │   └── M06578                                                  per serial number
│   │       └── samplesheets                                        sample sheets for this serial number go here
│   └── nextseq                                                     NextSeq-Clarity stopover directory
│       └── NB552450                                                per serial number
│           └── samplesheets                                        sample sheets for this serial number go here
├── demultiplex                                                     demultiplexed data directory
├── for_transfer                                                    data ready to be transferred to NIRD or VIGASP
├── logs                                                            all demultiplexing logs go here
├── rawdata                                                         raw data directory; sequencers write here
│   ├── bad_runs                                                    bad or rejected runs
│   └── control_runs                                                water/other control runs
└── samplesheets                                                    cumulative backups of all sample sheets
    ├── M06578 -> /data/clarity/miseq/M06578/samplesheets/          symlink to sample sheets for convenience
    └── NB552450 -> /data/clarity/nextseq/NB552450/samplesheets/    symlink to sample sheets for convenience
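A sketch of how the per-run paths in this layout could be derived from a run ID. The /data root is inferred from the symlink targets; the mapping from serial-number prefix to the miseq/nextseq directories is a hypothetical rule inferred from the two serials shown, and the per-run subdirectory under demultiplex is also an assumption. run_paths is not a function of demultiplex_script.py.

```python
from pathlib import Path

# Root inferred from the symlink targets in the tree above
DATA_ROOT = Path("/data")

# Hypothetical rule inferred from the two serials shown:
# M06578 is a MiSeq, NB552450 is a NextSeq
INSTRUMENT_DIRS = {"M": "miseq", "NB": "nextseq"}


def instrument_dir(serial: str) -> str:
    for prefix, directory in INSTRUMENT_DIRS.items():
        if serial.startswith(prefix):
            return directory
    raise ValueError(f"unknown instrument serial: {serial!r}")


def run_paths(run_id: str, root: Path = DATA_ROOT) -> dict[str, Path]:
    """Map one run ID onto the directories of the layout above."""
    serial = run_id.split("_")[1]  # second field of the run ID is the serial number
    return {
        "rawdata": root / "rawdata" / run_id,
        "clarity_samplesheets": root / "clarity" / instrument_dir(serial) / serial / "samplesheets",
        "samplesheet_backups": root / "samplesheets" / serial,
        "demultiplex": root / "demultiplex" / run_id,  # hypothetical per-run subdirectory
        "logs": root / "logs",
    }
```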

Procedure

The MiSeq writes as MiSEQ- to /data/scratch, which is available on the MiSeq as the shared folder Z:\ (alias rawdata).
Lab members modify an existing SampleSheet.csv file to include the new project data, then save the new file as SampleSheet.csv in the <RunID> folder on Z:\ and save a copy as Z:\SampleSheets\<RunID>.csv.

Example:

Z:\190912_M06578_0001_000000000-CNNTP
        ├── SampleSheet.csv
Z:\SampleSheets
        ├── 190912_M06578_0001_000000000-CNNTP.csv
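Lab members do this step by hand. Purely as an illustration of where the two copies go, a hypothetical helper could look like the sketch below; samplesheet_targets, save_samplesheet and MISEQ_SHARE are assumed names. PureWindowsPath lets the target paths be checked against the Z:\ example without a Windows machine.

```python
import shutil
from pathlib import Path, PurePath, PureWindowsPath
from typing import TypeVar

P = TypeVar("P", bound=PurePath)

# The share as the MiSeq sees it; a pure path, usable for path checks only
MISEQ_SHARE = PureWindowsPath("Z:\\")


def samplesheet_targets(share_root: P, run_id: str) -> tuple[P, P]:
    """Return (run-folder copy, backup copy) for a run's sample sheet."""
    run_copy = share_root / run_id / "SampleSheet.csv"
    backup_copy = share_root / "SampleSheets" / f"{run_id}.csv"
    return run_copy, backup_copy


def save_samplesheet(edited_sheet: Path, share_root: Path, run_id: str) -> None:
    run_copy, backup_copy = samplesheet_targets(share_root, run_id)
    # The sequencer creates the run folder; refuse to create one for a mistyped run ID
    if not run_copy.parent.is_dir():
        raise FileNotFoundError(f"run folder not found: {run_copy.parent}")
    backup_copy.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(edited_sheet, run_copy)
    shutil.copyfile(edited_sheet, backup_copy)
```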

A cron job runs every 30 minutes. If it finds a new run whose folder contains both RTAComplete.txt and SampleSheet.csv, it starts the demultiplexing script.
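A minimal sketch of the readiness check, assuming run folders sit directly under the raw data directory. How the real job decides that a run is new is not described here, so the caller passes in the set of already-processed run IDs; is_ready and find_ready_runs are hypothetical names.

```python
from pathlib import Path

# RTAComplete.txt appears when Real-Time Analysis has finished;
# SampleSheet.csv is added by the lab
REQUIRED_FILES = frozenset({"RTAComplete.txt", "SampleSheet.csv"})


def is_ready(file_names) -> bool:
    return REQUIRED_FILES.issubset(file_names)


def find_ready_runs(rawdata: Path, already_processed: set[str]) -> list[str]:
    """Return run IDs under rawdata that are complete and not yet processed."""
    ready = []
    for run_dir in sorted(p for p in rawdata.iterdir() if p.is_dir()):
        if run_dir.name in already_processed:
            continue
        # bad_runs/ and control_runs/ presumably have no RTAComplete.txt
        # of their own, so this check skips them as well
        if is_ready({p.name for p in run_dir.iterdir()}):
            ready.append(run_dir.name)
    return ready
```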

The script can also be started manually, as the relevant user:

/usr/bin/python3 /data/bin/demultiplex_script.py <RunID>
