bfremin opened this issue 5 years ago
It would be extremely useful to incorporate this into the workflow in some automated fashion.
Yeah, I can try something. It's only two commands, though.
If you feel like tackling this, by all means do it and submit a pull request. It'll need the dependency handled with either conda or a container, and the new input will have to be integrated into the config, workflow, and docs.
We have been getting data back as a giant pair of FASTQ files (R1 and R2) of undetermined reads, instead of BCL files, with the barcode in each read name. Most tools that demultiplex from FASTQ were very slow, could not be parallelized, and/or failed. This is just a pre-preprocessing tip.
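Because the barcode sits in every read header, a plain `grep` can pull one sample's records out of the undetermined file: match the header, then keep the 3 lines that follow it (sequence, `+`, quality). Below is a minimal illustration on two made-up records; the Illumina-style header layout `@<read id> 1:N:0:<i7>+<i5>` and the read names are assumptions for the demo. It assumes GNU grep, which provides `--no-group-separator`.

```shell
#!/bin/bash
# Toy demo with hypothetical reads (not real data): two FASTQ records whose
# headers carry different dual-index barcodes, as in an undetermined file.
set -eu

workdir=$(mktemp -d)
cat > "$workdir/toy_R1.fq" <<'EOF'
@READ1 1:N:0:GGACTCCT+AGAGGATA
ACGTACGTAC
+
IIIIIIIIII
@READ2 1:N:0:TAGGCATG+AGAGGATA
TTGGCCAATT
+
IIIIIIIIII
EOF

# Same grep as the tip: print each matching header plus the 3 lines after it.
# --no-group-separator stops grep from inserting "--" lines between matches,
# which would otherwise corrupt the FASTQ output.
# In GNU grep's basic regex syntax a bare "+" is a literal character.
grep -A3 --no-group-separator -i "GGACTCCT+AGAGGATA" "$workdir/toy_R1.fq" \
  > "$workdir/sampleA_1.fq"

cat "$workdir/sampleA_1.fq"
```

Only the four lines of `READ1` end up in `sampleA_1.fq`. Since `grep` matches anywhere in a line, this relies on the barcode string not appearing in sequence or quality lines; sequence lines never contain `+`, which makes an accidental match of a `+`-joined dual index unlikely.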
You need two files: a file that lists your barcodes, and a script.

barcodes.txt (one sample per line: sample name, then barcode):

```
samplenameA GGACTCCT+AGAGGATA
samplenameB TAGGCATG+AGAGGATA
samplenameC CTCTCTAC+AGAGGATA
...all your samples
```
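Because each line of barcodes.txt becomes one job's sample name and barcode, a malformed line silently turns into a bad job. A minimal pre-flight check, assuming the two-column layout shown above; `check_barcodes` is a hypothetical helper, not part of the original tip:

```shell
#!/bin/bash
# Hypothetical pre-flight check: every non-blank line of the barcode file must be
# "<sample_name> <barcode>". Prints one message per problem and returns
# non-zero if any problem was found.
check_barcodes() {
  awk '
    NF == 0 { next }                      # blank lines are skipped by xargs too
    /\r$/ { printf "line %d: Windows (CRLF) line ending\n", NR; bad = 1 }
    # With GNU xargs -l/-L, trailing blanks continue a line onto the next one.
    /[ \t]$/ { printf "line %d: trailing whitespace\n", NR; bad = 1 }
    NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1; next }
    seen[$1]++ { printf "line %d: duplicate sample name %s\n", NR, $1; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

Usage: `check_barcodes barcodes.txt && echo "barcodes.txt looks OK"`. Duplicate sample names are flagged because two jobs with the same name would write to the same output files.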
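demultiplex.sh:

```bash
#!/bin/bash
# Usage: demultiplex.sh <sample_name> <barcode>
# Cluster-specific: load sickle however your system provides it.
module load sickle/1.33

# demultiplex samples: keep every read whose header contains the barcode,
# plus the 3 lines after it, for R1 and R2 in parallel.
# Replace {giant_UndeterminedFile_1.fq} / {giant_UndeterminedFile_2.fq}
# with the paths to your undetermined R1/R2 files.
grep -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_1.fq} | gzip > "$1_1.fq.gz" &
grep -A3 --no-group-separator -i "$2" {giant_UndeterminedFile_2.fq} | gzip > "$1_2.fq.gz" &
wait

# remove instances that do not have pairs (trimming will fail if you do not)
sickle pe -f "$1_1.fq.gz" -r "$1_2.fq.gz" -t sanger -o "paired$1_1.fq" -p "paired$1_2.fq" -s "$1_single.fq"
```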
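The script's own comment notes that trimming fails when the two files' pairs do not line up. One quick way to check a sample, before or after running sickle, is to compare the read IDs of the two files record by record. This is a sketch assuming plain 4-line FASTQ records, mates whose IDs agree in the first header word (with an optional `/1` or `/2` suffix), and GNU gzip; `read_ids` and `check_pairs` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the original tip).

# Print one read ID per record: the first word of every header line
# (lines 1, 5, 9, ...) with any /1 or /2 mate suffix removed.
# gzip -cdf decompresses .gz input and passes plain files through unchanged.
read_ids() {
  gzip -cdf "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }'
}

# Succeed only if both files hold the same read IDs in the same order
# (this also catches files with different numbers of records).
check_pairs() {
  local ids1 ids2 status
  ids1=$(mktemp) && ids2=$(mktemp) || return 2
  read_ids "$1" > "$ids1"
  read_ids "$2" > "$ids2"
  cmp -s "$ids1" "$ids2"
  status=$?
  rm -f "$ids1" "$ids2"
  return "$status"
}
```

Usage, with hypothetical file names: `check_pairs sampleA_1.fq.gz sampleA_2.fq.gz || echo "sampleA: R1/R2 out of sync" >&2`. `cmp -s` only reports whether the ID lists differ, not where; drop `-s` to see the first mismatching position.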
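Run it as one Slurm job per line of barcodes.txt (`xargs -l` hands each line's sample name and barcode to `bash -c` as `$0` and `$1`; put your own `sbatch` options in place of `.....`):

```bash
cat barcodes.txt | xargs -l bash -c 'sbatch ..... demultiplex.sh $0 $1'
```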
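Before submitting many jobs, it can help to dry-run the submission line. This sketch keeps the same `xargs -l bash -c` pattern but swaps `sbatch` for `echo` and uses a toy two-sample barcodes file (hypothetical contents). It assumes GNU xargs, where `-l` with no number means one input line per command. Inside `bash -c 'script'`, the first argument after the script string becomes `$0` and the second becomes `$1`, which is why the one-liner passes `$0 $1` to demultiplex.sh.

```shell
#!/bin/bash
# Toy barcodes file with hypothetical sample names.
barcodes=$(mktemp)
printf 'samplenameA GGACTCCT+AGAGGATA\nsamplenameB TAGGCATG+AGAGGATA\n' > "$barcodes"

# One command per input line: bash -c '...' <sample> <barcode>
# binds <sample> to $0 and <barcode> to $1.
out=$(cat "$barcodes" | xargs -l bash -c 'echo "would run: demultiplex.sh $0 $1"')
printf '%s\n' "$out"
```

Once the echoed lines look right, put `sbatch` and your options back in place of `echo`. In GNU xargs, `-l` is a deprecated synonym for `-L 1`, so `xargs -L 1` behaves the same here.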
This will save you a lot of time compared with trying existing tools.