Saskia-Oosterbroek / decona

fastq to polished sequences: pipeline suitable for mixed samples and long (Nanopore) reads
MIT License
41 stars 12 forks

use primer information to rotate sequences? #33

Open Kirk3gaard opened 2 years ago

Kirk3gaard commented 2 years ago

Hi Saskia

I tried to run your tool and it seems to do a really good job (even with just racon #31). The sequences clearly have better BLAST hits than the raw reads!

Related to #20, I was wondering whether it would be useful to include the primer sequences, so that the output sequences all end up in the same, expected orientation.

Custom primer trimming might also be a great addition to your workflow as I expect many people will use your approach with a broad selection of primer combinations.
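To make the orientation idea concrete, here is a minimal bash sketch. It is not part of decona; the function names (`revcomp`, `iupac_to_regex`, `orient_read`) are made up for illustration. It only does exact matching of a degenerate (IUPAC) primer, so it will miss primers containing sequencing errors, which are common in Nanopore reads. An error-tolerant tool such as cutadapt would be needed in practice.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: exact (error-free) primer matching.

# Reverse-complement a DNA string, including IUPAC ambiguity codes.
# S, W and N are their own complements and are left unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDHacgtrykmbvdh' 'TGCAYRMKVBHDtgcayrmkvbhd'
}

# Convert an upper-case IUPAC primer into an extended regular expression.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Print the read in the orientation that contains the forward primer.
# Usage: orient_read READ FORWARD_PRIMER
# Returns 1 (and prints nothing) if the primer is found in neither strand.
orient_read() {
    local read=$1 primer_re
    primer_re=$(iupac_to_regex "$2")
    if printf '%s\n' "$read" | grep -Eq "$primer_re"; then
        printf '%s\n' "$read"
    elif revcomp "$read" | grep -Eq "$primer_re"; then
        revcomp "$read"
    else
        return 1
    fi
}
```

Calling `orient_read "$sequence" "AGRRTTYGATYHTDGYTYAG"` would then print the sequence so that the forward primer reads left to right, or fail if no exact match exists on either strand.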

Best regards,
Rasmus

Kirk3gaard commented 2 years ago

Code for doing something like this using cutadapt could be:

    # NAME and TMPDIR are not defined here and are assumed to be set elsewhere;
    # the data/ directory must already exist.
    INPUTREADS=reads.fastq
    THREADS=10
    FPRIMER="AGRRTTYGATYHTDGYTYAG"
    RPRIMER="YCNTTCCYTYDYRGTACT"
    MIN_LENGTH=1500
    MAX_LENGTH=4000
    ERRORRATE=0.1

    # First pass: trim the primers from reads in their original orientation
    cutadapt \
      --cores "$THREADS" \
      --untrimmed-output "$TMPDIR/untrimmed.fastq" \
      -e "$ERRORRATE" \
      -m "$MIN_LENGTH" \
      -M "$MAX_LENGTH" \
      -a "$FPRIMER...$RPRIMER" \
      "$INPUTREADS" > "data/$NAME.trimmed.fastq"

    # Second pass: reverse-complement the untrimmed reads and trim again
    seqtk seq -r "$TMPDIR/untrimmed.fastq" |\
    cutadapt \
      --cores "$THREADS" \
      --untrimmed-output "$TMPDIR/untrimmed2.fastq" \
      -e "$ERRORRATE" \
      -m "$MIN_LENGTH" \
      -M "$MAX_LENGTH" \
      -a "$FPRIMER...$RPRIMER" - >> "data/$NAME.trimmed.fastq"
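A quick sanity check after the two passes is to count how many reads ended up trimmed and how many matched in neither orientation. The sketch below is only an illustration: `count_fastq` and `report_trimming` are hypothetical helper names, and it assumes plain, uncompressed FASTQ with exactly four lines per record. As far as I understand cutadapt's filtering, reads dropped by the `-m`/`-M` length filters land in neither file, so the total can be lower than the number of input reads.

```shell
#!/usr/bin/env bash
# Illustrative helpers; assumes strict 4-line FASTQ records (no wrapped sequences).

# Print the number of records in a FASTQ file.
count_fastq() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Summarise a trimming run.
# Usage: report_trimming TRIMMED_FASTQ UNTRIMMED_FASTQ
report_trimming() {
    local kept dropped total
    kept=$(count_fastq "$1")
    dropped=$(count_fastq "$2")
    total=$((kept + dropped))
    printf 'kept=%d untrimmed=%d total=%d\n' "$kept" "$dropped" "$total"
}
```

With the file names from the commands above, this could be called as `report_trimming "data/$NAME.trimmed.fastq" "$TMPDIR/untrimmed2.fastq"`.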

Saskia-Oosterbroek commented 2 years ago

Thanks Rasmus!

I'll be working on this :)