benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Need help in using cutadapt to trim paired-end fastq reads to provide as input for DADA2 #1905

Closed: ghost closed this issue 6 months ago

ghost commented 6 months ago

I am using cutadapt for the first time to trim 300 paired-end fastq files generated by amplicon sequencing of the V3-V4 regions of the 16S rRNA gene. I am running this analysis on a high-performance computing cluster.

The V3-V4 regions of the 16S rRNA gene were amplified using a mixture of the universal bacterial primers 341F1–4 (5′ CCTACGGGNGGCWGCAG 3′) and 785R1–4 (5′ GACTACHVGGGTATCTAATCC 3′) with partial Illumina TruSeq adapter sequences added to the 5′ ends (F1; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, F2; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTgt, F3; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTagag, F4; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTtagtgt and R1; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, R2; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTa, R3; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTtct, R4; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTctgagtg).

So, I followed the cutadapt tutorial ( https://cutadapt.readthedocs.io/en/stable/guide.html#paired-end ) and used the following code from the tutorial to trim paired-end fastq reads:

cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq

I made an sbatch file with one cutadapt command for each of the 300 samples, putting the forward universal bacterial primer in place of ADAPTER_FWD (the -a option) and the reverse universal bacterial primer in place of ADAPTER_REV (the -A option). An example of what I did is below:

cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample1_trimmed_1.fastq -p sample1_trimmed_2.fastq sample1_1.fastq sample1_2.fastq

cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample2_trimmed_1.fastq -p sample2_trimmed_2.fastq sample2_1.fastq sample2_2.fastq

........................... and so on, up to 300 samples.

Question 1) Am I using cutadapt correctly? I couldn't work out where to insert the four partial Illumina TruSeq adapter sequences (F1, F2, F3, F4 and R1, R2, R3, R4) in the above code, so I used only the forward and reverse universal bacterial primers to trim.

Question 2) How do I loop over the sample names in the cutadapt command, so that I don't have to write the cutadapt line 300 times in my sbatch script?

Thank you.

benjjneb commented 6 months ago

This question should be directed to cutadapt support.
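
For readers who land here with the same Question 2, the following is a minimal sketch of a shell loop, not an answer from the DADA2 or cutadapt maintainers. It assumes the files follow the `<sample>_1.fastq` / `<sample>_2.fastq` naming shown in the question; the function name `trim_all` is hypothetical. The cutadapt options are copied unchanged from the question, so the sketch does not settle Question 1. In cutadapt's documentation `-a`/`-A` denote 3' adapters and `-g`/`-G` denote 5' adapters, so which pair fits depends on where the primers sit in the reads.

```shell
#!/usr/bin/env bash
# Sketch: run cutadapt once per sample instead of writing 300 near-identical lines.
# Assumption: inputs are named <sample>_1.fastq and <sample>_2.fastq.

FWD_PRIMER="CCTACGGGNGGCWGCAG"      # 341F, as given in the question
REV_PRIMER="GACTACHVGGGTATCTAATCC"  # 785R, as given in the question

# trim_all is a hypothetical helper name; its argument is the fastq directory.
trim_all() {
    local indir="${1:-.}"
    local r1 r2 sample
    for r1 in "$indir"/*_1.fastq; do
        [ -e "$r1" ] || continue      # glob matched nothing
        case "$r1" in
            *_trimmed_1.fastq) continue ;;   # skip outputs of an earlier run
        esac
        sample="${r1%_1.fastq}"       # strip the suffix to get the sample prefix
        r2="${sample}_2.fastq"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: mate file $r2 not found" >&2
            continue
        fi
        # Same options as the per-sample commands in the question.
        cutadapt -a "$FWD_PRIMER" -A "$REV_PRIMER" \
            -o "${sample}_trimmed_1.fastq" -p "${sample}_trimmed_2.fastq" \
            "$r1" "$r2"
    done
}
```

In an sbatch script, the function would be followed by a single call such as `trim_all /path/to/fastq_dir` (a placeholder path). Deriving the sample prefix from the forward-read file name means no separate list of sample names has to be maintained.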