I am trying to use cutadapt for the first time to trim 300 paired-end fastq files generated by amplicon sequencing of the V3-V4 regions of the 16S rRNA gene. I am running this analysis on a high-performance computing (HPC) cluster.
The V3-V4 regions of the 16S rRNA gene were amplified using a mixture of the universal bacterial primers 341F1–4 (5′ CCTACGGGNGGCWGCAG 3′) and 785R1–4 (5′ GACTACHVGGGTATCTAATCC 3′) with partial Illumina TruSeq adapter sequences added to the 5′ ends (F1; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, F2; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTgt, F3; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTagag, F4; ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTtagtgt and R1; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, R2; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTa, R3; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTtct, R4; GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTctgagtg).
So, I followed the cutadapt tutorial ( https://cutadapt.readthedocs.io/en/stable/guide.html#paired-end ) and used the following code from the tutorial to trim paired-end fastq reads:
cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
I made an sbatch file with one cutadapt command for each of the 300 samples, putting the forward universal bacterial primer in place of ADAPTER_FWD (the -a option) and the reverse universal bacterial primer in place of ADAPTER_REV (the -A option). An example of what I did is below:
cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample1_trimmed_1.fastq -p sample1_trimmed_2.fastq sample1_1.fastq sample1_2.fastq
cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o sample2_trimmed_1.fastq -p sample2_trimmed_2.fastq sample2_1.fastq sample2_2.fastq
... and so on, up to 300 samples.
Question 1) Am I using cutadapt correctly? I could not work out where the four partial Illumina TruSeq adapter sequences (F1, F2, F3, F4 and R1, R2, R3, R4) should go in the above command, so I trimmed with only the forward and reverse universal bacterial primers.
Question 2) How do I loop over the sample names so that I don't have to write the cutadapt command 300 times in my sbatch script?
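For Question 2, here is a minimal sketch of what such a loop might look like. It assumes every sample follows the sampleN_1.fastq / sampleN_2.fastq naming shown above and that all files sit in the directory the job runs from. The helper name sample_from_r1 is hypothetical, and the cutadapt options are simply copied from the example lines above, not a verified trimming setup:

```shell
#!/bin/bash
# Sketch only: any #SBATCH directives for the cluster would go here.

# Hypothetical helper: derive the sample name from a forward-read path,
# e.g. data/sample1_1.fastq -> sample1
sample_from_r1() {
    name=${1##*/}                       # strip any directory part
    printf '%s\n' "${name%_1.fastq}"    # strip the _1.fastq suffix
}

# Loop over every forward-read file in the current directory.
for r1 in ./*_1.fastq; do
    [ -e "$r1" ] || continue            # glob matched nothing: skip
    case $r1 in
        *_trimmed_1.fastq) continue ;;  # skip outputs of an earlier run
    esac
    sample=$(sample_from_r1 "$r1")
    # Same options as in the example commands above (assumed, not verified).
    cutadapt -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC \
        -o "${sample}_trimmed_1.fastq" -p "${sample}_trimmed_2.fastq" \
        "${sample}_1.fastq" "${sample}_2.fastq"
done
```

The loop takes the sample names from the files themselves rather than from a hard-coded list, so it does not need to know there are 300 samples.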
Thank you.