Closed magdawutkowska closed 5 years ago
Is it possible your sequencing libraries are in both orientations (i.e. 5' -> 3' and 3' -> 5').
Hello, sorry if I post my question in the wrong thread. I am very new to bioinformatics. Right now I'm trying to analyse my data using DADA2 ITS Pipeline. My only reason for using this instead of DADA2 Pipeline is so that I can use cutadapt in R (I'm not familiar with running cutadapt in python). I will continue with the usual DADA2 Pipeline after removing the primers.
I have run the pipeline up to primer removal step and got the output for my first sample as
Forward Complement Reverse RevComp FWD.ForwardReads 0 0 0 0 FWD.ReverseReads 202 0 0 0 REV.ForwardReads 0 0 0 0 REV.ReverseReads 0 0 0 1
My initial output with filtN sample before primer removal for the same sample was
Forward Complement Reverse RevComp
FWD.ForwardReads 128137 0 0 0
FWD.ReverseReads 202 0 0 413
REV.ForwardReads 296 0 0 146
REV.ReverseReads 109811 0 0 1
This is different from the one shown in the ITS Pipeline workflow. Should I amend the code so that the Fwd. ReverseReads and Rev.ReverseReads be 0 as well? I am not sure if I am missing something here. Hope you can suggest the workaround.
Here are the details of my data: Illumina Miseq 2X300 paired end sequencing Sequence V6-V8 region Initial R1 and R2 fastq file with Illumina barcode and adapter removed primer set 926F (AAACTYAAAKGAATTGRCGG) 1392R (ACGGGCGGTGWGTRC)
Thanks for your time!
Is it possible your sequencing libraries are in both orientations (i.e. 5' -> 3' and 3' -> 5').
Thank you for your response! I think you are right. Is it possible to reorder the reads within dada2 pipeline or should I do that before after demultiplexing?
Is it possible to reorder the reads within dada2 pipeline or should I do that before after demultiplexing?
There isn't a full-fledged re-orientation solution in the dada2 package at this point. So if you can do that before (or after) demultiplexing, but before you start the dada2 pipeline, it would probably be best.
@afmbafmb Can you open a new issue (thread) on your problem?
Thank you @benjjneb!
Hi again! I am still a bit confused. The easiest way to redirect the PE reads for me is to merge them first (i.e. by using PEAR) and then reorient them using fgrep or similar tools. I know though that the core thing with dada2 is to process R1 and R2 reads separately, and then when they are 'resolved' merge them in one of the last steps.
This is an initial primer table from my subsampled data:
Forward Complement Reverse RevComp FWD.ForwardReads 113064 0 0 24300 FWD.ReverseReads 106609 0 0 15413 REV.ForwardReads 111728 0 0 24512 REV.ReverseReads 108476 0 0 14061
Whereas in your example:
Forward Complement Reverse RevComp FWD.ForwardReads 4214 0 0 0 FWD.ReverseReads 0 0 0 3743 REV.ForwardReads 0 0 0 3590 REV.ReverseReads 4200 0 0 0
And yes, my data consist of a mixture of amplicons starting with both forward and reverse primer, each ~50% in R1 and R2 files.
I found one solution, which is 'orient' function in usearch (https://www.drive5.com/usearch/manual/cmd_orient.html), which would align all the sequences in 5'-> 3' according to an attached database. Do you maybe have another, more straightforward suggestion of how to reorient them (I just want to make sure that I start dada2 with matching data)?
The easiest solution, and which does the least to break DADA2 assumptions, is to just use the data that is in the expected orientation.
I think I encountered a similar problem earlier in the ITS2 tutorial. I noticed that my R1 and R2 files each have forward and reverse reads in them at the cutadapt, demultiplexing stage. See description of the problem and workaround here <-
Hi Dr. Callahan! I have a quick, sort of stupid, question for you and I figured I would ask it here. I am running R1 ITS1 reads, and was wondering if I need to follow the DADA2 ITS Pipeline Workflow (1.8) first before following the normal DADA2 tutorial (in order to strip the primers).
I was looking through the other issues to try and find the answer to this question, but I couldn't seem to find anything with a clear answer.
Thank you for making such a fantastic pipeline and thank you for handling every issue directly!!
I am running R1 ITS1 reads, and was wondering if I need to follow the DADA2 ITS Pipeline Workflow (1.8) first before following the normal DADA2 tutorial (in order to strip the primers).
I would recommend following the ITS tutorial through to checking for the presence of the FWD and REV primers on your R1 reads.
If the REV primer is not on your reads, you can just go back to the simpler normal DADA2 tutorial. If the REV primer is on your reads, you will need to use the ITS/cutadapt tutorial that can remove primers at both the start and later in reads.
I am running R1 ITS1 reads, and was wondering if I need to follow the DADA2 ITS Pipeline Workflow (1.8) first before following the normal DADA2 tutorial (in order to strip the primers).
I would recommend following the ITS tutorial through to checking for the presence of the FWD and REV primers on your R1 reads.
If the REV primer is not on your reads, you can just go back to the simpler normal DADA2 tutorial. If the REV primer is on your reads, you will need to use the ITS/cutadapt tutorial that can remove primers at both the start and later in reads.
Hi Dr. Callahan, I started running my reads through the ITS tutorial as you suggested, and here is my output:
Clearly the F primers have been removed, but I am seeing the reverse compliment of my REV primer. I am wondering if this is a fluke due to the nature and randomness of the reads, or if it's actually identifying the reversed REV primer in my reads. What are your thoughts? Thank you again for answering these questions so quickly.
Clearly the F primers have been removed, but I am seeing the reverse compliment of my REV primer. I am wondering if this is a fluke due to the nature and randomness of the reads, or if it's actually identifying the reversed REV primer in my reads. What are your thoughts?
My thoughts are that your amplicons are in the rev-comp orientation of what you expected. That is fine, but it is important to know that.
To fully confirm, can you open up one of your R1 fastq files, and see if the reverse complement of your REV primer is at the start of the reads? library(dada2); rc(REV)
will give you the reverse-complement of the REV primer.
Clearly the F primers have been removed, but I am seeing the reverse compliment of my REV primer. I am wondering if this is a fluke due to the nature and randomness of the reads, or if it's actually identifying the reversed REV primer in my reads. What are your thoughts?
My thoughts are that your amplicons are in the rev-comp orientation of what you expected. That is fine, but it is important to know that.
To fully confirm, can you open up one of your R1 fastq files, and see if the reverse complement of your REV primer is at the start of the reads?
library(dada2); rc(REV)
will give you the reverse-complement of the REV primer.
Thank you for this. For whatever reason, I am finding the rcRev towards the end of some of my reads. This is an example from one of my sample fastq files (I just manually checked because I am continuing further with the ITS tutorial to see if I can remove them). What are your thoughts? Thank you again for your help with this.
OK, so that is normal then. That means your reads were longer than the ITS1 region in some amplicons (ITS1 varies a lot in length) and read into the reverse primer.
That means you will need to use cutadapt to remove the rev-comp of the REV primer from those reads though. Unfortunately the dada2 package only has basic primer removal functionality, i.e. removes primers that appear at the start of the reads. But cutadapt will have you covered.
OK, so that is normal then. That means your reads were longer than the ITS1 region in some amplicons (ITS1 varies a lot in length) and read into the reverse primer.
That means you will need to use cutadapt to remove the rev-comp of the REV primer from those reads though. Unfortunately the dada2 package only has basic primer removal functionality, i.e. removes primers that appear at the start of the reads. But cutadapt will have you covered.
Thank you so much for your help with this. I am sorry to keep asking questions, but I am struggling with the cutadapt step.
When I get to the for loop for running cutadapt, I get an error stating: cutadapt: error: You did not provide any input file names. Please give me something to do!
I looked at my fastq files that fall within the "filtN" directory, and they no longer look like fastq files. They must have been corrupted somehow in the filtering step? Thank you again for all of your help on this.
I looked at my fastq files that fall within the "filtN" directory, and they no longer look like fastq files. They must have been corrupted somehow in the filtering step? Thank you again for all of your help on this.
By default, dada2 commands compress all files using gzip. cutadapt isn't too smart about detecting that, it just uses file extensions, and is probably confusing itself at that point.
You could either turn off compression with filterAndTrim(..., compress=FALSE)
, or change the names of your filtered fastq files to have a .gz
at the end. If you aren't R comfortable, the first option is slightly easier.
I looked at my fastq files that fall within the "filtN" directory, and they no longer look like fastq files. They must have been corrupted somehow in the filtering step? Thank you again for all of your help on this.
By default, dada2 commands compress all files using gzip. cutadapt isn't too smart about detecting that, it just uses file extensions, and is probably confusing itself at that point.
You could either turn off compression with
filterAndTrim(..., compress=FALSE)
, or change the names of your filtered fastq files to have a.gz
at the end. If you aren't R comfortable, the first option is slightly easier.
I used the filterAndTrim command with compress=FALSE, but am still getting issues with cutadapt. I have a feeling it is my file names: we have samples from the same site at various depths, and here is an example file name: CTNA_30-40_S147_L001_R1_001.fastq
I think the "-" is causing problems, so I am going to rename the files and try again. Thank you so much for all of your help!!
So I have changed the file names, compress=FALSE...and I am still getting an error.
I have tried to change the loop to fit the fact that I am looking for the rc of the REV primer in only my R1 reads: for(i in seq_along(fnFs)) { system2(cutadapt, args = c("-g", FWD, "-b", REV.RC, "-o", fnFs.cut[i], # output files fnFs.filtN[i])) # input files }
I am following the structure based on cutadapt, but am getting an error for an invalid argument.
I am working with demulitplexed F reads of ITS1 sequences (Illumina MiSeq), and the primers used were the (forward) fITS7 and (reverse) ITS2 primers.
I am inserting my code below for your consideration: `setwd("/Volumes/HLF_SSD/Aronson_Lab_Data/XCZO_ITS1/xczo_fastq") getwd()
library("devtools")
packageVersion("dada2") # Should be 1.15.1 library(dada2)
library("ShortRead")
library("Biostrings") install.packages("beepr") library("beepr")
path <- "/Volumes/HLF_SSD/Aronson_Lab_Data/XCZO_ITS1/xczo_fastq" # CHANGE ME to the directory containing the fastq files after unzipping.
list.files(path)
fnFs <- sort(list.files(path, pattern="_R1_001.fastq", full.names = TRUE))
fnFs FWD <- "CTTGGTCATTTAGAGGAAGTAA" # ITS1f REV <- "GCTGCGTTCTTCATCGATGC" # ITS2 (its1 reverse primer)
allOrients <- function(primer) {
require(Biostrings) dna <- DNAString(primer) # The Biostrings works w/ DNAString objects rather than character vectors orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), RevComp = reverseComplement(dna)) return(sapply(orients, toString)) # Convert back to character vector } FWD.orients <- allOrients(FWD) REV.orients <- allOrients(REV) FWD.orients
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory filterAndTrim(fnFs, fnFs.filtN, maxN = 0, compress=FALSE) # Pre-filter seqs to remove Ns fnFs.filtN <- fnFs.filtN[file.exists(fnFs.filtN)] # Drops files that failed filter entirely and weren't written
primerHits <- function(primer, fn) {
nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE) return(sum(nhits > 0)) } rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]))
# ,
#REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
cutadapt <- "/Volumes/HLF_SSD/Aronson_Lab_Data/XCZO_ITS1/cutadapt" # CHANGE ME to the cutadapt path on your machine system2(cutadapt, args = "--version") # Run shell commands from R
path.cut <- file.path(path, "cutadapt") if(!dir.exists(path.cut)) dir.create(path.cut)
fnFs.cut <- file.path(path.cut, basename(fnFs.filtN))
FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)
R1.flags <- paste("-g", FWD, "-b", REV.RC)
R2.flags <- paste("-G", REV, "-B", FWD.RC)
for(i in seq_along(fnFs)) { system2(cutadapt, args = c("-g", FWD, "-b", REV.RC, "-o", fnFs.cut[i], # output files fnFs.filtN[i])) # input files }
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.cut[[1]]), FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.cut[[1]]))`
system2(cutadapt, args = c("-g", FWD, "-b", REV.RC
I think the "-b" you put here should be a "-a"?
And you can drop the "-g", FWD
part since the FWD primers don't need to be removed.
On the cutadapt user guide (https://cutadapt.readthedocs.io/en/stable/guide.html), there is a little table that explains what arguments can be used depending on where your adapter/primer sequence is within the reads. I used the "-b" argument so that it could find the REV.RC anywhere in the sequence rather than strictly in one position over the other.
I was actually able to run the cutadapt command via the terminal through all of my files, then improted them into R to follow the 1.12 tutorial - and I had success! I am still not sure why I couldn't get it to work in R, but I will try your suggestions (removing the -g, FWD) in the future - our lab is going to work together to follow your tutorial with some other reads we have. Thank you so much for your feedback!
I am trying dada2 for the first time and I am aiming to analyse ITS2 fungal sequences. I have demultiplexed my dataset first and followed initial steps of dada2 (https://benjjneb.github.io/dada2/ITS_workflow.html). In the end of the identify primers part of the pipeline in the table that gathers info about Forward Complement Reverse RevComp primers' sequences my table looks differently that one in the example. The forward primer and RevComp are both found in all 4 FWD.ForwardReads, FWD.ReverseReads, REV.ForwardReads and REV.ReverseReads. Thus I am not sure if I should reorient them just after demultiplexing, and then go for dada2 or can I reorient them in dada2? Thanks in advance for any ideas!