benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Running Ion Torrent fastq on dada2 in R #384

Closed HarryBatsang closed 6 years ago

HarryBatsang commented 6 years ago

Hello @benjjneb, I am using dada2 in R to denoise my 16S rRNA fastq files from an Ion Torrent PGM. Is there a dada2 tutorial for PGM data? dada2 supports Ion Torrent (IT), but the commands in the tutorial do not seem to fit IT data, and I am stuck at filterAndTrim. Because IT reads are single-end, I pass only fnFs and filtFs and change the other arguments accordingly. When I run filterAndTrim, it always fails with the error "Errors in filterAndTrim....., all output files must be distinct", and I am not sure what I did wrong.

Because IT reads are single-end, I am even more confused by later steps such as pairing (merging) the reads. How should I revise the tutorial to fit Ion Torrent data?

Thank you all so much!

benjjneb commented 6 years ago

The tutorial workflow for single-end data is mostly the same, but you just leave out the merging part and all mentions of the reverse reads. Some examples:

For filterAndTrim that looks like:

outF <- filterAndTrim(fnFs, filtFs, truncLen=240,
              maxN=0, maxEE=2, truncQ=2, rm.phix=TRUE,
              compress=TRUE, multithread=TRUE) # On Windows set multithread=FALSE
head(outF)

Then later, just skip merging, and make the sequence table from dadaFs:

seqtab <- makeSequenceTable(dadaFs)
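
For orientation, the steps between filtering and the sequence table can be sketched as follows. This is only a sketch that reuses the function names from the dada2 tutorial (learnErrors, dada, makeSequenceTable). The wrapper name run_single_end is hypothetical, filtFs is assumed to be the vector of filtered file paths from the filterAndTrim call above, and chimera removal would normally follow as in the tutorial. The dada2 FAQ also discusses Ion Torrent specific settings, which are worth checking before running this on PGM data.

```r
# Hypothetical wrapper around the single-end steps of the dada2 tutorial.
# Assumes the dada2 package is installed and that filtFs points at the
# filtered fastq files written by filterAndTrim.
run_single_end <- function(filtFs, multithread = TRUE) {
  # Keep only samples whose filtered file was actually written
  filtFs <- filtFs[file.exists(filtFs)]
  # Learn the error model from the forward (only) reads
  errF <- dada2::learnErrors(filtFs, multithread = multithread)
  # Denoise each sample; there are no reverse reads, so nothing to merge
  dadaFs <- dada2::dada(filtFs, err = errF, multithread = multithread)
  # Build the sequence table directly from the denoised forward reads
  dada2::makeSequenceTable(dadaFs)
}
```

Usage would be seqtab <- run_single_end(filtFs), with multithread = FALSE on Windows.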

betogracida commented 5 years ago

Hi there, I have the same problem with the same error: all output files must be distinct, but I am working with F and R sequences. Can you help me to fix it?

Thanks in advance for your reply.

benjjneb commented 5 years ago

@betogracida You need to provide enough information that we can understand what you are doing. Are you using Illumina paired-end sequencing? What is the exact command you are running that produces this error? Are there any duplicated names between your filtFs and filtRs filepaths?

betogracida commented 5 years ago

Hi Benjamin, thanks for your reply. Yes, I am using Illumina paired-end sequencing. The exact command I am running is:

out <- filterAndTrim (fnFs, filtFs, fnRs, filtRs, truncLen=c(120,114), maxN=0, maxEE=c(2,5), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE, matchIDs=TRUE)

Are there any duplicated names between your filtFs and filtRs filepaths? Here are the file paths where I aim to save the outputs.

filtFs <- file.path("/Volumes/FASTA_FILES/FASTA_FILES/S1_R1_R2_SENSE/DEMULTIPLEXED_SENSE/TRIMMED_SENSE_FILES/TRIMMED_SENSE_fastq/", "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path("/Volumes/FASTA_FILES/FASTA_FILES/S1_R1_R2_SENSE/DEMULTIPLEXED_SENSE/TRIMMED_SENSE_FILES/TRIMMED_SENSE_fastq/", "filtered", paste0(sample.namesRs, "_R_filt.fastq.gz"))

benjjneb commented 5 years ago

My first concern is here: truncLen=c(120,114)

That is a total of 234 nts that are left after truncation. Is that long enough for these reads to overlap for later merging? In other words, is the length of your sequenced amplicon expected to be less than 234-12 ~ 220 nucleotides?
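
The arithmetic behind that check can be written out explicitly. This is a small base R sketch, assuming the 12 nt figure above is the minimum overlap required for merging (it matches the default minOverlap of mergePairs). The helper name max_mergeable_amplicon is hypothetical.

```r
# Longest amplicon that can still be merged after truncation:
# forward length + reverse length - required overlap.
max_mergeable_amplicon <- function(trunc_len, min_overlap = 12) {
  sum(trunc_len) - min_overlap
}

trunc_len <- c(120, 114)   # truncLen from the command above
max_len <- max_mergeable_amplicon(trunc_len)
print(max_len)             # 234 - 12 = 222, i.e. roughly 220
```

If the amplicon length varies between taxa, the longest expected amplicon is the one to compare against this limit.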

Second, filterAndTrim is throwing that error because it thinks you have duplicated filenames between filtFs and filtRs. What do you get from the following?

any(duplicated(sample.names))
any(duplicated(sample.namesRs))
any(duplicated(filtFs))
any(duplicated(filtRs))

betogracida commented 5 years ago

Hi Ben, I changed those values because those are the sizes of the amplicons I have. Is that assumption correct?

And once I run the commands you mention I get this:

any(duplicated(sample.names))
[1] TRUE
any(duplicated(sample.namesRs))
[1] TRUE
any(duplicated(filtFs))
[1] TRUE
any(duplicated(filtRs))
[1] TRUE

Means I have files with the same name?

benjjneb commented 5 years ago

Means I have files with the same name?

Yes, you need to figure out what is causing the duplicated sample names and rectify that issue. A common problem is that your sample names have a character (e.g. _) in them that is used in the string parsing expression, thus you are losing the second part of the sample name.

Hi Ben, I changed those values because those are the sizes of the amplicons I have. Is that assumption correct?

As long as your amplicon is 220nts or less, you are good to go, I just wanted to check on that as one of the common mistakes is to truncate reads so short they fail to overlap later.
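
The sample-name problem described above can be reproduced in base R. In this sketch the file names are hypothetical. The parsing expression is the one from the dada2 tutorial, which keeps the text before the first "_", and the suffix-stripping alternative is just one possible fix under the assumption that every forward file ends in _R1.fastq.gz.

```r
# Hypothetical forward-read files whose sample names contain "_"
fnFs <- c("reads/gut_A_R1.fastq.gz", "reads/gut_B_R1.fastq.gz")

# Tutorial-style parsing: keep the text before the first "_"
bad_names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)
print(bad_names)                   # "gut" "gut"
print(any(duplicated(bad_names)))  # TRUE

# Alternative: strip only the known suffix, keeping the full sample name
good_names <- sub("_R1\\.fastq\\.gz$", "", basename(fnFs))
print(good_names)                  # "gut_A" "gut_B"
stopifnot(!any(duplicated(good_names)))
```

Duplicated sample names lead to duplicated output paths in filtFs and filtRs, which is what filterAndTrim rejects.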

betogracida commented 5 years ago

Hi again Ben, I did not find any duplicated file names. I changed the lengths (in base pairs) in the command to match my data, and it still shows the same error: all output files must be distinct. I am sharing a copy of the list of my file names in case you want to check it or you see something I don't. Cheers. files_names.docx

benjjneb commented 5 years ago

I'm not sure how the Word document was created, but R is detecting duplicated sample names. What is the output of print(sample.names)?

betogracida commented 5 years ago

 [1] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
 [6] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[11] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[16] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[21] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[26] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[31] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[36] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[41] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[46] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[51] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[56] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[61] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[66] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[71] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[76] "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA" "/Volumes/FASTA"
[81] "/Volumes/FASTA" "/Volumes/FASTA"

benjjneb commented 5 years ago

There's the problem, all of your sample.names are the same, and surely not what you want them to be. You'll need to fix the code that defines sample.names, and that should solve your problem.
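
One way to arrive at exactly this output is splitting the full file path on "_" instead of the file name. This is offered only as a guess, because the code that defined sample.names was not posted. The directory below is the one from the filtFs definition earlier in the thread, which contains an underscore right after "/Volumes/FASTA". The file names and the _R1.fastq.gz suffix are hypothetical.

```r
# Directory taken from the thread; the file names are hypothetical
path <- "/Volumes/FASTA_FILES/FASTA_FILES/S1_R1_R2_SENSE/DEMULTIPLEXED_SENSE/TRIMMED_SENSE_FILES/TRIMMED_SENSE_fastq"
fnFs <- file.path(path, c("sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz"))

# Splitting the full path: the first "_" sits inside the directory name
from_path <- sapply(strsplit(fnFs, "_"), `[`, 1)
print(from_path)   # "/Volumes/FASTA" "/Volumes/FASTA"

# Splitting only the file name, as the tutorial does with basename()
from_file <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)
print(from_file)   # "sampleA" "sampleB"

# Guard worth running before filterAndTrim
stopifnot(!any(duplicated(from_file)))
```

If this is the cause, wrapping the file paths in basename() before parsing them should give distinct sample names, and with them distinct output files.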