benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
459 stars 142 forks source link

error in evaluating the argument 'x' in selecting a method for function 'reverseComplement': key 69 (char 'E') not in lookup table #1417

Closed amandarwills closed 3 months ago

amandarwills commented 2 years ago

Hello!

After the remove chimeras step in DADA2 v1.2, I'm getting the following errors in my log output. Any help would be greatly appreciated!

     abundance forward reverse nmatch nmismatch nindel prefer accept
828         22     190      94     27         0      0      2   TRUE
1951         9     190     625     27         0      0      2   TRUE
2421         7     190     543     27         0      0      2   TRUE
2989         5     190     516     27         0      0      2   TRUE
4660         2     190     544     27         0      0      2   TRUE
[1]  82 167

150 166 169 177 184 185 186 190 198 200 203 208 228 247 253 254 268 
  2   9   7   2   3   1  20   1   1   1   1   1  42   1  72   2   1 
Identified 47 bimeras out of 167 input sequences.
[1]  82 120
[1] 0.9119597
**Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'reverseComplement': key 69 (char 'E') not in lookup table**
Calls: assignTaxonomy ... make_XString_from_string -> .Call2 -> .handleSimpleError -> h
**In addition: Warning message:
In assignTaxonomy("seqtab.nochim", "/scratch/amanda/3TP_Dataset_2019/Amplicon_Data_All_Genotypes/Renamed_Files/rdp_train_set_18.fa") :
  Some sequences were shorter than 50 nts and will not receive a taxonomic classification.
Execution halted**
benjjneb commented 2 years ago

It is difficult to diagnose what is going on without knowing the command that is throwing the error. What was being reverse complemented? It can't be DADA2 ASVs, as they cannot contain E characters.

You can ignore the warning though, it's just a warning for your information -- the command is still running fine.

amandarwills commented 2 years ago
taxa <- assignTaxonomy('seqtab.nochim', "~/Desktop/2019_Amplicon_Data/silva_nr99_v138.1_train_set.fa")
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'reverseComplement': key 69 (char 'E') not in lookup table
In addition: Warning message:
In assignTaxonomy("seqtab.nochim", "~/Desktop/2019_Amplicon_Data/silva_nr99_v138.1_train_set.fa") :
  Some sequences were shorter than 50 nts and will not receive a taxonomic classification.
> unname(head(taxa))

The seqtab.nochim looks like the following, so there is no "Mock" rowname, which I think is another problem. Nothing works past this point. All figures are blank and every other command receives an error.

Screen Shot 2021-09-29 at 6 29 20 PM

.

benjjneb commented 2 years ago

If you drop the single quotes, does it work?

taxa <- assignTaxonomy(seqtab.nochim, "~/Desktop/2019_Amplicon_Data/silva_nr99_v138.1_train_set.fa")

(i.e. no single quotes around seqtab.nochim, which R understands as a string rather than a variable name)

amandarwills commented 2 years ago

Yes! That worked, but now I'm getting another error. I found another post about this error, but I don't think it applies to me fully. Or it could be I just don't understand, but below is the error I'm getting, with the full code to that point. The only thing I can think of is that I am not including mock data?

> path <- "~/Desktop/2019_Amplicon_Data"
> fns <- list.files(path)
> fns
 [1] "MC_206_T1A_1477_R1_001.fastq"       "MC_206_T1A_1477_R2_001.fastq"      
 [3] "MC_206_Wild_1_R1_001.fastq"         "MC_206_Wild_1_R2_001.fastq"        
 [5] "MC_206_Wild_2_R1_001.fastq"         "MC_206_Wild_2_R2_001.fastq"        
 [7] "MC_206_Wild_3_R1_001.fastq"         "MC_206_Wild_3_R2_001.fastq"        
 [9] "silva_nr99_v138.1_train_set.fa"     "silva_species_assignment_v138.1.fa"
> # Forward and reverse fastq filenames have format: SAMPLENAME_R1_001.fastq and SAMPLENAME_R2_001.fastq
> fastqs <- fns[grepl(".fastq$", fns)]
> fastqs <- sort(fastqs) # Sort ensures forward/reverse reads are in same order
> fnFs <- fastqs[grepl("_R1", fastqs)] # Just the forward read files
> fnRs <- fastqs[grepl("_R2", fastqs)] # Just the reverse read files
> # Get sample names, assuming files named as so: SAMPLENAME_XXX.fastq
> sample.names <- sub("\\_R1_001.fastq*", "", fnFs)
> # Specify the full path to the fnFs and fnRs
> fnFs <- file.path(path, fnFs)
> fnRs <- file.path(path, fnRs)
> 
> #Visualize the quality profiles of the forward/reverse reads
> plotQualityProfile(fnFs[[1]])
Warning message:
`guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead. 
> plotQualityProfile(fnFs[[2]])
Warning message:
`guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead. 
> plotQualityProfile(fnRs[[1]])
Warning message:
`guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead. 
> plotQualityProfile(fnRs[[2]])
Warning message:
`guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead. 
> 
> # Make directory and filenames for the filtered fastqs
> filt_path <- file.path(path, "filtered")
> if(!file_test("-d", filt_path)) dir.create(filt_path)
> filtFs <- file.path(filt_path, paste0(sample.names, "_F_filt.fastq.gz"))
> filtRs <- file.path(filt_path, paste0(sample.names, "_R_filt.fastq.gz"))
> 
> #Filter the forward and reverse reads
> for(i in seq_along(fnFs)) {
+   fastqPairedFilter(c(fnFs[i], fnRs[i]), c(filtFs[i], filtRs[i]),
+                     truncLen=c(280,230), 
+                     maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE,
+                     compress=TRUE, verbose=TRUE)
+ }
Read in 212089 paired-sequences, output 151245 (71.3%) filtered paired-sequences.
Read in 197084 paired-sequences, output 135227 (68.6%) filtered paired-sequences.
Read in 164514 paired-sequences, output 119152 (72.4%) filtered paired-sequences.
Read in 145366 paired-sequences, output 105911 (72.9%) filtered paired-sequences.
> 
> #Dereplicate the filtered fastq files
> derepFs <- derepFastq(filtFs, verbose=TRUE)
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_T1A_1477_F_filt.fastq.gz
Encountered 28855 unique sequences from 151245 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_1_F_filt.fastq.gz
Encountered 27164 unique sequences from 135227 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_2_F_filt.fastq.gz
Encountered 18590 unique sequences from 119152 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_3_F_filt.fastq.gz
Encountered 20655 unique sequences from 105911 total sequences read.
> derepRs <- derepFastq(filtRs, verbose=TRUE)
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_T1A_1477_R_filt.fastq.gz
Encountered 36001 unique sequences from 151245 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_1_R_filt.fastq.gz
Encountered 39423 unique sequences from 135227 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_2_R_filt.fastq.gz
Encountered 20997 unique sequences from 119152 total sequences read.
Dereplicating sequence entries in Fastq file: ~/Desktop/2019_Amplicon_Data/filtered/MC_206_Wild_3_R_filt.fastq.gz
Encountered 22536 unique sequences from 105911 total sequences read.
> # Name the derep-class objects by the sample names
> names(derepFs) <- sample.names
> names(derepRs) <- sample.names
> 
> #Learn the Error Rates
> dadaFs.lrn <- dada(derepFs, err=NULL, selfConsist = TRUE, multithread=TRUE)
Initializing error rates to maximum possible estimate.
selfConsist step 1 ....
   selfConsist step 2
   selfConsist step 3
   selfConsist step 4
   selfConsist step 5
   selfConsist step 6
   selfConsist step 7
   selfConsist step 8
   selfConsist step 9
   selfConsist step 10
Self-consistency loop terminated before convergence.
> errF <- dadaFs.lrn[[1]]$err_out
> dadaRs.lrn <- dada(derepRs, err=NULL, selfConsist = TRUE, multithread=TRUE)
Initializing error rates to maximum possible estimate.
selfConsist step 1 ....
   selfConsist step 2
   selfConsist step 3
   selfConsist step 4
   selfConsist step 5
   selfConsist step 6
   selfConsist step 7
   selfConsist step 8
   selfConsist step 9
   selfConsist step 10
Self-consistency loop terminated before convergence.
> errR <- dadaRs.lrn[[1]]$err_out
> plotErrors(dadaFs.lrn[[1]], nominalQ=TRUE)
Warning message:
Transformation introduced infinite values in continuous y-axis 
> 
> #Infer the sequence variants in each sample
> dadaFs <- dada(derepFs, err=errF, multithread=TRUE)
Sample 1 - 151245 reads in 28855 unique sequences.
Sample 2 - 135227 reads in 27164 unique sequences.
Sample 3 - 119152 reads in 18590 unique sequences.
Sample 4 - 105911 reads in 20655 unique sequences.
> dadaRs <- dada(derepRs, err=errR, multithread=TRUE)
Sample 1 - 151245 reads in 36001 unique sequences.
Sample 2 - 135227 reads in 39423 unique sequences.
Sample 3 - 119152 reads in 20997 unique sequences.
Sample 4 - 105911 reads in 22536 unique sequences.
> dadaFs[[1]]
dada-class: object describing DADA2 denoising results
1007 sequence variants were inferred from 28855 input unique sequences.
Key parameters: OMEGA_A = 1e-40, OMEGA_C = 1e-40, BAND_SIZE = 16
> 
> #Merge the denoised forward and reverse reads
> mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
143558 paired-reads (in 3427 unique pairings) successfully merged out of 147063 (in 4915 pairings) input.
131481 paired-reads (in 4370 unique pairings) successfully merged out of 133041 (in 4884 pairings) input.
117058 paired-reads (in 1649 unique pairings) successfully merged out of 118230 (in 2202 pairings) input.
102676 paired-reads (in 2083 unique pairings) successfully merged out of 104156 (in 2733 pairings) input.
> # Inspect the merger data.frame from the first sample
> head(mergers[[1]])
                                                                                                                                                                                                                                                                                                                                                                                                                                                    sequence
1 CCTACGGGGGGCTGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCAGTAGTC
2 CCTACGGGGGGCTGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCTGTAGTC
3 CCTACGGGGGGCTGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCGGTAGTC
4 CCTACGGGGGGCAGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCAGTAGTC
5 CCTACGGGGGGCAGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCTGTAGTC
6 CCTACGGGGGGCAGCAGTGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGGAGGATGAAAGCTTGTGAGTCGTAAACTCCTTTTCTTAGTGAAGAAATAAGACGGTATCTAAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATTCGGAATTATTGGGCGTAAAGCGTCTGTAGGTGGTTTTTTAAGTCTACTGTTAAATATTAAGGCTTAACCTTAAAAAAGCGGTATGAAACTAAAAAACTTGAGTTTAGTAGAGGTAGAGGGAATTCTCGGTGTAGTGGTGAAATGCGTAGAGATCGAGAAGAACACCGGTAGCGAAAGCGCTCTACTGGGCTAAAACTGACACTGAGAGACGAAAGCTAGGGGAGCAAATAGGATTAGATACCCCGGTAGTC
  abundance forward reverse nmatch nmismatch nindel prefer accept
1      3152       1       1     68         0      0      1   TRUE
2      2858       1       2     68         0      0      1   TRUE
3      2628       1       3     68         0      0      1   TRUE
4      2372       2       1     68         0      0      1   TRUE
5      2162       2       2     68         0      0      1   TRUE
6      1966       2       3     68         0      0      1   TRUE
> 
> #Construct sequence tables
> seqtab <- makeSequenceTable(mergers[names(mergers) != "Mock"])
> dim(seqtab)
[1]    4 9373
> # Inspect distribution of sequence lengths
> table(nchar(getSequences(seqtab)))

 280  282  331  405  418  423  428  439  440  441  442  443  444  445  446  447  451  452  453 
   1    1   29    9    1    4    5   49 3178  607  552  674  312   41    4    9    3    1    2 
 455  457  458  459  460  461  462  463  464  465  466  467 
  19   20   24   80  891   39   53    2  101 2434  219    9 
> 
> #Remove chimeric sequences
> seqtab.nochim <- removeBimeraDenovo(seqtab, verbose=TRUE)
Identified 7270 bimeras out of 9373 input sequences.
> dim(seqtab.nochim)
[1]    4 2103
> sum(seqtab.nochim)/sum(seqtab)
[1] 0.2629468
> 
> #Assign taxonomy
> taxa <- assignTaxonomy(seqtab.nochim, "~/Desktop/2019_Amplicon_Data/silva_nr99_v138.1_train_set.fa")
> unname(head(taxa))
     [,1]       [,2]            [,3]             [,4]           [,5]            [,6]     
[1,] "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Chloroplast"  NA              NA       
[2,] "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Chloroplast"  NA              NA       
[3,] "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Chloroplast"  NA              NA       
[4,] "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Chloroplast"  NA              NA       
[5,] "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Chloroplast"  NA              NA       
[6,] "Bacteria" "Myxococcota"   "Myxococcia"     "Myxococcales" "Myxococcaceae" "P3OB-42"
> 
> #Evaluating DADA2’s accuracy on the mock community
> unqs.mock <- getUniques(removeBimeraDenovo(mergers[["Mock"]], verbose=TRUE))
Error in removeBimeraDenovo(mergers[["Mock"]], verbose = TRUE) : 
  Unrecognized format: Requires named integer vector, dada-class, derep-class, sequence matrix, or a data.frame with $sequence and $abundance columns.
> cat("DADA2 inferred", length(unqs.mock), "sample sequences present in the Mock community.\n")
Error in cat("DADA2 inferred", length(unqs.mock), "sample sequences present in the Mock community.\n") : 
  object 'unqs.mock' not found
> mockRef <- readFasta(file.path(path, "HMP_MOCK.v35.fasta"))
Error: Input/Output
  no input files found
  dirPath: /Users/amandawilliams/Desktop/2019_Amplicon_Data/HMP_MOCK.v35.fasta
  pattern: character(0)
> match.ref <- sum(sapply(names(unqs.mock), function(x) any(grepl(x, as.character(sread(mockRef))))))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'X' in selecting a method for function 'sapply': object 'unqs.mock' not found
> cat("Of those,", sum(match.ref), "were exact matches to the expected reference sequences.\n")
Error in cat("Of those,", sum(match.ref), "were exact matches to the expected reference sequences.\n") : 
  object 'match.ref' not found
benjjneb commented 2 years ago

You don't have a mock community sample in your data, so you need to just leave out the part of the tutorial code that evaluates the mock community sample present in the tutorial data.