benjjneb / LRASManuscript

Reproducible Analyses accompanying DADA2 + PacBio Manuscript

Error in primer removal error #6

Closed KishanMahmud closed 3 years ago

KishanMahmud commented 3 years ago

Hello Dr. Benjamin,

I am having an issue while removing primers. The code worked for one data set, and I am now using the same code for another data set of the same nature. I ran into an issue similar to one that others reported on your page in 2019, where the cause was wrong primers. In my case, however, the primers have not already been removed with other tools, and the code worked for other data of the same nature.

setwd("/Users/kishanmahmud/Desktop/Soil Microbiome Data/Non Toxic Endo")

path1 <- "ntcbind"
path2 <- "ntcbind"
path.out <- "Figures/"
path.rds <- "RDS/"
fns1 <- list.files(path1, pattern="fastq.gz", full.names=TRUE)
fns2 <- list.files(path2, pattern="fastq.gz", full.names=TRUE)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc
theme_set(theme_bw())
nops2 <- file.path(path2, "noprimers", basename(fns1))
prim2 <- removePrimers(fns1, nops2, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)

It produces a noprimers folder containing fastq.gz files, but it also gives me this message: "Error in sapply(match.fwd, end) + 1 : non-numeric argument to binary operator"

Your help is requested. Thanks.

Best Kishan

benjjneb commented 3 years ago

After loading dada2, can you post the output of sessionInfo() (will give all the relevant R/package versions).

Also, why are you mixing together path1/path2, fns1/fns2, etc., when they all seem identical? This could lead to hard-to-diagnose bugs, e.g. in your nops2 <- ... and prim2 <- ... calls, which use both the 1 and 2 versions of the variables.

KishanMahmud commented 3 years ago

library(dada2)
Loading required package: Rcpp
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dada2_1.16.0 Rcpp_1.0.5

loaded via a namespace (and not attached):
[1] plyr_1.8.6 RColorBrewer_1.1-2 pillar_1.4.6
[4] compiler_4.0.2 GenomeInfoDb_1.24.2 XVector_0.28.0
[7] bitops_1.0-6 tools_4.0.2 zlibbioc_1.34.0
[10] lifecycle_0.2.0 tibble_3.0.4 gtable_0.3.0
[13] lattice_0.20-41 png_0.1-7 pkgconfig_2.0.3
[16] rlang_0.4.8 Matrix_1.2-18 DelayedArray_0.14.1
[19] rstudioapi_0.11 parallel_4.0.2 GenomeInfoDbData_1.2.3
[22] stringr_1.4.0 hwriter_1.3.2 dplyr_1.0.2
[25] generics_0.0.2 Biostrings_2.56.0 vctrs_0.3.4
[28] S4Vectors_0.26.1 IRanges_2.22.2 stats4_4.0.2
[31] grid_4.0.2 tidyselect_1.1.0 Biobase_2.48.0
[34] glue_1.4.2 R6_2.5.0 jpeg_0.1-8.1
[37] BiocParallel_1.22.0 latticeExtra_0.6-29 reshape2_1.4.4
[40] ggplot2_3.3.2 purrr_0.3.4 magrittr_1.5
[43] Rsamtools_2.4.0 matrixStats_0.57.0 GenomicAlignments_1.24.0
[46] scales_1.1.1 ellipsis_0.3.1 BiocGenerics_0.34.0
[49] GenomicRanges_1.40.0 SummarizedExperiment_1.18.2 ShortRead_1.46.0
[52] colorspace_1.4-1 stringi_1.5.3 RCurl_1.98-1.2
[55] RcppParallel_5.0.2 munsell_0.5.0 crayon_1.3.4

KishanMahmud commented 3 years ago

Hello Dr. Benjamin,

I figured it out. One of the files in the data set was corrupted. Thank you for your patience and reply.

Best Kishan
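
For readers who want to screen their files before primer removal, here is a rough sketch of a structural check for gzipped FASTQ files. It uses base R only. `read_gz_lines` and `check_fastq` are hypothetical helpers, not dada2 functions, and the whole file is read into memory. The check looks only at record structure (four lines per record, `@` and `+` markers, equal sequence and quality lengths), so it may not catch every kind of corruption.

```r
# Hypothetical helper: read all lines from a gzipped text file.
read_gz_lines <- function(path) {
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Hypothetical helper: TRUE if the file looks like well-formed FASTQ.
check_fastq <- function(path) {
  x <- tryCatch(read_gz_lines(path), error = function(e) NULL)
  if (is.null(x)) return(FALSE)            # file could not be read at all
  n <- length(x)
  if (n == 0 || n %% 4 != 0) return(FALSE) # records must be 4 lines each
  idx <- seq(1, n, by = 4)
  all(startsWith(x[idx], "@")) &&          # header lines
    all(startsWith(x[idx + 2], "+")) &&    # separator lines
    all(nchar(x[idx + 1]) == nchar(x[idx + 3]))  # sequence vs. quality length
}
```

With the variables from the original post, something like `ok <- vapply(fns1, check_fastq, logical(1)); names(ok)[!ok]` would list the files that fail the check.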

janetw commented 1 year ago

@KishanMahmud I am having a similar error. The code works fine for several samples but then breaks when it reaches one of the samples. How did you figure out that one of the files in your data set was corrupted? Thank you for your time. Best, Janet
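
The thread does not say how the corrupted file was found. One possible way to narrow it down, sketched here under assumptions, is to run the step on one file at a time and record which files raise an error. `find_failing` is a hypothetical helper, not part of dada2; the commented usage reuses the variable names from the original post and assumes dada2 is installed.

```r
# Hypothetical helper: apply `fun` to each file on its own and collect the
# error message, if any, instead of stopping at the first failure.
find_failing <- function(files, fun) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      fun(f)
      NA_character_            # NA means the call finished without an error
    }, error = function(e) conditionMessage(e))
  }, character(1))
  msgs[!is.na(msgs)]           # named vector: file path -> error message
}

# Usage sketch (requires dada2; variables as in the original post):
# bad <- find_failing(fns1, function(f) {
#   dada2::removePrimers(f, file.path(path1, "noprimers", basename(f)),
#                        primer.fwd = F27, primer.rev = dada2:::rc(R1492),
#                        orient = TRUE)
# })
# print(bad)
```

Any file named in the result can then be inspected separately or downloaded again.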