benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Running into filterAndTrim error mclapply(seq_len(n)) depending on maxEE value #607

Closed RJ333 closed 5 years ago

RJ333 commented 5 years ago

Hello,

I'm currently processing my data (2 x 300 bp reads, V3-V4, cDNA) on a server with 28 cores and 230 GB of memory, following the workflows described here https://f1000research.com/articles/5-1492/v2 and here https://usda-ars-gbru.github.io/Microbiome-workshop/tutorials/amplicon/

These workflows have worked with my own data so far; now I'm testing the filterAndTrim function with multithreading enabled.

This worked fine with the suggested maxEE values. But since my data are of fairly poor quality (V3-V4), I started experimenting with maxEE, and when I set it above 3 for the reverse reads, the process crashes (see below).
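For context, maxEE in filterAndTrim is a per-read threshold on "expected errors", which the dada2 documentation defines from the quality scores as EE = sum(10^(-Q/10)), computed after truncation. A minimal base-R sketch with made-up quality vectors (not taken from the data in this issue), showing why a looser maxEE lets more reads through and therefore more records get written to the filtered files:

```r
# Expected errors of one read: sum over bases of 10^(-Q/10),
# where Q is the Phred quality score of each base.
expected_errors <- function(quals) {
  sum(10^(-quals / 10))
}

# Hypothetical 255-base reads (illustrative quality profiles only).
reads <- list(
  good     = rep(35, 255),                   # uniformly high quality
  q15_tail = c(rep(30, 200), rep(15, 55)),   # moderate drop at the 3' end
  q12_tail = c(rep(30, 200), rep(12, 55)),   # stronger drop at the 3' end
  noisy    = c(rep(25, 150), rep(12, 105))   # long low-quality stretch
)
ee <- vapply(reads, expected_errors, numeric(1))
print(round(ee, 2))

# Number of reads passing at maxEE = 2, 3, 4: the looser the threshold,
# the more reads are kept and the larger the output files become.
kept <- sapply(c(2, 3, 4), function(max_ee) sum(ee <= max_ee))
print(kept)
```

Here the q12_tail read (EE of about 3.67) only passes once the threshold goes above 3, which mirrors how the reads.out counts in the transcripts grow as maxEE is relaxed.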

Memory does not seem to be the problem, and I'm not getting any of the messages about colnames or similar that are mentioned in other issues (#285, #263, #333). My commands, their output and my session info are below.

Thank you

> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),
+                   maxN = 0, maxEE = c(2, 2), truncQ = 2, rm.phix = FALSE,
+                   trimLeft = 5, compress = TRUE, multithread = TRUE)
> head(out)
                                  reads.in reads.out
cut_c100_S80_L001_R1_001.fastq.gz   118614     28186
cut_c101_S81_L001_R1_001.fastq.gz   148092     39922
cut_c102_S82_L001_R1_001.fastq.gz   138489     41386
cut_c103_S83_L001_R1_001.fastq.gz   513906    156849
cut_c104_S84_L001_R1_001.fastq.gz   111734     29238
cut_c105_S85_L001_R1_001.fastq.gz   128089     37926

> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),
+                   maxN = 0, maxEE = c(2, 3), truncQ = 2, rm.phix = FALSE,
+                   trimLeft = 5, compress = TRUE, multithread = TRUE)
> head(out)
                                  reads.in reads.out
cut_c100_S80_L001_R1_001.fastq.gz   118614     49137
cut_c101_S81_L001_R1_001.fastq.gz   148092     65746
cut_c102_S82_L001_R1_001.fastq.gz   138489     65297
cut_c103_S83_L001_R1_001.fastq.gz   513906    257018
cut_c104_S84_L001_R1_001.fastq.gz   111734     47621
cut_c105_S85_L001_R1_001.fastq.gz   128089     63067

> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),
+                   maxN = 0, maxEE = c(2.5, 3), truncQ = 2, rm.phix = FALSE,
+                   trimLeft = 5, compress = TRUE, multithread = TRUE)
> head(out)
                                  reads.in reads.out
cut_c100_S80_L001_R1_001.fastq.gz   118614     50509
cut_c101_S81_L001_R1_001.fastq.gz   148092     67629
cut_c102_S82_L001_R1_001.fastq.gz   138489     67205
cut_c103_S83_L001_R1_001.fastq.gz   513906    264636
cut_c104_S84_L001_R1_001.fastq.gz   111734     51356
cut_c105_S85_L001_R1_001.fastq.gz   128089     64898

> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),
+                   maxN = 0, maxEE = c(2.5, 4), truncQ = 2, rm.phix = FALSE,
+                   trimLeft = 5, compress = TRUE, multithread = TRUE)
Error in filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),  :
  These are the errors (up to 5) encountered in individual cores...
Error in writeFastq(fqF, fout[[1]], "w", compress = compress) :
  failed to write record 243
Error in writeFastq(fqF, fout[[1]], "w", compress = compress) :
  failed to write record 26486
Error in writeFastq(fqF, fout[[1]], "w", compress = compress) :
  failed to write record 243
Error in writeFastq(fqR, fout[[2]], "a", compress = compress) :
  failed to write record 19226
Error in writeFastq(fqF, fout[[1]], "w", compress = compress) :
  failed to write record 230
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  scheduled cores 12, 7, 13, 15, 10, 4, 2, 14, 3, 1, 8, 9, 5, 6 encountered errors in user code, all values of the jobs will be affected

> head(out)
                                  reads.in reads.out
cut_c100_S80_L001_R1_001.fastq.gz   118614     50509
cut_c101_S81_L001_R1_001.fastq.gz   148092     67629
cut_c102_S82_L001_R1_001.fastq.gz   138489     67205
cut_c103_S83_L001_R1_001.fastq.gz   513906    264636
cut_c104_S84_L001_R1_001.fastq.gz   111734     51356
cut_c105_S85_L001_R1_001.fastq.gz   128089     64898
> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),
+                   maxN = 0, maxEE = c(2.5, 5), truncQ = 2, rm.phix = FALSE,
+                   trimLeft = 5, compress = TRUE, multithread = TRUE)
Error in filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(280, 260),  :
  These are the errors (up to 5) encountered in individual cores...
Error in writeFastq(fqF, fout[[1]], "a", compress = compress) :
  failed to write record 6571
Error in writeFastq(fqF, fout[[1]], "w", compress = compress) :
  failed to write record 230
Error in writeFastq(fqR, fout[[2]], "a", compress = compress) :
  failed to write record 5246
Error in writeFastq(fqF, fout[[1]], "a", compress = compress) :
  failed to write record 256
Error in writeFastq(fqR, fout[[2]], "w", compress = compress) :
  failed to write record 50286
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  scheduled cores 6, 23, 5, 1, 25, 17, 16, 3, 9, 21, 12, 22, 20, 11, 24, 8, 28, 13, 7, 4, 10, 14, 2, 15 encountered errors in user code, all values of the jobs will be affected

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /home/centos/miniconda3/envs/dada2/lib/R/lib/libRblas.so
LAPACK: /home/centos/miniconda3/envs/dada2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] phangorn_2.4.0      ape_5.2             DECIPHER_2.10.0
 [4] RSQLite_2.1.1       Biostrings_2.50.0   XVector_0.22.0
 [7] IRanges_2.16.0      S4Vectors_0.20.0    BiocGenerics_0.28.0
[10] phyloseq_1.26.0     dada2_1.10.0        Rcpp_0.12.19
[13] gridExtra_2.3       ggplot2_3.1.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0              bit64_0.9-7
 [3] jsonlite_1.5                splines_3.5.1
 [5] foreach_1.4.4               RcppParallel_4.4.1
 [7] BiocManager_1.30.3          latticeExtra_0.6-28
 [9] blob_1.1.1                  GenomeInfoDbData_1.2.0
[11] Rsamtools_1.34.0            pillar_1.3.0
[13] lattice_0.20-38             quadprog_1.5-5
[15] digest_0.6.18               GenomicRanges_1.34.0
[17] RColorBrewer_1.1-2          colorspace_1.3-2
[19] Matrix_1.2-15               plyr_1.8.4
[21] pkgconfig_2.0.2             ShortRead_1.40.0
[23] zlibbioc_1.28.0             scales_1.0.0
[25] BiocParallel_1.16.0         tibble_1.4.2
[27] mgcv_1.8-25                 withr_2.1.2
[29] SummarizedExperiment_1.12.0 lazyeval_0.2.1
[31] survival_2.43-1             magrittr_1.5
[33] crayon_1.3.4                memoise_1.1.0
[35] nlme_3.1-137                MASS_7.3-51.1
[37] hwriter_1.3.2               vegan_2.5-3
[39] tools_3.5.1                 data.table_1.11.8
[41] matrixStats_0.54.0          stringr_1.3.1
[43] Rhdf5lib_1.4.0              munsell_0.5.0
[45] cluster_2.0.7-1             DelayedArray_0.8.0
[47] ade4_1.7-13                 compiler_3.5.1
[49] GenomeInfoDb_1.18.0         rlang_0.3.0.1
[51] rhdf5_2.26.0                grid_3.5.1
[53] RCurl_1.95-4.11             iterators_1.0.10
[55] biomformat_1.10.0           igraph_1.2.2
[57] bitops_1.0-6                tcltk_3.5.1
[59] gtable_0.2.0                codetools_0.2-15
[61] multtest_2.38.0             DBI_1.0.0
[63] reshape2_1.4.3              GenomicAlignments_1.18.0
[65] bit_1.1-14                  fastmatch_1.1-0
[67] permute_0.9-4               stringi_1.2.4

dswan commented 5 years ago

The same error was previously reported on the QIIME 2 forum; check whether your tmpdir actually has free space:

https://forum.qiime2.org/t/error-while-running-dada2-in-r-return-code-1/4570

RJ333 commented 5 years ago

Thanks for the hint! The problem was not a lack of space in the temp dir but in the working directory: another analysis (not R-related) running on the same system had produced so much output that my HDD was almost completely full.
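A plausible reason the "failed to write record" errors appeared only at higher maxEE values is that a looser threshold keeps more reads, so the filtered files grow (compare the reads.out columns above) until the nearly full disk runs out. A minimal pre-flight check in base R, assuming a Unix-like system with the POSIX `df` utility on the PATH (as on the CentOS server here). `free_gib`, `check_space` and the 5 GiB default are hypothetical helpers for illustration, not part of dada2:

```r
# Free space (GiB) on the filesystem holding `path`, parsed from `df -Pk`.
# Assumes POSIX output: header line, then one data line whose 4th field
# is the number of available 1024-byte blocks.
free_gib <- function(path = ".") {
  out <- system2("df", c("-Pk", shQuote(path)), stdout = TRUE)
  if (length(out) < 2) stop("df failed for ", path)
  fields <- strsplit(trimws(out[2]), "[[:space:]]+")[[1]]
  as.numeric(fields[4]) / 1024^2
}

# Stop early if any directory has less than `min_gib` GiB free
# (hypothetical threshold; pick one based on your expected output size).
check_space <- function(dirs, min_gib = 5) {
  free <- vapply(dirs, free_gib, numeric(1))
  low <- free < min_gib
  if (any(low)) {
    stop("Low disk space in: ", paste(dirs[low], collapse = ", "))
  }
  invisible(free)
}

# Report free space for the working directory and R's temp dir.
print(vapply(c(getwd(), tempdir()), free_gib, numeric(1)))
```

In the workflow above, calling something like `check_space(c(dirname(filtFs[1]), tempdir()))` right before filterAndTrim would cover both locations discussed in this thread.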