benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

BiocParallel error #1940

Closed ThibauldMichel closed 4 months ago

ThibauldMichel commented 5 months ago

Environment specifications:
- R version 4.3.2 (2023-10-31) -- "Eye Holes"
- Linux OS: Rocky Linux 8.9 (Green Obsidian)
- dada2 packageVersion: 1.30.0

Hello, I am processing sequencing reads downloaded from the Short Read Archive (SRA). I saw this error message pop up just after the cutadapt report. I made some inquiries into whether it was caused by Cutadapt itself, and apparently that tool is not what triggers the problem. https://github.com/marcelm/cutadapt/issues/779#event-12572759126

Would you have seen this error before?

WARNING:
    One or more of your adapter sequences may be incomplete.
    Please see the detailed output above.
Stop worker failed with the error: reached elapsed time limit
Error: BiocParallel errors
  0 remote errors, element index: 
  156 unevaluated and other errors
  first remote error:
Execution halted
benjjneb commented 5 months ago

What command in R is yielding the Error?

ThibauldMichel commented 5 months ago

Just after cutadapt, the ShortRead::qa() commands seem to trigger the error. I have run the script in an interactive session rather than in a batch job, and got the following error:

> out_1 <- cbind(ShortRead::qa(fas_Fs_raw)[["readCounts"]][,"read", drop = FALSE],
               ShortRead::qa(fas_Fs_cut)[["readCounts"]][,"read", drop = FALSE])
Error in reducer$value.cache[[as.character(idx)]] <- values : 
  wrong args for environment subassignment
In addition: Warning messages:
1: In parallel::mccollect(wait = TRUE) :
  6 parallel jobs did not deliver results
2: In parallel::mccollect(wait = FALSE, timeout = 1) :
  7 parallel jobs did not deliver results
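The warnings suggest the forked parallel workers spawned for `qa()` timed out before returning results. As a sketch (not from the original thread), one could try forcing serial execution, assuming `ShortRead::qa()` respects the registered BiocParallel backend:

```r
# Sketch: register a serial backend so qa() does not fork workers
# (assumption: qa() picks up the backend registered with BiocParallel).
library(BiocParallel)
register(SerialParam())

out_1 <- cbind(ShortRead::qa(fas_Fs_raw)[["readCounts"]][,"read", drop = FALSE],
               ShortRead::qa(fas_Fs_cut)[["readCounts"]][,"read", drop = FALSE])
```

If the error disappears serially, that points at the parallel workers being killed (e.g. by a cluster time or memory limit) rather than at the files themselves.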
benjjneb commented 4 months ago

To clarify: was the initial command throwing the error filterAndTrim, and are the ShortRead::qa calls above from trying to track down the inner logic? Or is your script using ShortRead::qa directly?

If the first -- have you tried running filterAndTrim(..., multithread=FALSE)?

ThibauldMichel commented 4 months ago

Apologies for the delayed answer.

No, the error occurs before filterAndTrim().

I used cutadapt to trim the primers, then the command below to track the number of reads that were trimmed.

The filterAndTrim() command is used afterward and does not run in this case, as the pipeline gives an error message before reaching it.

cutadapt <- "cutadapt" # Path to the executable
for(i in seq_along(fas_Fs_raw)) {
  cat("Processing", "-----------", i, "/", length(fas_Fs_raw), "-----------\n")
  system2(cutadapt, args = c(R1_flags, R2_flags,
                             "--discard-untrimmed",
                             "--max-n 0",
                             # Optional strong constraint on expected length
                             #paste0("-m ", 250-nchar(FWD)[1], ":", 250-nchar(REV)[1]), 
                             #paste0("-M ", 250-nchar(FWD)[1], ":", 250-nchar(REV)[1]), 
                             "-o", fas_Fs_cut[i], "-p", fas_Rs_cut[i],
                             fas_Fs_raw[i], fas_Rs_raw[i]))
}

out_1 <- cbind(ShortRead::qa(fas_Fs_raw)[["readCounts"]][,"read", drop = FALSE],
               ShortRead::qa(fas_Fs_cut)[["readCounts"]][,"read", drop = FALSE])
benjjneb commented 4 months ago

So I guess the error is coming from the ShortRead::qa function calls.

As a workaround I would try using a simpler function, like ShortRead::countFastq to get the number of reads in the file. You could also try using the dada2 function getSequences and then take the length of the returned character vector.
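A sketch of that workaround (not from the original thread), counting reads per file without `qa()`'s parallel machinery; `ShortRead::countFastq()` returns a data.frame with one row per file and a `records` column:

```r
# Workaround sketch: count reads with countFastq() instead of qa()
counts_raw <- ShortRead::countFastq(fas_Fs_raw)$records
counts_cut <- ShortRead::countFastq(fas_Fs_cut)$records
out_1 <- cbind(reads.in = counts_raw, reads.out = counts_cut)

# Alternative via dada2: read each fastq and take the vector length
# counts_raw <- sapply(fas_Fs_raw, function(f) length(dada2::getSequences(f)))
```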