trawler-crawler opened this issue 3 weeks ago
`learnErrors` is running the core DADA2 denoising algorithm repeatedly. Increasing numbers of unique sequences, increasing true diversity in a sample, and increasing sequence length all increase computation time. `learnErrors` can be the most computationally intensive step in the DADA2 workflow, and long-read data is more challenging computationally, so what you're seeing is not out of the realm of expectation. That said, given your number of unique sequences, I would expect this to be computationally tractable in ~days on a laptop.
A couple of things you can do to speed things up a bit:

- Skip the `derepFastq` step and run `learnErrors` on the filtered fastq files instead (they will then be loaded into memory one at a time, on the fly); see the sketch at the end of this comment.

> CPU consumption for R process is consistently around 30-35%

This seems low to me. Is this 30-35% of all available processors, or 30-35% of a single processor?
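For concreteness, skipping dereplication would look roughly like this. The file paths are placeholders, and `errorEstimationFunction = PacBioErrfun` with `BAND_SIZE = 32` are the settings the DADA2 PacBio workflow recommends for CCS data:

```r
library(dada2)

# Placeholder path: point this at your own filtered files
filts <- list.files("filtered", pattern = "fastq.gz$", full.names = TRUE)

# learnErrors accepts fastq paths directly; each file is read into memory
# on the fly, so no separate derepFastq object is needed
err <- learnErrors(filts, errorEstimationFunction = PacBioErrfun,
                   BAND_SIZE = 32, multithread = TRUE)
```

`multithread = TRUE` should engage all available cores; if CPU usage still sits at 30-35% of the whole machine with that set, the run isn't being parallelized as expected.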
Hi,
I'm trying to use dada2 to analyse full-length PacBio 16S amplicon data. I only have four samples, which I filtered with these settings:
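(Roughly; the values below are approximate rather than a verbatim copy of my script, and primers had already been removed beforehand.)

```r
library(dada2)

# Placeholder paths for the four demultiplexed, primer-trimmed samples
fns   <- list.files("demux", pattern = "fastq.gz$", full.names = TRUE)
filts <- file.path("filtered", basename(fns))

# Approximate filtering settings for full-length 16S CCS reads
track <- filterAndTrim(fns, filts,
                       minQ = 3, minLen = 1000, maxLen = 1600,
                       maxN = 0, maxEE = 2, rm.phix = FALSE,
                       multithread = TRUE)
```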
The resulting fastq.gz files are 22-23 MB each.
Dereplication shows:
I realise I have a lot of unique sequences.
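The dereplication itself was just the standard call, something like:

```r
# Dereplicate the filtered reads (placeholder paths, as above)
filts <- list.files("filtered", pattern = "fastq.gz$", full.names = TRUE)
drp   <- derepFastq(filts, verbose = TRUE)
```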
Next, I want to learn the error rates, but it is extremely slow. My output so far:
`learnErrors()` has been running for about 24 hours now and is still going. Is this normal? Can it be sped up somehow?
My computer should be up to the task: I have 32 GB of RAM and a CPU with 8 cores, 16 logical processors, and a base speed of 4.20 GHz. CPU consumption for the R process is consistently around 30-35%.
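Could it be that not all cores are being engaged? A quick sanity check of what R can see would be something like:

```r
# How many cores does R detect? (this machine has 16 logical processors)
parallel::detectCores()
```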
Any help with this would be very much appreciated. Thank you.