dbeisser / Natrix2

Open-source bioinformatics pipeline for the preprocessing of raw amplicon sequencing / metabarcoding data.
MIT License

Medaka taking suspiciously too long? #17

Closed by omarkr8 2 months ago

omarkr8 commented 3 months ago

Hi,

I'm unsure whether this is a bug, but I believe some assembly step in Medaka is taking a bit longer than usual. The readouts look like this:

[00:48:50 - Sampler] Took 0.00s to make features.
[00:49:08 - PWorker] 99.7% Done (0.0/0.0 Mbases) in 17.6s
[00:49:08 - PWorker] All done, 0 remainder regions.
[00:49:08 - DLoader] Initializing data loader
[00:49:08 - PWorker] Running inference for 0.0M draft bases.
[00:49:08 - Sampler] Initializing sampler for consensus of region rp2rp1;c4622567-d9d3-424d-86e8-68cd905f8d20:0-335.
[00:49:08 - Feature] Processed rp2rp1;c4622567-d9d3-424d-86e8-68cd905f8d20:0.0-334.0 (median depth 90.0)
[00:49:08 - Sampler] Took 0.09s to make features.

Whichever step this is, it feels like it's not using the resources allocated. Were there any changes recently that might have caused this? Even though it says "Took 0.09s to make features", for example, I only get a new readout every 10-20 seconds. Surely it's meant to be much quicker?
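To see where the wall-clock time goes between readouts, the timestamps in the log can be compared line by line. The following is only a rough sketch, not part of Natrix2 or Medaka: it assumes every log line has the `[HH:MM:SS - Component] message` shape seen in the excerpt above, and the names `LOG_LINE`, `line_gaps`, and `slowest` are made up for illustration. Real Medaka output may contain lines in other formats, which this sketch simply skips.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt in this issue:
# "[HH:MM:SS - Component] message"
LOG_LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2}) - (\w+)\] (.*)")

# Log lines copied from the report above.
SAMPLE = """\
[00:48:50 - Sampler] Took 0.00s to make features.
[00:49:08 - PWorker] 99.7% Done (0.0/0.0 Mbases) in 17.6s
[00:49:08 - PWorker] All done, 0 remainder regions.
[00:49:08 - DLoader] Initializing data loader
[00:49:08 - PWorker] Running inference for 0.0M draft bases.
[00:49:08 - Sampler] Took 0.09s to make features.
""".splitlines()


def line_gaps(lines):
    """Return (seconds_since_previous_line, component, message) tuples.

    Lines that do not match the assumed format are ignored.
    """
    gaps = []
    previous = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = (stamp - previous).total_seconds()
            if delta < 0:
                # The log has no date, so a negative gap is
                # treated as crossing midnight.
                delta += 24 * 60 * 60
            gaps.append((delta, match.group(2), match.group(3)))
        previous = stamp
    return gaps


def slowest(lines):
    """Return the entry that was preceded by the longest pause."""
    return max(line_gaps(lines), key=lambda entry: entry[0])


if __name__ == "__main__":
    seconds, component, message = slowest(SAMPLE)
    print(f"{seconds:.0f}s pause before [{component}] {message}")
```

On the excerpt above, the only pause is the one before the `PWorker` "Done ... in 17.6s" line, which suggests the time is being spent in the inference worker and not in feature generation. That is a reading of this one excerpt only, not a statement about how Medaka behaves in general.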

DanaBlu commented 3 months ago

Hi @omarkr8,

How long is the amplicon you are trying to process with the Natrix2 pipeline? Medaka is a tool designed for long reads, and the algorithm sometimes struggles with shorter reads. I have noticed the slow speed as well. So far, we have not been able to make the Medaka step faster, for either long or short reads, so I recommend having some patience with it. We are continuing to work on updating the Medaka rule to improve its speed.

Best, Dana

omarkr8 commented 3 months ago

Yes, that may be the case. My amplicons are 210 bp after the primers are cut. Sounds like it's working normally, then.