vmikk closed this issue 3 years ago.
Hi Vladimir - thanks for your question. I'm not quite sure I understand the situation yet, so let me recap what I understood: you run dadasnake with 8 threads at the step where it's supposed to make the ASVs, and you would expect it to run 8 samples at a time, but it only runs 1 sample? To be honest, if that's true, I don't really know why it's happening. A few checks, to make sure we're on the same page: you have the latest dadasnake version? You haven't limited the number of threads in the config file or the VARIABLE_CONFIG? And you have the samples and runs defined in the sample table as described, so that there are approx. 1000 samples and each belongs to one of the 10 runs? If all that is correct, my guess would be that some other rule is running in addition to the 1 sample. Could you run a test with 8 samples and post the output it prints here? In any case, the changes you've made to the memory resources will not have any effect on the execution of the rules; these settings are only passed to a scheduler. Best wishes - Anna
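The expectation that 8 threads should allow 8 samples at once assumes each per-sample job requests a single thread. In Snakemake, every job's `threads` value counts against the `--cores` budget, and a rule asking for more threads than `--cores` is scaled down to `--cores`. A rough upper bound on concurrency is therefore `cores // threads_per_job`. This is a simplified sketch: it ignores other resources and scheduler overhead, and `threads_per_job` is a hypothetical parameter, not a value taken from dadasnake's rules.

```python
def max_parallel_jobs(cores: int, threads_per_job: int) -> int:
    """Upper bound on concurrent jobs when each job's threads count against --cores."""
    if cores < 1 or threads_per_job < 1:
        raise ValueError("cores and threads_per_job must be positive")
    # A job requesting more threads than available cores runs with `cores` threads.
    effective_threads = min(threads_per_job, cores)
    return cores // effective_threads


if __name__ == "__main__":
    for threads in (1, 4, 8):
        print(f"8 cores, {threads} thread(s) per job -> up to {max_parallel_jobs(8, threads)} job(s)")
```

If a per-sample rule requested all available threads, only one sample could run at a time even with an idle machine, so checking the rule's `threads` setting is a cheap first step.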
Hello Anna! Thank you for the fast response! Yes, you understood me correctly.
I'm using the latest dadasnake (git commit 8405a39)
I've limited the number of threads in VARIABLE_CONFIG to 12:
SNAKEMAKE_VIA_CONDA true
LOADING_MODULES
SUBMIT_COMMAND
SCHEDULER uge
MAX_THREADS 12
BIGMEM_CORES 12
BIGMEM_MEM_EACH 30G
NORMAL_MEM_EACH 8G
LOCK_SETTINGS true
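Assuming VARIABLE_CONFIG is a plain-text file with one whitespace-separated key and an optional value per line, as the excerpt above suggests, a small helper can read it so settings such as MAX_THREADS can be double-checked before a run. `parse_variable_config` is a hypothetical name; this is not dadasnake's own parser, and keys without a value are mapped to an empty string.

```python
def parse_variable_config(text: str) -> dict[str, str]:
    """Parse 'KEY value' lines into a dict; keys with no value map to ''."""
    config: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[key] = value
    return config


if __name__ == "__main__":
    sample = "SCHEDULER uge\nMAX_THREADS 12\nSUBMIT_COMMAND\n"
    cfg = parse_variable_config(sample)
    print(int(cfg["MAX_THREADS"]), repr(cfg["SUBMIT_COMMAND"]))
```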
In htop, I see only one active process running (dada_dadaReads.single.R).
However, there are other sleeping Snakemake processes (shown in green in the screenshot) that remain after their child dada_dadaReads.single.R process has finished.
I didn't notice them at first because they are at the end of the htop list (no CPU activity).
The most puzzling part is that the previous steps (e.g., filtering) ran perfectly in parallel.
So it's probably not a dadasnake issue, but something related to Snakemake. I will try to figure out what's going on.
With kind regards, Vladimir
Dear Vladimir - yes, it looks like a snakemake problem. Please let me know if you find something. Best wishes - Anna
Hello Anna!
It seems that this is an old, unsolved Snakemake problem (e.g., mentioned on StackOverflow here). The likely reason is that Snakemake checks the successfully completed jobs before continuing to the next batch of jobs, and when there are many jobs, this phase can be quite slow.
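The effect of slow bookkeeping between batches can be illustrated with a toy simulation. This is not Snakemake's scheduler: it assumes a single scheduler loop that spends a fixed, hypothetical `overhead` (in seconds) of serial work before each job dispatch, and that all jobs take the same `job_time`. It returns the mean number of concurrently running jobs (total work divided by makespan).

```python
import heapq


def simulate(n_jobs: int, cores: int, job_time: float, overhead: float) -> float:
    """Mean concurrency of a toy scheduler with serial per-dispatch overhead."""
    if n_jobs < 1 or cores < 1:
        raise ValueError("n_jobs and cores must be positive")
    clock = 0.0
    running: list[float] = []  # min-heap of finish times of running jobs
    for _ in range(n_jobs):
        # Retire jobs that finished while the scheduler was busy.
        while running and running[0] <= clock:
            heapq.heappop(running)
        # All cores busy: wait for the earliest job to finish.
        if len(running) >= cores:
            clock = heapq.heappop(running)
        # Serial bookkeeping before each dispatch (the hypothesised bottleneck).
        clock += overhead
        heapq.heappush(running, clock + job_time)
    makespan = max(running)  # the last dispatched job finishes last
    return n_jobs * job_time / makespan


if __name__ == "__main__":
    for cost in (0.0, 1.0, 60.0):
        print(f"overhead={cost:>5}s -> mean concurrency {simulate(200, 8, 60.0, cost):.2f}")
```

In this model concurrency is roughly `min(cores, job_time / overhead)` (Little's law), so once the per-dispatch bookkeeping approaches the duration of a single job, only about one job runs at a time no matter how many cores are available.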
I've also tried the updated Snakemake v5.30.1 (the changelog mentions that the scheduler was improved), but the problem remains.
With kind regards, Vladimir
Cool, thanks for checking this out, Vladimir!
Hello!
We have 10 sequencing runs, each with ~100 samples. We run dadasnake on a desktop (in -l mode). Dadasnake works pretty well in the first stages (filtering and error estimation); however, when it reaches the dada_dadaSingle rule, it switches to processing samples sequentially rather than in parallel. If we terminate and resume the workflow, Snakemake starts 8 processes at first, but after they finish it proceeds with only one sample at a time. Yet there should be enough resources to keep all 8 cores busy. The command we are using:
with
so the main sub-workflow is dada.single.smk.
I've tried removing the resources section in the rules and decreasing NORMAL_MEM_EACH to 3G in VARIABLE_CONFIG, but it didn't help. Could you please tell us where the problem might be?
With kind regards, Vladimir