franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Continue or restart failed or incomplete tasks #59

Closed slambrechts closed 3 years ago

slambrechts commented 3 years ago

Let's say I have 43 samples, and on one local machine metaGEM finished the assemblies for 10 of them, but I would now like to run the remaining assembly tasks on a faster cluster. If I copy the finished assemblies into the assemblies folder on the other machine, will metaGEM recognize them and skip assembling those samples again?

franciscozorrilla commented 3 years ago

Hey Sam,

Good question! Yes, Snakemake works out which jobs need to be run from the presence or absence of the input and output files required by the target rule. It will not re-run the jobs for already assembled samples unless you delete or move the output files, or use the Snakemake -R flag.

For example, let's say my target rule is assembly, so you would run something like bash metaGEM.sh -t megahit -j 43 -c 32. Let's look at the megahit rule in the Snakefile:

https://github.com/franciscozorrilla/metaGEM/blob/d81186a0700f974b4f57db587b71b960a951db83/Snakefile#L273-L278

First of all, Snakemake will make sure that the inputs for this rule are present. So if the samples have not been quality filtered, then Snakemake would submit 43 qfilter jobs + 43 assembly jobs.

Now let's say that the samples have all been quality filtered in a previous run, then Snakemake will check if the output of the target rule is present, i.e. it will search the assemblies/ subfolders for files called contigs.fasta.gz. In your scenario you said you had 10 assemblies completed, so if they are present in the specified location, then Snakemake would only submit 33 assembly jobs.
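As a rough illustration of that presence check, here is a minimal Python sketch. It is not Snakemake's or metaGEM's actual code: it only assumes the layout described above (one assemblies/ subfolder per sample, each holding contigs.fasta.gz), and the sample names and the helper split_by_assembly_status are hypothetical.

```python
import tempfile
from pathlib import Path


def split_by_assembly_status(root, samples, target="contigs.fasta.gz"):
    """Split samples into (done, pending) by whether assemblies/<sample>/<target> exists."""
    done, pending = [], []
    for sample in samples:
        if (Path(root) / "assemblies" / sample / target).is_file():
            done.append(sample)
        else:
            pending.append(sample)
    return done, pending


# Demo in a throwaway directory: 43 hypothetical samples, 10 already assembled.
with tempfile.TemporaryDirectory() as root:
    samples = [f"sample{i:02d}" for i in range(1, 44)]
    for sample in samples[:10]:
        out_dir = Path(root) / "assemblies" / sample
        out_dir.mkdir(parents=True)
        (out_dir / "contigs.fasta.gz").touch()  # empty placeholder file

    done, pending = split_by_assembly_status(root, samples)
    print(len(done), "assembled;", len(pending), "still to run")
```

This only mirrors the file-presence part of the decision. The authoritative answer still comes from Snakemake itself, for example via a dry run with snakemake -n.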

Some useful troubleshooting tips:

Best, Francisco

slambrechts commented 3 years ago

Hi Francisco,

Thank you for your answer. Copying the assemblies/ subfolders did not work. metaGEM recognized the samples that were assembled on the same cluster, but not the ones that were assembled on the other cluster and copied over. Maybe I should also copy the files from the intermediate results folder?

Also, it seems that when metaGEM then starts a task for a sample that was previously run on a different machine, the result folder that is already present (the one I copied over) for that sample gets deleted.

Best, Sam

franciscozorrilla commented 3 years ago

Hey Sam,

Did metaGEM try to submit quality filtering jobs for the samples whose assemblies got deleted? Did you have all the qfiltered/ result files (including the ones for the samples that were assembled on your local machine) on your new cluster? Similarly, does your dataset/ folder contain all your samples? You need to have these files present, otherwise Snakemake will try to re-create them before running your target rule.
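One way to answer those questions quickly is to list, per sample, which upstream folders are missing or empty. The sketch below assumes a hypothetical layout of dataset/<sample>/ and qfiltered/<sample>/ (check the Snakefile for the real paths); the helper names and sample names are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical upstream folders, one subfolder per sample in each.
UPSTREAM_STAGES = ("dataset", "qfiltered")


def has_files(folder):
    """True if the folder exists and contains at least one entry."""
    return folder.is_dir() and any(folder.iterdir())


def missing_upstream(root, samples):
    """Map each sample to the upstream stages whose folder is missing or empty."""
    report = {}
    for sample in samples:
        missing = [s for s in UPSTREAM_STAGES if not has_files(Path(root) / s / sample)]
        if missing:
            report[sample] = missing
    return report


# Demo: sampleC has raw reads but no quality-filtered output.
with tempfile.TemporaryDirectory() as root:
    for stage, names in {"dataset": "ABC", "qfiltered": "AB"}.items():
        for letter in names:
            folder = Path(root) / stage / f"sample{letter}"
            folder.mkdir(parents=True)
            (folder / "reads.fastq.gz").touch()  # placeholder file name

    report = missing_upstream(root, ["sampleA", "sampleB", "sampleC"])
    print(report)
```

Any sample that shows up in the report would need its upstream files re-created before the target rule could run for it.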

slambrechts commented 3 years ago

Hi Francisco,

No, metaGEM did not try to submit qc jobs for those samples. All the samples were qfiltered on both machines, and the dataset folder contained all the samples. As a workaround, I temporarily moved the samples that were already assembled out of the dataset folder and restarted.
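For anyone wanting to script this workaround, a minimal sketch could look like the following. It assumes one subfolder per sample under dataset/ and assemblies/<sample>/contigs.fasta.gz as the finished output; the holding folder name dataset_assembled and the function park_assembled_samples are hypothetical, and the moved folders would need to be put back afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def park_assembled_samples(root, holding="dataset_assembled"):
    """Move dataset/<sample> folders that already have contigs into a holding folder."""
    root = Path(root)
    hold_dir = root / holding
    hold_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before anything is moved.
    for sample_dir in sorted((root / "dataset").iterdir()):
        contigs = root / "assemblies" / sample_dir.name / "contigs.fasta.gz"
        if sample_dir.is_dir() and contigs.is_file():
            shutil.move(str(sample_dir), str(hold_dir / sample_dir.name))
            moved.append(sample_dir.name)
    return moved


# Demo: s1 and s2 are already assembled, s3 is not.
with tempfile.TemporaryDirectory() as root:
    for name in ("s1", "s2", "s3"):
        (Path(root) / "dataset" / name).mkdir(parents=True)
    for name in ("s1", "s2"):
        out_dir = Path(root) / "assemblies" / name
        out_dir.mkdir(parents=True)
        (out_dir / "contigs.fasta.gz").touch()

    moved = park_assembled_samples(root)
    remaining = sorted(p.name for p in (Path(root) / "dataset").iterdir())
    print("moved:", moved, "remaining:", remaining)
```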

franciscozorrilla commented 3 years ago

I see, sorry to hear you were having trouble with this, but glad that you figured out the workaround!