slambrechts closed this issue 3 years ago
Hey Sam,
Good question! Yes, Snakemake works out which rules need to run for the target rule by checking which input and output files are already present. It will not re-run the jobs for already assembled samples unless you delete or move the output file, or use the Snakemake -R flag.
For example, let's say the target rule is assembly, so you would run something like bash metaGEM.sh -t megahit -j 43 -c 32. Let's look at the megahit rule in the Snakefile:
First of all, Snakemake will make sure that the inputs for this rule are present. So if the samples have not been quality filtered, then Snakemake would submit 43 qfilter jobs + 43 assembly jobs.
Now let's say that the samples have all been quality filtered in a previous run. Snakemake will then check whether the output of the target rule is present, i.e. it will search the assemblies/ subfolders for files called contigs.fasta.gz. In your scenario you said you had 10 assemblies completed, so if they are present in the specified location, Snakemake would only submit 33 assembly jobs.
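The check described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how Snakemake is implemented: the sample names (sample01 to sample43) are made up, and the layout assemblies/<sample>/contigs.fasta.gz is an assumption based on the description in this reply. Everything happens in a temporary scratch directory.

```shell
#!/bin/sh
# Scratch directory so nothing touches a real metaGEM folder.
workdir=$(mktemp -d)
mkdir -p "$workdir/assemblies"

total=43      # samples in the project (from the example in this thread)
assembled=10  # samples whose assemblies already exist

# Create empty placeholder outputs for the first 10 hypothetical samples.
i=1
while [ "$i" -le "$assembled" ]; do
    sample=$(printf 'sample%02d' "$i")
    mkdir -p "$workdir/assemblies/$sample"
    : > "$workdir/assemblies/$sample/contigs.fasta.gz"
    i=$((i + 1))
done

# A sample still needs an assembly job if its expected output file is absent.
pending=0
i=1
while [ "$i" -le "$total" ]; do
    sample=$(printf 'sample%02d' "$i")
    if [ ! -e "$workdir/assemblies/$sample/contigs.fasta.gz" ]; then
        pending=$((pending + 1))
    fi
    i=$((i + 1))
done

echo "already assembled: $((total - pending))"
echo "still pending: $pending"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Pointing the same loop at a real assemblies/ folder (with your own sample names) is a quick way to see how many outputs are actually where they are expected to be before starting a run.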
Some useful troubleshooting tips:
- Use the metaGEM.sh script, as it will always dry-run jobs before asking you if they look good for submission.
- Run snakemake all -n in your metaGEM folder to do a dry run directly.
- Use touch to create dummy output files, then dry-run to see if you tricked Snakemake into thinking that the files have already been generated. Remember to delete the dummy files afterwards!
Best, Francisco
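For anyone wanting to try the touch tip, here is a minimal sketch of the create, dry-run, delete cycle. The sample names (sampleA, sampleB) and the assemblies/<sample>/contigs.fasta.gz layout are assumptions for illustration, and the script works in a temporary directory rather than a real metaGEM folder. The dry-run command is left as a comment, since it only makes sense inside an actual metaGEM checkout.

```shell
#!/bin/sh
# Scratch directory standing in for the metaGEM folder.
workdir=$(mktemp -d)

# Hypothetical samples we want Snakemake to treat as already assembled.
dummies=""
for sample in sampleA sampleB; do
    out="$workdir/assemblies/$sample/contigs.fasta.gz"
    # Never touch a path that already exists: it might be a real result.
    if [ -e "$out" ]; then
        continue
    fi
    mkdir -p "$(dirname "$out")"
    touch "$out"                 # empty placeholder, not a real assembly
    dummies="$dummies $out"
done

created=$(find "$workdir/assemblies" -name contigs.fasta.gz | wc -l | tr -d ' ')
echo "dummy files created: $created"

# At this point, in a real metaGEM folder, a dry run would show
# which jobs are still planned:
#   snakemake all -n

# Remove only the placeholders created above.
for out in $dummies; do
    rm -f "$out"
done

remaining=$(find "$workdir/assemblies" -name contigs.fasta.gz | wc -l | tr -d ' ')
echo "dummy files remaining: $remaining"

rm -rf "$workdir"
```

Keeping a list of the files that were created makes the cleanup step safe: only the placeholders are deleted, never a genuine output that happened to be in the same folder.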
Hi Francisco,
Thank you for your answer. Copying the assemblies/ subfolders did not work. metaGEM recognized the samples that were assembled on the same cluster, but not the ones that were assembled on the other cluster and copied over. Maybe I should also copy the files for the intermediate results folder?
Also, it seems that when metaGEM then starts a task for a sample that was previously run on a different machine, the result folder that is already present (the one I copied over) for that sample gets deleted.
Best, Sam
Hey Sam,
Did metaGEM try to submit quality filtering jobs for the samples whose assemblies got deleted? Did you have all the qfiltered/ result files (including the ones for the samples that were assembled on your local machine) on your new cluster? Similarly, does your dataset/ folder contain all your samples? You need to have these files present, otherwise Snakemake will try to re-create them before running your target rule.
Hi Francisco,
No, metaGEM did not try to submit QC jobs for those samples. All the samples were qfiltered on both machines, and the dataset folder contained all the samples. As a workaround, I temporarily moved the samples that were already assembled out of the dataset folder and restarted.
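A rough sketch of that workaround, in case it helps someone else. It assumes one subfolder per sample under dataset/ and assemblies at assemblies/<sample>/contigs.fasta.gz, which is how the folders are described in this thread but is not spelled out in detail; the sample names (s1 to s4) and the dataset_parked folder are made up for the example. It runs in a temporary directory, so nothing real is moved.

```shell
#!/bin/sh
# Scratch directory with a made-up project layout.
workdir=$(mktemp -d)
for sample in s1 s2 s3 s4; do
    mkdir -p "$workdir/dataset/$sample"
done
# Pretend s1 and s2 were already assembled on the other machine.
for sample in s1 s2; do
    mkdir -p "$workdir/assemblies/$sample"
    : > "$workdir/assemblies/$sample/contigs.fasta.gz"
done

# Park already-assembled samples outside dataset/ (hypothetical folder name).
mkdir -p "$workdir/dataset_parked"
moved=0
for dir in "$workdir"/dataset/*; do
    [ -d "$dir" ] || continue
    sample=$(basename "$dir")
    if [ -e "$workdir/assemblies/$sample/contigs.fasta.gz" ]; then
        mv "$dir" "$workdir/dataset_parked/$sample"
        moved=$((moved + 1))
    fi
done

left=$(ls "$workdir/dataset" | wc -l | tr -d ' ')
echo "samples parked: $moved"
echo "samples left in dataset: $left"

rm -rf "$workdir"
```

Since the move is only temporary, the parked folders would need to go back into dataset/ once the remaining assemblies have finished.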
I see, sorry to hear you were having trouble with this, but glad that you figured out the workaround!
Let's say I have 43 samples, and on one local machine metaGEM finished the assemblies of 10 of them, but now I would like to continue on a faster cluster for the remaining assembly tasks. My question is: if I copy these to the assemblies folder of the other machine, will metaGEM recognize them and not assemble these samples again?