matrs closed this issue 3 years ago
Hi Jose,

Indeed, that should fix the problem in your situation! I suspect that part of the problem also stems from the fact that your `scratch/` path in the `config.yaml` file is likely pointing to a single directory, is that correct?

In the clusters I have used to develop metaGEM there is generally a variable called something like `$TMPDIR` or `$SCRATCH`, which points to a job-specific directory for each sample when submitting jobs (e.g. this), meaning that you can use the same variable in the Snakefile and each job will then be given a unique storage location by the scheduler/cluster.

Does your cluster have such a variable? If so, then you can set your `scratch/` path in the `config.yaml` file as shown below to avoid having to modify other rules that make use of the `scratch/` directory.
```yaml
scratch: $YOUR_CLUSTER_TMPDIR
```
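To illustrate why a job-specific variable avoids collisions, here is a small standalone sketch (the directory names and the `mktemp` simulation are mine, not part of metaGEM) mimicking what the scheduler does when it hands each job its own `$TMPDIR`:

```shell
# Simulate a scheduler handing each job its own scratch directory,
# the way SLURM-style clusters populate $TMPDIR per job.
root=$(mktemp -d)
job1_tmp=$(mktemp -d "$root/job.XXXXXX")   # job 1's private dir
job2_tmp=$(mktemp -d "$root/job.XXXXXX")   # job 2's private dir
# Both jobs can now write identically named files without clashing:
touch "$job1_tmp/contigs.fa" "$job2_tmp/contigs.fa"
echo "job1 -> $job1_tmp"
echo "job2 -> $job2_tmp"
```

With a single shared scratch path, both jobs would instead be writing `contigs.fa` into the same directory.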
Thanks for reporting, I will update the documentation to elaborate on the usage of the `scratch` path.
Best wishes, Francisco
Hello Francisco,
I didn't know that when submitting jobs to a `$SCRATCH` partition a unique directory is created for each job; that explains why nobody has complained about this in the past. In this particular cluster there is no scratch partition, so no `$SCRATCH` is defined. The `/tmp` directory works as in any Linux system and is also rather small, so I defined `tmp` to be a directory in my `$HOME` in the JSON config (in this cluster, `/home` is a local file system).

Thank you for your help.

Jose Luis
Yes, unfortunately it can be a bit difficult to build readily usable/deployable pipelines when clusters tend to be quite idiosyncratic.
I am slightly concerned about your situation: when you submit jobs in parallel further downstream in the analysis (e.g. see the Snakefile rule `crossMap`), you will have multiple jobs trying to use the same directory, and this will cause errors. At the moment I see 3 potential solutions:
1. Create a unique subdirectory for each job within the `scratch/` directory.
2. Modify the `shell` section of the Snakefile jobs that move files into the `scratch` dir.
3. …

I will implement solution 1 in the Snakefile as soon as I get the chance. This would fix the problem for users that don't have a job-specific `$SCRATCH` or `$TMPDIR` variable, while also not causing problems for users that do.
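As a rough sketch of what solution 1 might look like in a rule's shell section (the fallback location and the hard-coded sample ID below are assumptions for illustration, not the actual implementation):

```shell
# Sketch of solution 1: if the cluster defines a job-specific $TMPDIR or
# $SCRATCH, use it; otherwise fall back to a per-sample subdirectory so
# parallel jobs still get unique locations. The fallback path and the
# sample ID are illustrative assumptions.
sample="ERR260132"                              # stand-in for a Snakemake wildcard
scratch_root="${TMPDIR:-${SCRATCH:-$HOME/scratch}}"
job_dir="$scratch_root/$sample"
mkdir -p "$job_dir"
echo "using $job_dir"
```

Because the subdirectory name comes from the sample, two concurrent jobs on different samples never share a working directory, even when `scratch` points at a single fixed path.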
Thank you very much, I'll check the next steps in the pipeline over the following days and implement one of the solutions you suggested.
Thanks!
Hello, thanks for this pipeline, it's been very useful. I found an error when running the rule `megahit` with two jobs in parallel. On line 232 of the Snakefile, the `-o tmp` flag makes `megahit` complain and stop because that file/folder already exists. I solved the problem by defining an output name that depends on the sample name:
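Jose's exact snippet isn't shown above, but a hypothetical version of that fix could look like this (the sample and directory names are illustrative; megahit refuses to run when the `-o` directory already exists, so making the name unique per sample avoids the clash):

```shell
# Hypothetical sketch: derive a per-sample output directory so that two
# parallel megahit jobs never pass the same -o path.
sample="sample_A"                 # would come from Snakemake wildcards
outdir="tmp_${sample}"            # unique per sample, unlike a shared "tmp"
# megahit -1 reads_R1.fq -2 reads_R2.fq -o "$outdir"
echo "$outdir"
```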