Open KatharinaHoff opened 6 months ago
Minimal example of how to cd in Snakemake
Snakefile:
# Snakefile
rule my_rule:
    input:
        # Input files or wildcards
    output:
        # Output files
    shell:
        """
        ./run_task.sh ..
        """
run_task.sh:
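#!/bin/bash
# Print the directory the script was started in
echo "$PWD"
# Change to the desired directory; stop if that fails
cd "$1" || exit 1
# Print the directory again to confirm the change
echo "$PWD"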
Run with:
snakemake -s Snakefile my_rule --cores 1
This works for me. We can probably wrap the RepeatModeler/RepeatMasker commands in such a bash script. The same applies to VARUS, but we won't implement that now. That would rather be a task for @StepanSaenko if he decides to build on this codebase.
@StepanSaenko The VARUS container is available here: https://hub.docker.com/repository/docker/katharinahoff/varus-notebook/general You need to be careful because of the chdir problem, but this issue outlines how to get around it.
RepeatModeler/RepeatMasker are extremely heavy on file system I/O. We will likely get a complaint from the HPC admin if we run them as currently implemented.
The problem is that these tools need to be run from inside the directory where they write a lot of (temporary) files.
Outside of Snakemake, we usually go to /tmp/{USER}/rm, copy the genome file there, and execute from there. On snowball and batch, there is also the option to do the same in /dev/shm/rm, which is much faster. Both options keep the traffic on the node and do not add to the cluster-wide I/O load.
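The staging routine described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `run_in_scratch`, its parameters, and the `rm_` prefix are hypothetical, and the actual RepeatModeler/RepeatMasker command line is deliberately left to the caller.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_scratch(command, input_file, results, dest_dir, scratch_root="/tmp"):
    """Run `command` inside a fresh node-local directory and copy results back.

    All names here are hypothetical and only illustrate the idea.
    command      -- argument list; it runs with the scratch directory as cwd
    input_file   -- file copied into the scratch directory before the run
    results      -- file names (relative to the scratch directory) to keep
    dest_dir     -- directory on shared storage that receives the results
    scratch_root -- node-local parent directory, e.g. /tmp or /dev/shm
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Create a unique working directory on the node-local file system
    scratch = Path(tempfile.mkdtemp(prefix="rm_", dir=scratch_root))
    try:
        shutil.copy(input_file, scratch)
        # cwd= changes the directory only for the child process
        subprocess.run(command, cwd=scratch, check=True)
        for name in results:
            shutil.copy(scratch / name, dest_dir / name)
    finally:
        # Always remove the temporary files from the node
        shutil.rmtree(scratch, ignore_errors=True)
    return dest_dir
```

Only the files listed in `results` travel back to shared storage, so the many temporary files never leave the node. Setting `scratch_root="/dev/shm"` would correspond to the faster option where that is available.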
My Snakemake workflow died when I tried to cd in the Snakemake shell. But there may be other ways to change to the directory, e.g. calling a bash script or a Python script from the Snakemake shell that performs the cd and launches the tools.
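For the Python-script option, a minimal sketch might look like this. It assumes a hypothetical helper (`working_directory`, `run_in_dir`) that changes directory only for the duration of the tool call and then restores the original one, so the calling process is left where it started. It has not been tested inside a Snakemake rule.

```python
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the command fails
        os.chdir(previous)


def run_in_dir(workdir, command):
    """Run `command` (an argument list) from inside `workdir`.

    Returns the exit code of the command.
    """
    with working_directory(workdir):
        # The child process inherits the changed working directory
        return subprocess.run(command).returncode
```

A wrapper script built on this could take the directory and the command from `sys.argv` and pass the exit code to `sys.exit`, so that a failing tool still makes the calling rule fail.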
@claraptzsl please test whether any of these options work on a minimal toy example. If we can get changing to the execution directory to work, we can fix it here in the repeat masking rule, and that would make things a lot better.