Closed percyfal closed 2 years ago
Agree that this would be nice. I am assuming the conda environment organization reflects some kind of functional modularity (i.e. the environments are grouped by the analyses being run)?
The conda environment is meant to provide the minimum set of (conda) dependencies to run a rule. For some rules, not everything in an environment file is needed, but the differences are so small, and conda envs so costly to install, that the fewer conda env files, the better. Loading an extra environment module or two has little overhead, so we might as well provide the same grouping.
Right, do we also want to move envmodules.yaml someplace else? I was just looking at the docs and it might make sense to move the section to the runtime config in .profile, though I don't know if I understood correctly how profiles are meant to be used...
At their simplest, profiles are a directory with a config.yaml that maps snakemake options to key: value entries. So a config file with
restart-times: 2
max-jobs-per-second: 1
translates to the snakemake command line
snakemake --restart-times 2 --max-jobs-per-second 1
Many of the options can be fine-tuned for specific environments, but you don't want to retype them every time, since most likely you won't change them anyway.
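To make the mapping concrete, here is a minimal Python sketch of how key/value entries could be turned into command-line arguments. It only illustrates the idea and is not how Snakemake itself parses profiles; the function name profile_to_args is made up for this example, and the profile is given as a plain dict instead of being read from config.yaml.

```python
def profile_to_args(options):
    """Turn profile key/value pairs into command-line arguments.

    Illustrative only: `profile_to_args` is a hypothetical helper,
    not part of Snakemake.
    """
    args = []
    for key, value in options.items():
        flag = f"--{key}"
        if value is True:
            # Boolean switches are passed without a value.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are skipped entirely.
            continue
        elif isinstance(value, (list, tuple)):
            # Multi-valued options: the flag followed by every value.
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            args.extend([flag, str(value)])
    return args


# The two entries from the example profile, as a plain dict.
profile = {"restart-times": 2, "max-jobs-per-second": 1}
command = ["snakemake", *profile_to_args(profile)]
print(" ".join(command))
```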
When it comes to submitting jobs to the cluster, there are three snakemake options that matter: --cluster, which is a command to submit jobs (a shell script that wraps sbatch, a python script ...), --cluster-status, which is a command that polls the slurm controller for job status, and --jobscript, which provides a custom jobscript for submission that actually wraps the snakemake command. You don't need these per se, but they do provide some additional level of control. The SnakemakeProfiles slurm cookiecutter template provides these scripts, along with templates to set up config.yaml and other files.
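As an illustration of what a --cluster-status script does, here is a rough Python sketch. Snakemake expects such a script to take a job id and print one of success, running or failed. The grouping of slurm states below, the function names and the exact sacct invocation are assumptions made for this example; the scripts shipped with the cookiecutter template are more thorough.

```python
import subprocess

# Assumption: slurm states treated as "still in progress" for this sketch.
IN_PROGRESS = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def classify(state):
    """Map a slurm job state to the word Snakemake expects on stdout."""
    words = state.split()
    # sacct may report e.g. "CANCELLED by 1234" or a truncated "CANCELLED+".
    name = words[0].rstrip("+") if words else ""
    if name == "COMPLETED":
        return "success"
    if name in IN_PROGRESS or name == "":
        # An empty answer usually means the job is not in accounting yet.
        return "running"
    return "failed"


def query_state(jobid):
    """Ask sacct for the state of a job (needs a slurm installation)."""
    result = subprocess.run(
        ["sacct", "-j", str(jobid), "--format=State", "--noheader", "--parsable2"],
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


# Demonstration without a cluster: classify a few example states.
for example in ["COMPLETED", "RUNNING", "CANCELLED by 1234", ""]:
    print(repr(example), "->", classify(example))
```

A real status script would call classify(query_state(jobid)) with the job id taken from its first command-line argument and print the result.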
Our example config.yaml does not provide the custom scripts mentioned above; rather, I included it to show how to configure rule-specific resources (default-resources, set-threads, etc.).
Since it is likely that one would want to use the cookiecutter, I would on the one hand advise against putting envmodules in that directory as it is prone to be overwritten. OTOH they do fit together. For now I would suggest sticking with envmodules in config, and maybe add support for an environment variable such that it could be placed in e.g. .config/snakemake/envmodules.yaml or similar?
Yes, I think I had misunderstood how the profiles should be used..
> For now I would suggest sticking with envmodules in config, and maybe add support for an environment variable such that it could be placed in e.g. .config/snakemake/envmodules.yaml or similar?
Do you mean a .config/ folder in the installation directory? That or any other "global" location that would be suitable (maybe workflow/envs/?)
No, I meant the "regular" config directory would be the default location, if no other parameters have been set. What do we think the use-case scenario will be? You have an analysis folder (separate from the repo) in which there is a config directory with config.yaml and samples.tsv, and the envmodules.yaml file resides in some directory accessible to all analyses. This doesn't necessarily have to be in the repo. I'm open to any suggestions, but hopefully it will become clearer when we test multiple projects.
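A possible lookup order, sketched in Python: use an environment variable if it is set, otherwise fall back to config/envmodules.yaml in the analysis folder. The variable name SNAKEMAKE_ENVMODULES and the function envmodules_path are hypothetical; nothing like this exists in the workflow yet.

```python
import os
from pathlib import Path

# Hypothetical variable name, chosen only for this sketch.
ENV_VAR = "SNAKEMAKE_ENVMODULES"


def envmodules_path(workdir=".", environ=None):
    """Return the envmodules.yaml location to use.

    The environment variable wins if set; otherwise the file is
    expected in the regular config directory of the analysis folder.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path(workdir) / "config" / "envmodules.yaml"


# Default location inside an analysis folder.
print(envmodules_path("analysis", environ={}))
# Shared file, accessible to all analyses, selected via the variable.
print(envmodules_path("analysis", environ={ENV_VAR: "/shared/envmodules.yaml"}))
```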
Following up on the environment variable discussion, this is what the help for the snakemake --profile flag says:
--profile PROFILE Name of profile to use for configuring Snakemake.
Snakemake will search for a corresponding folder in
/etc/xdg/xdg-lxqt/snakemake and
/home/peru/.config/snakemake. Alternatively, this can
be an absolute or relative path. The profile folder
has to contain a file 'config.yaml'. This file can be
used to set default values for command line options in
YAML format. For example, '--cluster qsub' becomes
'cluster: qsub' in the YAML file. Profiles can be
obtained from https://github.com/snakemake-profiles.
The profile can also be set via the environment
variable $SNAKEMAKE_PROFILE. [env var:
SNAKEMAKE_PROFILE] (default: None)
So there is already a use case where one puts config in a .config directory. I guess this could also be the current working directory, but would it be confusing to have both a .config and a config directory to keep track of?
Closed via #50
Currently, there is a separate envmodules key for every rule, but a given conda environment file is shared between rules. Since both keywords solve the same problem, envmodules should be shared between rules, following the conda environment sharing. For instance, envs/malt.yaml is shared between four rules, each of which has a separate envmodules config.
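One way the sharing could look, as a minimal Python sketch: key the envmodules config by the name of the conda environment file, so every rule that uses envs/malt.yaml resolves to the same module list. The helper envmodules_for and the module names are placeholders invented for this example, not the actual configuration.

```python
from pathlib import Path

# Hypothetical contents of envmodules.yaml after loading, keyed by the
# conda environment file name (without extension). The module names are
# placeholders, not real site modules.
ENVMODULES = {
    "malt": ["bioinfo-tools", "malt"],
}


def envmodules_for(conda_env_file, envmodules=ENVMODULES):
    """Return the module list that matches a conda environment file."""
    key = Path(conda_env_file).stem  # "envs/malt.yaml" -> "malt"
    # A copy is returned so callers cannot modify the shared config.
    return list(envmodules.get(key, []))


# Every rule that uses envs/malt.yaml gets the same modules.
print(envmodules_for("envs/malt.yaml"))
```

Each rule would then presumably pass the result of envmodules_for to its envmodules directive, using the same path it gives to conda, instead of carrying its own config entry.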