CSCfi / csc-user-guide

User guides, FAQs and tutorials related to CSC services
https://docs.csc.fi
Creative Commons Attribution 4.0 International

Snakemake #2091

Closed yetulaxman closed 1 month ago

yetulaxman commented 2 months ago

Proposed changes

Briefly describe the changes you've made here. Remember to add a link to the preview page of your branch.

Checklist before requesting a review

samumantha commented 2 months ago

FYI @ktiits

samumantha commented 2 months ago

Line 87 in the tutorial mentions env.yml, which is not further specified. Should there also be the link to the same file that you linked from the apps page?

yetulaxman commented 2 months ago

> Line 87 in the tutorial mentions env.yml, which is not further specified. Should there also be the link to the same file that you linked from the apps page?

The env.yml file is included in the tar file for the tutorial. In practice, this env file varies from user to user and is not part of the snakemake app, so I think it is OK to skip the link to the env file on the app page.

samumantha commented 2 months ago

Could that then be mentioned there specifically, i.e. that an example env.yml is provided with the data and scripts in the tar file?

yetulaxman commented 2 months ago

> Could that then be mentioned there specifically, i.e. that an example env.yml is provided with the data and scripts in the tar file?

Sure, we can mention it.

ktiits commented 2 months ago

Yesterday we briefly discussed this with Antoni, but he was unsure how it works. So below are two open questions that could be explained in the tutorial:

  1. When can the provided snakemake module be used, and when is an own installation needed? My guess is that the module cannot or should not be used together with any other module containing Python, to be sure that two different Pythons do not create conflicts. If that is the case, then I think the three Python packages needed in Tykky installations could be listed and explained, also without downloading a bigger exercise tar file.

  2. When should the local runner be used, when the SLURM runner, and when HQ? (I must admit I have not read the tutorial to the end, so it might be there already.) We wondered yesterday whether HQ could support the use of multiple nodes and whether the local runner alone maybe could not. But we did not know. Antoni said that we do not want to recommend the SLURM integration because of the many SLURM jobs it creates.

yetulaxman commented 2 months ago

> Yesterday we briefly discussed this with Antoni, but he was unsure how it works. So below are two open questions that could be explained in the tutorial:
>
> 1. When can the provided snakemake module be used, and when is an own installation needed? My guess is that the module cannot or should not be used together with any other module containing Python, to be sure that two different Pythons do not create conflicts.
>    If that is the case, then I think the three Python packages needed in Tykky installations could be listed and explained, also without downloading a bigger exercise tar file.
>
> 2. When should the local runner be used, when the SLURM runner, and when HQ? (I must admit I have not read the tutorial to the end, so it might be there already.) We wondered yesterday whether HQ could support the use of multiple nodes and whether the local runner alone maybe could not. But we did not know. Antoni said that we do not want to recommend the SLURM integration because of the many SLURM jobs it creates.

Thanks for the comments; both are valid points. Yes, we can explain in the CSC docs what is in our snakemake module (actually we did in this version), so that users can choose whether to install snakemake as part of a Tykky-based installation or in some other way. The snakemake module installed at system level (/appl/soft/..) is a plain (non-containerised) installation, so that it is as compatible as possible with other custom installations, including Tykky-based ones. The order in which these environments are activated or loaded matters for avoiding Python conflicts. This is mainly an issue if the Snakefile uses Python scripts directly in script blocks. All application installations used for data analysis in a workflow have to come from a Tykky-based installation, an existing module, or a Singularity/Apptainer container. The tar file is big because of the container image, and we can avoid that by uploading the image to a registry. All the recipes for custom installations are part of the materials. I will work on reducing the size of the tutorial content in the next iterations.
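
To see which Python "wins" after loading modules and activating environments in a given order, a small diagnostic like the one below could help. It is only a sketch: the function name `describe_python_environment` is made up for illustration, and it reports nothing more than which interpreter is running, which `python3` is first on `PATH`, and where a package (by default `snakemake`) would be imported from.

```python
import importlib.util
import shutil
import sys


def describe_python_environment(package="snakemake"):
    """Report which Python is active and where `package` would be imported from.

    Hypothetical helper, not part of snakemake or the CSC modules.
    """
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        # Interpreter that is executing this script.
        "running_interpreter": sys.executable,
        # First python3 found on PATH; may differ from the one above.
        "python3_on_path": shutil.which("python3"),
        # File the package would be loaded from, or None if it is missing.
        "package_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

Running it once after each load order and comparing the output shows whether the interpreter and the snakemake package come from the same installation.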

I think we don't have any hard boundaries on using a particular executor. If there are very many small job steps (or rules), roughly more than 1000 jobs of less than 30 minutes each, it is better to use HQ. If there are only a few hundred rules that each take considerable time to execute, one might still use the native SLURM executor (and be prepared to wait in the queue for each job step). The local executor can be used on interactive nodes and also when the snakemake workflow is submitted as one batch job. HQ scales well across multiple nodes (basically you can spin up more workers if needed to submit more jobs). I have not performed that many scaling tests on snakemake workflows, though, but the experience with Nextflow is that HQ seems to work fine across multiple nodes as well.
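
The rule of thumb above could be written down roughly as follows. This is a sketch, not a policy: the thresholds are the approximate numbers from this comment, the function name and the `single_allocation` flag are invented for illustration, and treating "many jobs" or "short jobs" alone as enough to prefer HQ is my assumption for the cases the comment does not cover. The returned strings are descriptive labels only, not verified values for snakemake's `--executor` option.

```python
MANY_JOBS = 1000          # "roughly > 1000 jobs" (approximate, from the discussion)
SHORT_JOB_MINUTES = 30    # "< 30 minutes" per job step (approximate)


def suggest_executor(n_jobs, minutes_per_job, single_allocation=False):
    """Return a label ("local", "hq" or "slurm") for a workflow.

    single_allocation: True when the whole workflow runs inside one
    interactive session or one batch job (hypothetical flag).
    """
    if single_allocation:
        # Everything runs within resources that are already allocated.
        return "local"
    if n_jobs > MANY_JOBS or minutes_per_job < SHORT_JOB_MINUTES:
        # Many and/or short job steps: avoid one SLURM job per step.
        return "hq"
    # A few hundred long-running steps: one SLURM job per step is acceptable,
    # at the cost of queueing for each step.
    return "slurm"


if __name__ == "__main__":
    print(suggest_executor(5000, 5))
    print(suggest_executor(200, 240))
    print(suggest_executor(20, 10, single_allocation=True))
```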

samumantha commented 1 month ago

Sorry, I currently do not have time to review this. From my perspective, one approval is good enough to merge it, so that it is visible to users. Smaller things, like links to Antoni's work, can still be updated later.