NBISweden / nbis-meta

A snakemake workflow for metagenomic projects
https://nbis-metagenomic-workflow.readthedocs.io/
MIT License

Creating an nbis-meta environment for HPC #25

Closed lisalenorelowe closed 1 year ago

lisalenorelowe commented 2 years ago

Dear John Sundh and Co.,

I am trying to get nbis-meta up and running on our HPC, initially from a request from Ryan P., but I would like to set it up so anyone can use it.

I created a module (it is a conda environment, but it is set up so people can do 'module load') by running:

conda env create --prefix (/path/to/env) -f environment.yml

I also 'conda installed' some other packages into that environment, namely Jupyter and others, to try the ipynb notebooks that were included in the git repo.

When I tried to run things (and I am not bioinformatics and generally have no idea what this does...also I have never used snakemake), I realized that during execution, it downloads a bunch more Conda packages. Our users cannot do that from compute nodes. Can I install a comprehensive set of Conda packages so that a user of nbis-meta would never have to do that? Or do you need to install custom environments each time, because the environments break because of the different versions?

If a user would have to install different stuff all the time, then should I direct them to git clone, do the initial 'conda env create' from environment.yml, etc., themselves, or is it still worth trying to make that available for people to module load (works the same as conda activate), then only conda add the extra stuff to their run directories?

Also, I understand you can break up the steps such that a user can:

Step 1) Do all the 'conda env create' and anything else that has to access the internet, including wget or ftp
Step 2) Run it.

Can you send or point to some instructions for that?

Thanks for your help, Lisa

johnne commented 2 years ago

Hi @lisalenorelowe and sorry for the late reply.

You're right that the environment.yml file does not contain all the packages required to run all steps of the workflow. That's why the first step taken by the workflow is to install additional packages. The rationale is that a user should need to install as few packages as possible, basically only the ones necessary to get the results of interest. Also, the explicit versions set for packages often cause dependency conflicts with each other, just as you say. That's another reason we try to keep the environments small and specific.

If I understand you correctly, users on your system can neither install conda nor create environments with conda? Does that also apply to users' home folders? On the HPC that we typically use, users can install conda to their home folder (on the login node), then use conda to install environments in project locations they have access to. That's sufficient to use nbis-meta, but maybe your system has more restrictions.

If you as an admin can run conda then you are right that you should be able to break up the install and run steps. You could try to run with a config file where everything is set to True, then run:

snakemake --use-conda --conda-create-envs-only --conda-prefix <path>

where you replace <path> with some location on disk where conda environments may be stored. Once all conda environments have been created, users should be able to run the workflow by also including --use-conda --conda-prefix <path> in their snakemake call.
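A minimal sketch of that two-phase setup, assuming a hypothetical shared directory /shared/nbis-meta/conda-envs and a config file named config.yaml in the working directory. Only the snakemake flags come from this thread; the variable names, the helper functions, and the DRY_RUN switch are illustrative. By default the script only prints the commands so they can be reviewed before anything is installed.

```shell
#!/bin/sh
# Sketch: separate the online "install" phase from the offline "run" phase.
set -eu

# Hypothetical shared location readable by all users; adjust for your site.
CONDA_ENVS_DIR="${CONDA_ENVS_DIR:-/shared/nbis-meta/conda-envs}"
CONFIG="${CONFIG:-config.yaml}"
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Phase 1: run once, by an admin, on a node with internet access.
install_envs() {
    run snakemake --use-conda --conda-create-envs-only \
        --conda-prefix "$CONDA_ENVS_DIR" --configfile "$CONFIG" --cores 1
}

# Phase 2: run by users on compute nodes, reusing the pre-built environments.
run_workflow() {
    run snakemake --use-conda --conda-prefix "$CONDA_ENVS_DIR" \
        --configfile "$CONFIG" --cores "${1:-8}"
}

install_envs
run_workflow 8
```

Setting DRY_RUN=0 executes the commands instead of printing them. This only covers conda packages; any data the workflow downloads at run time would need to be fetched separately on a node with internet access.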

Let me know if that helps and just ask if you have more questions.

lisalenorelowe commented 2 years ago

Hi John, Thanks so much for your response.

We have Conda installed - with absolutely no additional packages - so that users can easily do 'conda create' without installing Conda. I had thought to install a Conda environment for nbis-meta just to make it easier for our users, but now I understand that from the way nbis-meta works, it is better that they do that themselves.

I was trying a test and trying to accomplish something before replying...but I was not successful in getting the example to work. It took a horribly long time, and then never finished installing packages. It quit at some point with an error about the packages being corrupted - in the pkgs_dir, but that might be because I killed and restarted a couple times.

I wonder, do you have maybe a tar package that contains the bare minimum of what I would need to see if things work on our system? (Like including all yml, input, and sample fastq files, etc? Maybe a short tutorial?) How long would you say it usually takes to set up an nbis-meta environment?

Here is what I did to try to run it. I have these instructions: https://github.com/NBISweden/nbis-meta/wiki/How-to-run-the-workflow

Prepare a test directory:

mkdir TEST
cp -r workflow/* TEST
cp -r config TEST
cd TEST
cp config/config.yaml .

Change the location of the temporary directory in config.yaml:

# set to $TMPDIR, /scratch or equivalent when running on HPC clusters
temp: "$TMPDIR"

This is in the original instructions (when you paste, make sure each option starts with two hyphens rather than a single long dash):

snakemake --use-conda --configfile config.yaml --cores 8
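The paste problem mentioned above can be caught before running anything. This is only a sketch: has_long_dash is a hypothetical helper, not part of nbis-meta or snakemake, and it only looks for the en dash (U+2013) and em dash (U+2014) that browsers and word processors commonly substitute for two hyphens.

```shell
#!/bin/sh
# Sketch: detect en/em dashes that may have replaced "--" in a pasted command.
set -eu

# UTF-8 byte sequences for U+2013 (en dash) and U+2014 (em dash).
EN_DASH=$(printf '\342\200\223')
EM_DASH=$(printf '\342\200\224')

# Hypothetical helper: returns 0 if the text contains either dash, 1 otherwise.
has_long_dash() {
    case "$1" in
        *"$EN_DASH"*|*"$EM_DASH"*) return 0 ;;
        *) return 1 ;;
    esac
}

cmd='snakemake --use-conda --configfile config.yaml --cores 8'
if has_long_dash "$cmd"; then
    echo "Pasted command contains a long dash; retype the options with two hyphens." >&2
else
    echo "OK: $cmd"
fi
```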

I typed that command and it started running, but John's email suggests that using mamba is faster, so I cancelled and did this:

snakemake --use-conda --conda-frontend mamba --configfile config.yaml --cores 8

And since we want to try the following command, I killed it again (it seems like it will take a long time, so better to just test what I actually want to do):

snakemake --use-conda --conda-frontend mamba --configfile config.yaml --cores 8 --conda-create-envs-only

johnne commented 2 years ago

Installing environments with conda can take a terribly long time. Previously the nbis-meta environment contained a lot more packages, which made it more cumbersome to install. Most packages have now been moved to separate conda environment files so that users only have to sit through the installation of packages they will actually need. But yes, it can still take quite a long time. As you mention, it is possible to speed things up by using mamba instead of conda. You can use it both when running the pipeline (--conda-frontend mamba) and when installing the base nbis-meta environment (mamba env create -f environment.yml).
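On a system where mamba may or may not be installed, the choice of frontend can be made automatically. A small sketch, assuming pick_frontend as a hypothetical helper name and config.yaml as the config file; the commands are printed rather than executed so they can be checked first.

```shell
#!/bin/sh
# Sketch: use mamba when it is installed, otherwise fall back to conda.
set -eu

# Hypothetical helper: prints "mamba" if it is on PATH, else "conda".
pick_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

FRONTEND=$(pick_frontend)

# Printed, not executed: the base environment install and the workflow run.
echo "$FRONTEND env create -f environment.yml"
echo "snakemake --use-conda --conda-frontend $FRONTEND --configfile config.yaml --cores 8"
```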

I just tried to clone the workflow into a new location on the HPC we typically use. I ran:

git clone git@github.com:NBISweden/nbis-meta.git
cd nbis-meta/

Then set up a folder for keeping the conda environment in the same directory and installed with mamba:

mkdir envs/
mamba env create -f environment.yml -p envs/nbis-meta

That installation took roughly 10 minutes. I then activated the environment with conda activate envs/nbis-meta.

Once you've done that you can run the workflow using test data by simply doing:

snakemake -j 4 --use-conda --conda-frontend mamba qc

That will download test data and run the preprocessing part (read trimming and QC check) on that test data. The command took about 6 minutes to run on our HPC login node.

I recommend running the workflow from the same directory that you cloned into from GitHub, so don't copy the workflow folders into another directory. Simply:

  1. git clone and change into the nbis-meta directory
  2. create and activate environment
  3. run the workflow
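Since the recommendation is to run from the cloned directory, a small guard can check that before the workflow is started. This is a sketch under assumptions: in_repo_root is a hypothetical helper, and it treats a directory as the nbis-meta clone if it contains environment.yml and a workflow/ folder, both of which are mentioned earlier in this thread.

```shell
#!/bin/sh
# Sketch: confirm the current directory looks like the nbis-meta clone
# before starting the workflow.
set -eu

# Hypothetical helper: succeeds if the directory holds environment.yml and workflow/.
in_repo_root() {
    [ -f "$1/environment.yml" ] && [ -d "$1/workflow" ]
}

CLONE_DIR="${1:-.}"
if in_repo_root "$CLONE_DIR"; then
    echo "Looks like the nbis-meta clone. Next: snakemake -j 4 --use-conda --conda-frontend mamba qc"
else
    echo "Not in the nbis-meta clone; cd into the cloned directory first." >&2
fi
```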

lisalenorelowe commented 2 years ago

Hi John,

Thank you for the reply. It seems I was complicating things before because the instructions you gave worked fine just now.

Thanks again! Lisa
