carpentries-incubator / snakemake-novice-bioinformatics

Introduction to Snakemake for Bioinformatics
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics

Introducing conda in the setup page #50

Open tbooth opened 9 months ago

tbooth commented 9 months ago

from @tkphd

Conda is a common and useful tool, but it is simply invoked, not introduced. Explain what it is (a Python distribution with virtual environment isolation), how it helps (simplifies dependency management), and how to use it.

The instructions as written appear to update an existing environment, not create a new one. The environment is named "snakemake_dash". Why? The conda_env.yaml file contains a whole lot of specific packages. Consider filtering this to specify just those packages you would install manually: snakemake, fastqc, kallisto, etc. Let conda fill in the full dependency graph.

tbooth commented 9 months ago

Conda is a common and useful tool, but it is simply invoked, not introduced. Explain what it is (a Python distribution with virtual environment isolation), how it helps (simplifies dependency management), and how to use it.

I'm not sure the setup page is the place for a Conda tutorial, but I've linked to Ep 10 which has this info.

tbooth commented 9 months ago

The instructions as written appear to update an existing environment, not create a new one.

I've emphasised that "conda env update" really does create the environment if it does not already exist. I agree that many people would assume this command only updates an existing environment. Conda has many quirks.

tbooth commented 9 months ago

The environment is named "snakemake_dash". Why?

Good point. A relic of where the material originated. I've renamed it to "snakemake_capentry" and modified/removed some other references to the DaSH project.

tbooth commented 9 months ago

The conda_env.yaml file contains a whole lot of specific packages. Consider filtering this to specify just those packages you would install manually: snakemake, fastqc, kallisto, etc. Let conda fill in the full dependency graph.

I did this and made files/conda_env_min.yaml, but I believe that, on Linux at least, using the full manifest is more likely to work and produce the desired environment. In my experience, Conda tends to assume that upgrades of dependent packages are always compatible, when in fact breaking changes are common. The tbb dependency mentioned in Ep 10 is a case in point.
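
To make the trade-off concrete, here is a rough Python sketch of how a fully pinned manifest could be cut down to just the top-level tools. It is not the script used to produce files/conda_env_min.yaml. The dependency list, version strings, channel names, environment name and function names are all hypothetical placeholders. Setting keep_versions=True keeps the version pin on the tools you name but still leaves transitive dependencies such as tbb to the solver, which is where the breakage described above can creep in.

```python
# Hypothetical excerpt of a fully pinned manifest, in the "name=version=build"
# form that `conda env export` writes. Versions and builds are placeholders.
FULL_DEPENDENCIES = [
    "snakemake=1.0=build0",
    "fastqc=1.0=build0",
    "kallisto=1.0=build0",
    "tbb=1.0=build0",
    "zlib=1.0=build0",
]

# The tools you would ask for by hand; everything else is a transitive dependency.
WANTED = ["snakemake", "fastqc", "kallisto"]


def minimal_dependencies(pinned, wanted, keep_versions=False):
    """Keep only the entries whose package name is in `wanted`.

    With keep_versions=False the result is bare names, so conda is free to
    pick versions. With keep_versions=True the version is kept but the build
    string is dropped.
    """
    selected = []
    for spec in pinned:
        parts = spec.split("=")
        name = parts[0]
        if name not in wanted:
            continue
        if keep_versions and len(parts) > 1:
            selected.append(f"{name}={parts[1]}")
        else:
            selected.append(name)
    return selected


def render_manifest(env_name, channels, dependencies):
    """Render a small environment file as text (simple fixed layout)."""
    lines = [f"name: {env_name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    deps = minimal_dependencies(FULL_DEPENDENCIES, WANTED)
    # Environment name and channel names below are assumptions for illustration.
    print(render_manifest("demo_env", ["conda-forge", "bioconda"], deps))
```

The full manifest remains the safer choice when reproducing a known-good environment matters more than keeping the file short.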