Imageomics / char-sim

Pipeline to create model for comparing character state descriptions including ontology similarity
MIT License

Added a simple rule for executing pytorch in conda environment. (fix #3) #5

Open KSoumya opened 3 months ago

KSoumya commented 3 months ago

Fixes #3.

balhoff commented 3 months ago

Also, put something like "fixes #3" in the PR description.

KSoumya commented 3 months ago

Updated the PR description.

balhoff commented 3 months ago

I just realized this is a new Snakefile in a subfolder. Why not add to our existing file?

balhoff commented 3 months ago

I tried running it and got this error:

Traceback (most recent call last):
  File "/home/balhoff/test-rule/char-sim/snakemake_conda/.snakemake/scripts/tmplbi3nw0_.sample_script.py", line 5, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

I don't see torch in the env.yaml; should it be there?

KSoumya commented 3 months ago

The subfolder has now been removed, and the existing Snakefile has been updated with a new rule.

balhoff commented 3 months ago

@KSoumya thanks for the updates; I am trying it out.

balhoff commented 3 months ago

@KSoumya when I run it, I get this error:

Traceback (most recent call last):
  File "/home/balhoff/test-rule/char-sim/.snakemake/scripts/tmp1a2ojqu_.create_train_data.py", line 8, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I see that pandas is in environment.yaml, but under 'pip' rather than directly in 'dependencies'. What is the difference?

hlapp commented 3 months ago

The only difference is how it gets installed (via pip from PyPI, or via conda from a conda channel). The error suggests that either you didn't create the conda environment or it isn't activated for the particular step that requires it.
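
As a rough illustration of the two install routes, the sketch below writes a small environment file with one package resolved by conda and one installed by pip. The file name `example-environment.yaml`, the environment name `example-env`, and the package choices are hypothetical and are not taken from the repository's actual environment.yaml.

```shell
#!/bin/sh
# Write a hypothetical environment file (NOT the repository's environment.yaml).
cat > example-environment.yaml <<'EOF'
name: example-env
channels:
  - conda-forge
dependencies:
  - python=3.11   # resolved and installed by conda from a channel
  - pip           # lets conda process the pip section below
  - pip:
      - pandas    # installed by pip from PyPI
EOF

# To build the environment from this file (requires conda), one would run:
#   conda env create -f example-environment.yaml
echo "wrote example-environment.yaml"
```

Either way the package ends up importable inside the environment; it only becomes visible to a Snakemake rule once that environment is actually created and used for the rule.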

balhoff commented 3 months ago

@KSoumya @hlapp now that I actually have conda installed, this is working for me (I installed miniforge and mamba). But I needed to edit environment.yaml. I initially got some conflicts which seemed to be between the version of snakemake I have (presumably one of the newest) and a very old version of python (3.8.19) that is specified in environment.yaml.

My intuition (without being familiar with snakemake/conda practices) would be that the environment should be specified in the most minimal way possible. But I'm not sure if this file is supposed to act as a statement of the direct dependencies or instead like a lock file. But as written it didn't work for me; maybe it would have if I had a specific version of conda or snakemake?

@KSoumya based on the shell snippet you sent me, I think your background environment may have more installed into it, rather than setting up the environment in the rule:

snakemake --cores 4 --use-singularity id12_desc12_simGIC.tsv.gz

I needed to use --use-conda so that the environment was created when the rule was run:

snakemake -c4 --show-failed-logs --use-singularity --use-conda id12_desc12_simGIC.tsv.gz

Maybe this is why you didn't run into these issues in your own runs.

KSoumya commented 3 months ago

@balhoff your snakemake command makes complete sense; indeed, --use-conda needs to be enabled. I will check how to make the environment.yaml more generic.

balhoff commented 3 months ago

@KSoumya I also forgot to say—in the snakemake docs it says that without that flag, the conda environment property in a rule is entirely ignored.

KSoumya commented 3 months ago

That's right. Since I had the environment defined and activated during my runs, I didn't come across this requirement. Thanks for sharing this.
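
To make the role of the flag concrete, here is a minimal sketch of a rule that declares its own conda environment. The file `Snakefile.example`, the rule name, the output file, and the script path are all hypothetical and are not the rule added in this PR.

```shell
#!/bin/sh
# Write a hypothetical Snakefile with a rule that declares a conda environment.
cat > Snakefile.example <<'EOF'
rule example_torch_step:
    output:
        "example_output.txt"
    conda:
        "example-environment.yaml"
    script:
        "scripts/example_step.py"
EOF

# Without --use-conda the conda: directive is ignored, and the script runs in
# whatever environment happens to be active. With the flag, Snakemake creates
# the declared environment and runs the rule inside it:
#   snakemake -c1 --use-conda --snakefile Snakefile.example example_output.txt
echo "wrote Snakefile.example"
```

This is consistent with what happened in this thread: a run from a shell where the environment was already activated succeeds without the flag, while a run from a clean shell fails on the first import.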

hlapp commented 3 months ago

My intuition (without being familiar with snakemake/conda practices) would be that the environment should be specified in the most minimal way possible. But I'm not sure if this file is supposed to act as a statement of the direct dependencies or instead like a lock file.

It can work as either. In the form exported using conda env export, all versions are "locked". This is often desirable: installing a later version of some dependency at a later time not only results in a different environment, but can (and in practice often does) break code that isn't forward compatible.

I do agree that Python 3.8.x is relatively old at this point, and that Python shouldn't need to be held at this version. That is, unless, I think, we're using TensorFlow 1. But I thought we're using Torch, and 3.11 is generally supported by recent versions of TensorFlow, Torch, etc.