drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/

Simplify Giotto/Python Environment Installation #227

Open daccachejoe opened 2 years ago

daccachejoe commented 2 years ago

Hello-

Thanks for your work and effort on Giotto. I really like using this package, but installing the Python environment modules is finicky and not very user-friendly. I am trying to run SpatialDWLS, which requires modules that are not installed automatically. I have repeatedly run installGiottoEnvironment with both force_environment and force_miniconda set to TRUE, yet the errors persist. First, python.app==2 cannot be installed, so I specified the installation of all packages except python.app, which ran successfully. Then, after some processing, I tried to run doLeidenCluster, and the function failed with the following error and warnings, which show that necessary packages were not installed in the Giotto environment.

Error in py_run_file_impl(file, local, convert)
ModuleNotFoundError: No module named 'igraph'
Detailed traceback
File "", line 9, in
File "/gpfs/ysm/home/jd2749/R/x86_64-pc-linux-gnu-library/4.1/reticulate/python/rpytools/loader.py", line 44, in _import_hook
level=level
Calls: PrepareGiottoObject ... doLeidenCluster -> -> py_run_file -> py_run_file_impl
In addition: Warning messages:
1: In createGiottoObject(raw_exprs = seruat_counts, :
module: pandas was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
2: In createGiottoObject(raw_exprs = seruat_counts, : module: igraph was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
3: In createGiottoObject(raw_exprs = seruat_counts : module: leidenalg was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
4: In createGiottoObject(raw_exprs = seruat_counts, : module: community was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
5: In createGiottoObject(raw_exprs = seruat_counts, : module: networkx was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
6: In createGiottoObject(raw_exprs = seruat_counts, : module: sklearn was not found with python path: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python

I would really appreciate any and all guidance with this because the vignettes seem to indicate that simply running installGiottoEnvironment solves everything, but that isn't the case. There are modules not installed and maybe I can't navigate the vignettes well, but I can't find resources on the website on how to get around these errors.

Thanks!

RubD commented 2 years ago

Hi,

Can you check which Giotto version you are using with packageVersion('Giotto')? Also, did you install the giotto python environment in a fresh R session?

Thanks

daccachejoe commented 2 years ago

Hi,

> packageVersion("Giotto") [1] ‘1.1.0’ Yes, a fresh session. I've gotten these errors on local machines, login nodes, and compute nodes. Any chance it is due to the package version?

RubD commented 2 years ago

Could you run the following lines in a fresh R session? I'd like to figure out if the environment is installed properly or if there is potentially another issue.

```r
# load Giotto
library(Giotto)

# get the operating system
Giotto:::get_os()

# list different conda environments that can be found
reticulate::conda_list()

# specify to use the giotto environment
reticulate::use_python(required = TRUE, python = "/gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python")

# check if the right environment is loaded
reticulate::py_config()

# check if modules can be found
reticulate::py_module_available('pandas')
reticulate::py_module_available('igraph')
reticulate::py_module_available('leidenalg')
reticulate::py_module_available('networkx')
reticulate::py_module_available('community')
reticulate::py_module_available('sklearn')
```
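
The per-module checks above can also be collapsed into one call that returns a small table. This is only a sketch: `check_modules` and `giotto_modules` are hypothetical names, not part of Giotto or reticulate, and the module list is simply the set of modules named in the warnings earlier in this thread.

```r
# Hypothetical helper (not part of Giotto or reticulate).
# `is_available` defaults to reticulate::py_module_available; the default is
# only looked up when the function is called without an override.
check_modules <- function(modules,
                          is_available = reticulate::py_module_available) {
  found <- vapply(modules, is_available, logical(1), USE.NAMES = FALSE)
  data.frame(module = modules, available = found, stringsAsFactors = FALSE)
}

# Modules named in the warnings from createGiottoObject in this thread.
giotto_modules <- c("pandas", "igraph", "leidenalg",
                    "networkx", "community", "sklearn")

# In a session where use_python() already points at the giotto environment:
# check_modules(giotto_modules)
```

Passing the availability check as an argument keeps the helper usable in a session without Python, for example to try it out with a stub function first.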
daccachejoe commented 2 years ago

Hi Ruben,

Thanks for the guidance. I think the problem might lie in having two giotto environments in different locations. It is still odd that the necessary modules are not installed, though.

On a compute node:

Giotto:::get_os()

sysname
"linux"

4 python paths, 2 giotto environments:

reticulate::conda_list()

  name          python
1 base          /gpfs/ysm/home/jd2749/.local/share/r-miniconda/bin/python
2 giotto_env    /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
3 giotto_env    /gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/bin/python
4 r-reticulate  /gpfs/ysm/project/bosenberg/jd2749/conda_envs/r-reticulate/bin/python
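
A duplicated environment name like this is easy to miss in a long listing. As a rough sketch, the check can be automated on any data frame shaped like the `conda_list()` output; `find_duplicate_envs` is a hypothetical helper, and the data frame below is typed in by hand from the listing above rather than produced by reticulate.

```r
# Hypothetical helper: return the rows whose environment name appears more
# than once in a data frame shaped like reticulate::conda_list() output.
find_duplicate_envs <- function(envs) {
  dup_names <- unique(envs$name[duplicated(envs$name)])
  envs[envs$name %in% dup_names, , drop = FALSE]
}

# Names and paths copied by hand from the listing in this comment.
envs <- data.frame(
  name = c("base", "giotto_env", "giotto_env", "r-reticulate"),
  python = c(
    "/gpfs/ysm/home/jd2749/.local/share/r-miniconda/bin/python",
    "/gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python",
    "/gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/bin/python",
    "/gpfs/ysm/project/bosenberg/jd2749/conda_envs/r-reticulate/bin/python"
  ),
  stringsAsFactors = FALSE
)

dups <- find_duplicate_envs(envs)
print(dups)
```

In a live session the same helper could presumably be applied directly to the result of `reticulate::conda_list()`.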

Trying out one of the giotto environments (no numpy, but the rest looks normal):

reticulate::use_python(required = T, python = "/gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python")
reticulate::py_config()

python: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python
libpython: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/lib/libpython3.6m.so
pythonhome: /gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env:/gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env
version: 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:41) [GCC 9.4.0]
numpy: [NOT FOUND]

NOTE: Python version was forced by use_python function

All modules returned FALSE; I only include igraph here for brevity.

reticulate::py_module_available('igraph')

[1] FALSE

In a new session, tell reticulate to use the other giotto environment. Still no numpy:

reticulate::use_python(required=T, python="/gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/bin/python")
reticulate::py_config()

python: /gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/bin/python
libpython: /gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/lib/libpython3.6m.so
pythonhome: /gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env:/gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env
version: 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
numpy: [NOT FOUND]

NOTE: Python version was forced by use_python function

2 of the 5 modules are available; the other 3 are missing.

reticulate::py_module_available('igraph')

[1] TRUE

reticulate::py_module_available('leidenalg')

[1] TRUE

reticulate::py_module_available('networkx')

[1] FALSE

reticulate::py_module_available('community')

[1] FALSE

reticulate::py_module_available('sklearn')

[1] FALSE

What should I do from here? Thanks for the help!

RubD commented 2 years ago

At this point I would probably:

  1. Remove r-miniconda completely. You can use the terminal for that (e.g. rm -r path/to/r-miniconda, but be careful).
  2. Also remove the giotto_env created within conda_envs.
  3. Open a fresh R session, re-install Giotto, and then re-install the giotto environment with installGiottoEnvironment().

After that, you could run the previous steps again; hopefully all the modules will then be found. The automatically created giotto environment lives within the r-miniconda folder, but there may be interference from the giotto environment within the conda_envs folder.

If this doesn't work, it might be a Linux-specific issue, in which case we would need to do some further tests on a Linux machine. It worked in the past, but maybe something changed.

daccachejoe commented 2 years ago

Hi Ruben,

I did as you specified and I think I found a potential cause. After removing r-miniconda and the giotto_env in conda_envs, the new createGiottoEnvironment run creates r-miniconda in the same location as before, but giotto_env gets created in conda_envs. I attribute this to my SLURM CONDA_ENVS_PATH variable being pre-set on the cluster.
Next, I tried manually creating a giotto_env conda environment in the r-miniconda/envs/ directory, but even with the environment activated, packages were installed into my base conda/pkgs directory rather than the environment-specific directory. As a result, py_config and createGiottoObject raise warnings that a number of modules are not installed in the expected r-miniconda/envs/giotto_env environment.
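
For anyone wanting to confirm the same suspicion, a minimal sketch of the check in base R follows. It assumes CONDA_ENVS_PATH holds one or more directories joined by the platform path separator. Both helper names are hypothetical, and the directory passed in the usage lines is a guess based on the paths in this thread, not a value read from the cluster.

```r
# Hypothetical helper: list the directories named in CONDA_ENVS_PATH, if any.
conda_envs_dirs <- function(value = Sys.getenv("CONDA_ENVS_PATH")) {
  if (!nzchar(value)) return(character(0))
  dirs <- strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
  dirs[nzchar(dirs)]
}

# Hypothetical helper: TRUE if `python` lies under one of those directories.
is_under_envs_dirs <- function(python, dirs = conda_envs_dirs()) {
  if (length(dirs) == 0) return(FALSE)
  prefixes <- paste0(sub("/+$", "", dirs), "/")
  any(startsWith(python, prefixes))
}

# Assumed value of the variable, inferred from the paths in this thread.
dirs <- conda_envs_dirs("/gpfs/ysm/project/bosenberg/jd2749/conda_envs")
in_override <- is_under_envs_dirs(
  "/gpfs/ysm/project/bosenberg/jd2749/conda_envs/giotto_env/bin/python", dirs)
in_miniconda <- is_under_envs_dirs(
  "/gpfs/ysm/home/jd2749/.local/share/r-miniconda/envs/giotto_env/bin/python", dirs)
```

If the first check is TRUE and the second is FALSE, the environment was created in the overridden location rather than under r-miniconda, which would match what is described above.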
This is confusing to me, as about 3 weeks ago I went through this process on a cluster node and it worked fine. I'm not sure what has changed since then.
I've since resorted to running the pipeline on a local machine, but I am getting similar errors on a Windows desktop. I'll have to commit to running this on a less powerful Mac laptop, as that seems to be the only way I've managed to get it to work. I think there's some voodoo magic underway in reticulate that I don't have the experience to understand just yet.
Let me know if you are able to reproduce this error; I may have confused myself beyond belief.

RubD commented 2 years ago

I might have found the problem. After I updated the createGiottoEnvironment function some weeks ago to pin the exact version numbers of the python modules (to improve reproducibility), a part of the code broke. More specifically, on Windows and Linux it tries to install the python.app module, which is only needed for OSX and doesn't exist for Windows or Linux. I pushed a fix and increased the version number to 1.1.1. I haven't tested it on my Windows machine yet, but hopefully everything works again. I also updated checkGiottoEnvironment to check whether everything is installed and found correctly.
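
To illustrate the kind of guard described here, a minimal sketch follows. It is not Giotto's actual implementation: `filter_packages_for_os` is a hypothetical name, the package list is a shortened illustration, and the "osx" label for macOS is an assumption (only "linux" appears as a `get_os()` result in this thread).

```r
# Hypothetical helper, not Giotto's actual code: drop python.app from the
# package list unless the operating system is macOS.
# Assumption: the OS label for macOS is "osx".
filter_packages_for_os <- function(packages, os) {
  if (os != "osx") {
    packages <- packages[!grepl("^python\\.app", packages)]
  }
  packages
}

# Shortened, illustrative package list; python.app==2 is the pin mentioned
# at the top of this issue.
pkgs <- c("pandas", "networkx", "python.app==2")

linux_pkgs <- filter_packages_for_os(pkgs, "linux")
osx_pkgs <- filter_packages_for_os(pkgs, "osx")
```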

daccachejoe commented 2 years ago

Hi Ruben, I tried reinstalling Giotto and also tried using Giotto Suite, but with no real success on the Linux/cluster end. It works well enough on a Mac, so for my purposes it's all good for now!
Thanks for looking into this. I wish I had more to offer for anyone else who may run into this.