GfellerLab / MetacellAnalysisToolkit

Toolkit for metacell analysis
No module named 'anndata' error #8

Open gianfilippo opened 6 months ago

gianfilippo commented 6 months ago

Hi,

I tried both the conda and docker (using singularity) versions. I ran the following:

MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

and

singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

and I get the error below. Can you please help ?

Thanks

Error in py_module_import(module, convert = convert) :
  ModuleNotFoundError: No module named 'anndata'
Run reticulate::py_last_error() for details.
Calls: -> -> py_module_import
Execution halted

aurelieGabriel commented 6 months ago

Dear Gianfilippo,

Thank you for your interest in MetacellAnalysisToolkit.

Using the following command lines, we obtained no error:

singularity pull docker://agabriel/matk:v1.0
singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

We suspect that we are not using the same file as input. Unfortunately, the link to download cd34_multiome_rna.h5ad initially provided in our README seems to be corrupted at the moment. We apologize for the inconvenience and have updated the link. Could you please download the data again and check that you have the following md5: 4cd8d82adfe267f54e13d8a383918fd0 by running: md5sum data/cd34_multiome_rna.h5ad.

We can also run a test on another dataset. After cloning/pulling the current MetacellAnalysisToolkit repository you can try the following command lines:

Finally, for future usage, please note that matk:v1.0 is based on Seurat V4 and matk:v1.1 on Seurat V5. The command lines described above should run with both docker environments.

Best wishes.
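The md5 check requested above can be scripted so that a mismatch is hard to miss. This is a minimal sketch, assuming bash and the md5sum utility already used in the thread; the function name verify_md5 is hypothetical and is not part of MATK.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds only if FILE's md5 equals EXPECTED_MD5.
verify_md5() {
  local file=$1 expected=$2 actual
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # md5sum prints "<hash>  <filename>"; keep only the hash.
  actual=$(md5sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file has md5 $actual, expected $expected" >&2
    return 1
  fi
}

# Usage with the checksum quoted in the thread:
#   verify_md5 data/cd34_multiome_rna.h5ad 4cd8d82adfe267f54e13d8a383918fd0
```

A non-zero exit status makes the check easy to place in front of a longer pipeline.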

gianfilippo commented 6 months ago

Hi,

thanks, but I still get the same error. I think both the conda version and the Docker version look into my local Python path.

What do you suggest ?

Thanks

aurelieGabriel commented 6 months ago

Hi,

Thanks for the feedback, I agree that it could be the case considering the error message. I am surprised though that this happens in the docker container.

Could you try to identify which python is used, by running:

singularity run --bind $(pwd) matk_v1.0.sif which python

In my case, I obtain the following:

/opt/conda/envs/MetacellAnalysisToolkit/bin/python

I think that singularity has a strange behaviour and also mounts the HOME directory. Can you provide the output of the following command:

singularity run --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

and then do the same, adding the --no-home option:

singularity run --no-home --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

I suspect that adding --no-home could solve your issue.

Note that I had to update the docker containers, please pull again the containers using singularity before running your tests.

Let me know if this helps and I will update the README accordingly.

Best wishes, Aurélie

gianfilippo commented 6 months ago

Hi,

thanks. I am puzzled as well.

Anyway,

1) singularity run --bind $(pwd) matk_v1.0.sif which python

/opt/conda/envs/MetacellAnalysisToolkit/bin/python

2) singularity run --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

python: /home/XXX/.conda/envs/r-reticulate/bin/python
libpython: /home/XXX/.conda/envs/r-reticulate/lib/libpython3.10.so
pythonhome: /home/XXX/.conda/envs/r-reticulate:/home/XXX/.conda/envs/r-reticulate
version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
numpy: [NOT FOUND]

3) singularity run --no-home --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

python: /home/XXX/.conda/envs/r-reticulate/bin/python
libpython: /home/XXX/.conda/envs/r-reticulate/lib/libpython3.10.so
pythonhome: /home/XXX/.conda/envs/r-reticulate:/home/XXX/.conda/envs/r-reticulate
version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
numpy: [NOT FOUND]

I should also mention that the singularity run command looks into my R_LIBS_USER even if I add the "--no-home" flag, and exits with a different error code early in the process:

Error: package or namespace load failed for ‘Seurat’ in dyn.load(file, DLLpath = DLLpath, ...):

If I unset R_LIBS_USER, then I am back to the error I reported, also using the "--no-home" flag.

The python path seems to be correct, so I do not understand why I am getting the error.

What do you think ?

Best
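In the outputs above, which python reports the container interpreter, while reticulate::py_config() reports an interpreter under /home/XXX/.conda/envs/r-reticulate, so the two commands disagree. A small sketch of how that comparison could be automated, assuming bash and awk; the function names and the CONTAINER_ENV variable are hypothetical, and the container path is the one quoted in the thread.

```shell
#!/usr/bin/env bash
# Conda environment inside the MATK container, as reported in the thread.
CONTAINER_ENV=/opt/conda/envs/MetacellAnalysisToolkit

# Read reticulate::py_config() output on stdin and print the interpreter path.
py_config_python() {
  awk '$1 == "python:" { print $2; exit }'
}

# Succeed only if that interpreter lives inside the container environment.
check_py_config() {
  local python
  python=$(py_config_python)
  case "$python" in
    "") echo "no 'python:' line found" >&2; return 2 ;;
    "$CONTAINER_ENV"/*) echo "container python: $python" ;;
    *) echo "python outside the container env: $python" >&2; return 1 ;;
  esac
}

# Usage (assumes py_config() prints to standard output under Rscript -e):
#   singularity run --no-home --bind $(pwd) matk_v1.0.sif \
#     Rscript -e "reticulate::py_config()" | check_py_config
```

An interpreter outside the container environment would be consistent with the anndata import error, since that environment may not have anndata installed.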

aurelieGabriel commented 6 months ago

Hi,

Sorry for the delay, I was unavailable during the past week.

Could you please provide the output of the following command:

singularity run --no-home --cleanenv --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config(); .libPaths()"

And then:

singularity run --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config(); .libPaths()"

Additionally, could you let me know the version of Singularity you are using? I would like to try to reproduce the error.

Finally, something that could help us debug would be to check the environment variables:

singularity exec --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif env

Best,

Aurélie

gianfilippo commented 6 months ago

Hi,

thanks for looking into this! The output of the first command:

python: /opt/conda/envs/MetacellAnalysisToolkit/bin/python3
libpython: /opt/conda/envs/MetacellAnalysisToolkit/lib/libpython3.9.so
pythonhome: /opt/conda/envs/MetacellAnalysisToolkit:/opt/conda/envs/MetacellAnalysisToolkit
version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
numpy: /opt/conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/numpy
numpy_version: 1.24.4

NOTE: Python version was forced by PATH

python versions found:
/opt/conda/envs/MetacellAnalysisToolkit/bin/python3
/opt/conda/envs/MetacellAnalysisToolkit/bin/python
[1] "/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library"

I am actually using apptainer 1.2.5-1.el8

The output from the second command:

APPTAINER_APPNAME=
APPTAINER_BIND=/home/$USERID/scripts/MetacellAnalysisToolkit
APPTAINER_COMMAND=exec
APPTAINER_CONTAINER=/home/$USERID/scripts/MetacellAnalysisToolkit/matk_v1.0.sif
APPTAINER_ENVIRONMENT=/.singularity.d/env/91-environment.sh
APPTAINER_NAME=matk_v1.0.sif
HOME=/home/$USERID
LANG=C.UTF-8
LC_ALL=C.UTF-8
LD_LIBRARY_PATH=/.singularity.d/libs
PATH=/opt/conda/envs/MetacellAnalysisToolkit/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/MetacellAnalysisToolkit/cli/
PROMPT_COMMAND=PS1="Apptainer> "; unset PROMPT_COMMAND
PS1=Apptainer>
PWD=/gpfs/ycga/pi/coppola/SamKatz/scripts/MetacellAnalysisToolkit
R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library
SINGULARITY_BIND=/home/$USERID/scripts/MetacellAnalysisToolkit
SINGULARITY_CONTAINER=/home/$USERID/scripts/MetacellAnalysisToolkit/matk_v1.0.sif
SINGULARITY_ENVIRONMENT=/.singularity.d/env/91-environment.sh
SINGULARITY_NAME=matk_v1.0.sif
TERM=xterm-256color

Best Gianfilippo

aurelieGabriel commented 6 months ago

Hello,

Based on these outputs, it seems to me that the paths inside the container are correct with --cleanenv. What error do you get when running the MATK command with the --cleanenv option?

singularity run --no-home --cleanenv --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

or

singularity run --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

I had some issues installing your version of apptainer; I will come back to you when it is solved.

Best,

Aurélie

gianfilippo commented 5 months ago

Hi,

sorry about the delay.

I just tried the last two commands and the runs completed without errors.

I did not change anything on the cluster or my account settings. So I do not know why things are working now. I will try with my data and see.

Thanks
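The only visible difference between the failing and the working runs in this thread is the --cleanenv flag, which stops Apptainer/Singularity from passing the host environment into the container. One way to see what would otherwise be passed through is to list the relevant variables on the host before launching. This is only a sketch, assuming bash; the function name is hypothetical, and apart from R_LIBS_USER (mentioned in the thread) the variable list is an assumption about what commonly redirects R or Python lookups.

```shell
#!/usr/bin/env bash
# Print host variables that can change where R or Python look for packages.
# Only R_LIBS_USER comes from the thread; the rest are assumed candidates.
report_inherited_vars() {
  local name
  for name in R_LIBS_USER R_LIBS RETICULATE_PYTHON PYTHONPATH PYTHONHOME CONDA_PREFIX; do
    # printenv exits non-zero when the variable is not in the environment.
    if printenv "$name" >/dev/null; then
      printf '%s=%s\n' "$name" "$(printenv "$name")"
    fi
  done
}

# Usage on the host, before calling singularity run:
#   report_inherited_vars
```

If the function prints anything, running with --cleanenv (as in the working commands above) or unsetting those variables first are the two obvious things to try.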

ksoleary commented 5 months ago

Just want to add for people (not sure if this could be causing any of the problems in this thread) that Seurat files should be v3/4 and file name should end in .rds not .RDS (case sensitive, sometimes people use all caps for file suffix). Great tool! It's working really well for me.
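The suffix rule in the previous comment can be checked before launching a run. A minimal sketch, assuming bash; the function name has_lowercase_rds is hypothetical and not part of MATK.

```shell
#!/usr/bin/env bash
# Succeed only if the file name ends in lowercase ".rds".
# Pattern matching in `case` is case sensitive by default in bash.
has_lowercase_rds() {
  case "$1" in
    *.rds) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   has_lowercase_rds my_object.RDS || echo "rename the file to end in .rds"
```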

gianfilippo commented 5 months ago

Hi, I tried running on my data and on the example data. SuperCell seems to work, but if I try SEACells or MetaCell, I get the following error:

  File "$HOME/bin/MetacellAnalysisToolkit/cli/MetaCell2CL.py", line 312, in <module>
    main(sys.argv[1:])
  File "$HOME/bin/MetacellAnalysisToolkit/cli/MetaCell2CL.py", line 175, in main
    ro.r(f'sobj <- readRDS("{input_file}")')
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/__init__.py", line 459, in __call__
    res = self.eval(p)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 208, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 131, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/rinterface.py", line 817, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in gzfile(file, "rb") : cannot open the connection

I also get a warning before:

WARNING: The R package "reticulate" only fixed recently an issue that caused a segfault when used with rpy2: https://github.com/rstudio/reticulate/pull/1188
Make sure that you use a version of that package that includes the fix.

I did install the latest reticulate, but the error persists

What do you think ?

aurelieGabriel commented 5 months ago

Hi @gianfilippo,

Thank you for your feedback. If I understand correctly, you also get this error on the example data when using SEACells and MetaCell. Could you please provide us with the command line that led to this error?

Also, it seems that the error occurs when running readRDS(input_file), could you give us more information on how you built the input file?

Note that for SEACells, we recently fixed an issue that arose when a Seurat object without a PCA embedding was provided as input. If you are in this configuration, please make sure to pull the latest changes from the GitHub repo and, if needed, pull the docker containers again (container with SeuratV5: agabriel/matk:SeuratV5 and container with SeuratV4: agabriel/matk:SeuratV4).

Best,
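Since the traceback ends in gzfile(file, "rb") failing to open a connection, a quick first step is to confirm that the path given as input exists and is readable. A minimal sketch, assuming bash; the function name check_input_file is hypothetical. A path that is valid on the host can still be missing inside the container if it is not under a bound directory, so for container runs the check is more informative when executed through singularity exec.

```shell
#!/usr/bin/env bash
# Report why a path could not be opened, or confirm that it looks usable.
check_input_file() {
  local file=$1
  if [ ! -e "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 1
  fi
  if [ ! -r "$file" ]; then
    echo "not readable: $file" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "empty file: $file" >&2
    return 1
  fi
  echo "readable: $file"
}

# Usage (the path is a placeholder for whatever is passed to -i):
#   check_input_file path/to/input.rds
```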

gianfilippo commented 5 months ago

Hi,

I tried it again, without making any changes, and it works now. I do get some warnings with MetaCell, but it seems ok. The problem with the test data was the wrong input file. The problem with my own data is unclear, as I did not change anything. I should probably just take a break :)

Thanks again for your input.

Best