CSOgroup / cellcharter

A Python package for the identification, characterization and comparison of spatial clusters from spatial -omics data.
https://cellcharter.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Installation fails with version 0.2.0 #19

Closed shashwatsahay closed 7 months ago

shashwatsahay commented 1 year ago

Report

Hi

I just tried to install cellcharter, and the problem is that spatialdata is incompatible with scvi-tools. Could you please share the versions of both packages that you are using?

Version information

No response

marcovarrone commented 1 year ago

Hi @shashwatsahay, In the README there is a sequence of commands to correctly install cellcharter and its dependencies:

conda create -n cellcharter-env -c conda-forge python=3.10 mamba
conda activate cellcharter-env
mamba install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install cellcharter
pip install scvi-tools==1.0.3

Did you try it? Let me know if it works for you as well.

I will try to improve the dependency conflict problem, but a lot of it does not depend on me!

marcovarrone commented 1 year ago

Hi @shashwatsahay, I took a closer look at the conflicts, and indeed scvi-tools and spatialdata conflict on xarray-dataclasses. I will check if I can submit a PR to scvi-tools to extend the compatibility. Note that pip reports an error about the conflict, but the installation still completes successfully, and since CellCharter doesn't use any of the features affected by the conflict, it should not be a problem.

In any case, I updated the README with a series of commands to install CellCharter without any conflict, but I hope in the future we will manage to make everything work based on the latest versions of the dependencies.

Thank you very much and let me know if you think the issue is solved!

shashwatsahay commented 1 year ago

Hi @marcovarrone sorry for the late response,

I tried running the tool using the steps you mentioned. It ran but broke again at the dimensionality reduction step. The package to blame here was Flax, which required a particular version of JAX to be installed. Again, this originates from scvi-tools, so it is not something I expect you to fix.

Luckily, I had a working version of flax and jax installed in a different environment, and the run completed there, but now the output of the notebook cosmx_human_nsclc.ipynb is different.

I am attaching plots

Fig1: difference in cluster stability profile

Fig2: difference in colour profile

For now I am attributing it to some random seed being set

marcovarrone commented 1 year ago

Hi @shashwatsahay, I installed flax explicitly using pip install jax==0.4.14 jaxlib==0.4.14 chex==0.1.7 flax==0.7.2, and I had no problems on either the CPU or the GPU machine.

Regarding reproducibility, I probably ran the notebook during a brief phase in which I used the Adjusted Rand Index as the cluster stability metric, rather than the Fowlkes-Mallows Index as in your run. I will check whether the seed is actually enforced correctly in all steps (as far as I know, it is also not possible to enforce reproducibility between CPU and GPU runs even with the same seed; correct me if I am wrong) and also update the notebook. In any case, as you can see, the peaks are similar and the results are almost identical (except for the color profile). I will make sure the rendered notebooks look more similar to what you get when you actually run them.
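To illustrate why the two stability profiles can differ in height even when the peaks line up, here is a minimal pure-Python sketch of both metrics computed from pair counts. It is not CellCharter's implementation (the function names are made up for this example); it only shows that the Adjusted Rand Index and the Fowlkes-Mallows Index give different values for the same pair of clusterings.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Count pairs of points grouped together in both, in a, and in b."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(len(labels_a), 2)
    return both, in_a, in_b, total


def fowlkes_mallows(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall (not chance-corrected)."""
    both, in_a, in_b, _ = pair_counts(labels_a, labels_b)
    if in_a == 0 or in_b == 0:
        return 0.0
    return both / sqrt(in_a * in_b)


def adjusted_rand(labels_a, labels_b):
    """Rand index corrected for the agreement expected by chance."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    if total == 0:
        return 1.0
    expected = in_a * in_b / total
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        return 1.0
    return (both - expected) / (max_index - expected)


if __name__ == "__main__":
    run_1 = [0, 0, 1, 1]
    run_2 = [0, 0, 1, 2]
    print("ARI:", adjusted_rand(run_1, run_2))
    print("FMI:", fowlkes_mallows(run_1, run_2))
```

Because the ARI subtracts the agreement expected by chance and the FMI does not, the same two runs generally score lower under ARI, which would shift the whole stability curve without moving its peaks.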

Thank you very much again for your feedback!

Feyapeng commented 7 months ago

Hi @marcovarrone, I tried the README installation commands on both Ubuntu 22.04 (ARM64) and macOS (M3 Max), and both hit dependency conflicts. Is it because these two devices are not sufficient for CellCharter analysis?

Thanks!

marcovarrone commented 7 months ago

Hi @Feyapeng, they should work! Can I ask which conflicts you get?

Feyapeng commented 7 months ago

Hi @marcovarrone, yes. For example, on macOS it reports the following:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chex 0.1.85 requires numpy>=1.24.1, but you have numpy 1.23.4 which is incompatible.
pyro-ppl 1.9.0 requires torch>=2.0, but you have torch 1.12.1 which is incompatible.

If I try to install older versions of chex and pyro-ppl, other conflicts appear.

Thanks!

marcovarrone commented 7 months ago

Hi @Feyapeng, I managed to solve the problem for pyro-ppl. I described the problem in more detail in #29. In the installation procedure, you have to replace pip install scvi-tools with pip install pyro-ppl==1.8.6 scvi-tools==0.20.3.

Regarding the error about chex, I would just ignore it because CellCharter doesn't use JAX (and thus doesn't use chex).
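Since pip can print a conflict error and still finish installing, it may help to check which versions actually ended up in the environment. The sketch below uses only the standard library; the pins are the ones mentioned in this comment, and check_pins is a hypothetical helper written for this example, not part of CellCharter.

```python
from importlib import metadata

# Pins taken from this thread; adjust them to whatever the README currently lists.
EXPECTED = {"pyro-ppl": "1.8.6", "scvi-tools": "0.20.3"}


def check_pins(expected):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    if not problems:
        print("All pinned packages match.")
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed}")
```

Run it inside the activated cellcharter-env environment; an empty result means the pinned versions are the ones that will be imported.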

Feyapeng commented 6 months ago

Hi @marcovarrone, yes, I ignored the error about chex and it works, but I am unable to load the trVAE model. If I understand correctly, I need to have the trVAE model on my computer first, right? Since we have a CODEX platform, I want to analyze our own CODEX data. Could you tell me how to build the trVAE model for a given CODEX dataset? Thanks very much!

marcovarrone commented 6 months ago

Great! Do you think you may have batch effects between samples? I would suggest first trying the marker intensities directly. Then, if you see, for example, different clusters for the same regions of replicate samples, or different clusters for areas that are supposed to belong to the same niche, you can use trVAE. But I would suggest giving it a try without trVAE first. For example, if you have the marker intensities in adata.X, you simply don't pass the use_rep parameter to the function gr.aggregate_neighbors. The rest is the same as in the notebook!

One other aspect I have never managed to figure out is whether it is better to log-transform the intensities or to use the raw values as they are. So, if you have only positive intensities, I would suggest running scanpy.pp.log1p before aggregating the values and checking whether the results are better than without it.
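As a rough sketch of the two suggestions above (no use_rep, optional log1p), something like the following might work. The function name, the squidpy neighbor-graph call, the remove_long_links step, and n_layers=3 are assumptions carried over from the general shape of the tutorial notebooks, not something confirmed in this thread, so check them against the notebook you are following.

```python
def aggregate_marker_intensities(adata, n_layers=3, log_transform=True, sample_key=None):
    """Aggregate neighborhood features directly from the intensities in adata.X.

    Hypothetical helper: the heavy imports are done lazily so the function can
    be defined even where the libraries are not installed.
    """
    import scanpy as sc
    import squidpy as sq
    import cellcharter as cc

    if log_transform:
        # Assumes non-negative intensities; modifies adata.X in place.
        sc.pp.log1p(adata)

    # Build the spatial graph from adata.obsm["spatial"] (assumed to exist).
    sq.gr.spatial_neighbors(
        adata, library_key=sample_key, coord_type="generic", delaunay=True
    )
    cc.gr.remove_long_links(adata)

    # No use_rep argument: the aggregation then works on adata.X itself.
    cc.gr.aggregate_neighbors(adata, n_layers=n_layers, sample_key=sample_key)
    return adata
```

Running it once with log_transform=True and once with log_transform=False on copies of the same AnnData object would give the two sets of aggregated features to compare.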

I will publish a tutorial on how to run trVAE in case there are batch effects, but I also have a long list of things to do for the library, so it may take a bit!