bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

any complete installation instructions? #58

Closed: bzip2 closed this issue 1 year ago

bzip2 commented 1 year ago

Hi.

I've spent several days trying to install scGPT. I may be mistaken, but the instructions appear to have significant gaps. Is it possible for you to post a single, complete set of instructions that you have followed from beginning to end, by starting a new instance and documenting every step, including the specific versions of the instance or container and of every tool? I've tried to guess, but after I solve one problem, I bump into another (R, ld, etc.).

It would not be useful to post answers to individual questions here. Again, the request is for a single complete set of instructions.

Just as an example, here's the error message I get when I use the poetry install option, due to an apparently undeclared expectation that R is installed in advance (similar errors occur with pip install, and whether using the Nvidia PyTorch container 23.07-py3 or 22.08-py3):

Installing rpy2 (3.4.2): Failed

  ChefBuildError

  Backend subprocess exited when trying to invoke build_wheel

  ['cffi>=1.10.0', 'jinja2', 'pytz', 'tzlocal']
  Error: rpy2 in API mode cannot be built without R in the PATH or R_HOME defined. Correct this or force ABI mode-only by defining the environment variable RPY2_CFFI_MODE=ABI

  at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/installation/chef.py:147 in _prepare
      144│                 error = ChefBuildError("\n\n".join(message_parts))
      146│             if error is not None:
    → 147│                 raise error from None
      149│             return path
      151│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with rpy2 (3.4.2) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "rpy2 (==3.4.2)"'.

subercui commented 1 year ago

Hi, to summarize: the installation instructions themselves are complete. However, it is true that flash-attn has been reported to be difficult to install.

To answer your questions:

  1. It is true that rpy2 requires R to be installed. I previously assumed many users would already have R installed at the system level, but thanks for the reminder, I have added a note about this to avoid future confusion. For your case, if you don't have R installed, you may follow the official instructions here: https://cran.r-project.org/ . There are also many more user-friendly guides for installing R online, for example https://www.digitalocean.com/community/tutorials/how-to-install-r-on-ubuntu-22-04 . If you already have R installed somewhere but rpy2 can't find it, please see the note here: https://github.com/rpy2/rpy2#issues-loading-shared-c-libraries

  2. If you mean the docs/environment.yml file, that is for building the documentation website. You don't need to follow that one for your working env; please see the instructions in the README.md. On the other hand, it is a good idea to use conda to manage your virtual envs. Although recommended, this is not required: pip install should work with or without conda.

  3 and 4. In my experience, the issues encountered when installing flash-attn are usually "stand-alone" issues unrelated to the other dependencies. That said, the package depends strongly on the local GPU model and the CUDA and PyTorch versions. I have personally tested CUDA 11.7 and PyTorch 1.13.0 with flash-attn 1.0.1 or 1.0.2; those worked in my scenario with A40 and A100 GPUs. So I suggest you share the exact error messages you get from pip install flash-attn==x.x.x, and let us know your GPU, CUDA, and PyTorch versions as well (one way to collect these is sketched below).
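For reference, here is one way to collect the PyTorch-side version info (a minimal sketch; nvidia-smi and nvcc -V cover the driver and toolkit side):

import torch

print(torch.__version__)                   # PyTorch version, e.g. 1.13.0+cu117
print(torch.version.cuda)                  # CUDA version PyTorch was built against
print(torch.cuda.is_available())           # whether a usable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # GPU model, e.g. NVIDIA A100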

bzip2 commented 1 year ago

Hi. Thanks for your reply.

I'm not able to launch A40 or A100 GPUs. My options are M60, A10G and T4. I previously used an M60, but I have now done everything described here on both an M60 and an A10G, with identical results.

$ nvidia-smi
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|   0  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |

$ nvcc -V
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

I didn't do this previously, and it seems to make no difference.


$ ninja --version
1.11.0

$ python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18)
[GCC 10.3.0] on linux

>>> import torch
>>> torch.__version__
'1.13.0+cu117'
>>> torch.cuda.is_available()
True

In the container, Python was installed with conda, so pip refers to /opt/conda/bin/pip. Many of the packages needed by scGPT were installed with pip, except that packaging was installed with conda.

Python package versions found in the container:

torch                     1.13.0a0+d321be6          pypi_0    pypi
torch-tensorrt            1.2.0a0                  pypi_0    pypi
torchtext                 0.11.0a0                 pypi_0    pypi
torchvision               0.14.0a0                 pypi_0    pypi
mdit-py-plugins           0.3.0                    pypi_0    pypi
markdown-it-py            2.1.0                    pypi_0    pypi
protobuf                  3.20.1                   pypi_0    pypi
cuda-python               11.6.0                   pypi_0    pypi
packaging                 21.3               pyhd8ed1ab_0    **conda-forge**
  Attempting uninstall: torchtext
    Found existing installation: torchtext 0.11.0a0
    Uninstalling torchtext-0.11.0a0:
      Successfully uninstalled torchtext-0.11.0a0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.14.0a0 requires torch==1.13.0a0+d321be6, but you have torch 1.13.0 which is incompatible.
mdit-py-plugins 0.3.0 requires markdown-it-py<3.0.0,>=1.0.0, but you have markdown-it-py 3.0.0 which is incompatible.
jupytext 1.14.1 requires markdown-it-py<3.0.0,>=1.0.0, but you have markdown-it-py 3.0.0 which is incompatible.
cudf 22.6.0a0+319.g97422602b8 requires protobuf<3.21.0a0,>=3.20.1, but you have protobuf 3.19.6 which is incompatible.
Successfully installed aiohttp-3.8.5 aiosignal-1.3.1 anndata-0.9.2 anndata2ri-1.2 async-timeout-4.0.2 backports.zoneinfo-0.2.1 cached-property-1.5.2 chex-0.1.7 comm-0.1.4 datasets-2.14.3 deprecated-1.2.14 dill-0.3.7 dm-tree-0.1.8 docrep-0.3.2 et-xmlfile-1.1.0 etils-1.3.0 flax-0.7.1 frozenlist-1.4.0 h5py-3.9.0 huggingface-hub-0.16.4 igraph-0.9.11 ipywidgets-8.1.0 jax-0.4.13 jaxlib-0.4.13 jupyterlab-widgets-3.0.8 leidenalg-0.8.10 lightning-utilities-0.9.0 llvmlite-0.38.1 louvain-0.7.2 markdown-it-py-3.0.0 ml-dtypes-0.2.0 multidict-6.0.4 multipledispatch-1.0.0 multiprocess-0.70.15 natsort-8.4.0 numba-0.55.2 numpyro-0.12.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 openpyxl-3.1.2 opt-einsum-3.3.0 optax-0.1.7 orbax-checkpoint-0.2.3 pandas-1.3.5 patsy-0.5.3 protobuf-3.19.6 pyDeprecate-0.3.1 pyarrow-12.0.1 pygments-2.16.1 pynndescent-0.5.10 pyro-api-0.1.2 pyro-ppl-1.8.6 pytorch-lightning-1.5.10 rich-13.5.2 rpy2-3.5.13 safetensors-0.3.1 scanpy-1.9.3 scgpt-0.1.2.post1 scib-1.0.4 scikit-misc-0.1.4 scipy-1.10.1 scvi-tools-0.16.4 seaborn-0.12.2 session-info-1.0.0 statsmodels-0.14.0 stdlib-list-0.9.0 tensorstore-0.1.40 texttable-1.6.7 tokenizers-0.13.3 torch-1.13.0 torchmetrics-1.0.2 torchtext-0.14.0 transformers-4.31.0 tzlocal-5.0.1 umap-learn-0.5.3 widgetsnbextension-4.0.8 wrapt-1.15.0 xxhash-3.3.0 yarl-1.9.2
> $ python
> Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
> [GCC 10.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import scgpt
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/conda/lib/python3.8/site-packages/scgpt/__init__.py", line 18, in <module>
>     from . import model, tokenizer, scbank, utils
>   File "/opt/conda/lib/python3.8/site-packages/scgpt/model/__init__.py", line 1, in <module>
>     from .model import (
>   File "/opt/conda/lib/python3.8/site-packages/scgpt/model/model.py", line 12, in <module>
>     from flash_attn.flash_attention import FlashMHA
>   File "/opt/conda/lib/python3.8/site-packages/flash_attn/flash_attention.py", line 7, in <module>
>     from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
>   File "/opt/conda/lib/python3.8/site-packages/flash_attn/flash_attn_interface.py", line 5, in <module>
>     import flash_attn_cuda
> ImportError: /opt/conda/lib/python3.8/site-packages/flash_attn_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
$ poetry install
Creating virtualenv scgpt-R2X-0p9C-py3.8 in /root/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
Package operations: 214 installs, 2 updates, 0 removals
  • Installing zipp (3.8.0)
  •
  • (installs many packages)
  •
  • Installing flash-attn (1.0.1): Failed
  ChefBuildError
  Backend subprocess exited when trying to invoke get_requires_for_build_wheel
  Traceback (most recent call last):
    File "/root/.local/share/pypoetry/venv/lib/python3.8/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/root/.local/share/pypoetry/venv/lib/python3.8/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/root/.local/share/pypoetry/venv/lib/python3.8/site-packages/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/tmpjhe11x3w/.venv/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/tmpjhe11x3w/.venv/lib/python3.8/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
      self.run_setup()
    File "/tmp/tmpjhe11x3w/.venv/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/tmpjhe11x3w/.venv/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
      exec(code, locals())
    File "<string>", line 6, in <module>
  ModuleNotFoundError: No module named 'packaging'
  at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/installation/chef.py:147 in _prepare
      144│                 error = ChefBuildError("\n\n".join(message_parts))
      146│             if error is not None:
    → 147│                 raise error from None
      149│             return path
      151│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with flash-attn (1.0.1) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "flash-attn (==1.0.1)"'.

Just to confirm, packaging is installed, so it seems that it must be installed in a specific way:

$ pip list |grep packaging
packaging                     23.1
$ pip install packaging
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (23.1)

Thanks.

subercui commented 1 year ago

Hi, thank you for the detailed messages! It looks like the rpy2 installation is OK now after installing R. Now the rest of the issue seems related to flash-attn.

After this step:

> Install flash-attn:
>
> # on M60
> pip install flash-attn==1.0.2
> # on A10G
> pip install flash-attn==1.0.1 --no-build-isolation
>
> Both installations succeeded and what follows was identical.

Can you try running any tests for flash-attn, for example the official ones here: https://github.com/Dao-AILab/flash-attention/blob/main/benchmarks/benchmark_flash_attention.py and https://github.com/Dao-AILab/flash-attention/blob/main/tests/test_flash_attn.py ?

This is to verify the installation of flash-attn alone. You may not need to redo the installation; just run these tests in your current env to see whether flash-attn has been properly installed.
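Alternatively, a minimal smoke test like the one below can confirm that the compiled extension loads and runs (an illustrative sketch, not one of the official tests; flash-attn only accepts fp16/bf16 CUDA tensors, and it will raise on GPU architectures it does not support):

import torch
from flash_attn.flash_attention import FlashMHA  # the same import scgpt's model.py uses

mha = FlashMHA(embed_dim=64, num_heads=4, device="cuda", dtype=torch.float16)
x = torch.randn(2, 128, 64, device="cuda", dtype=torch.float16)
out, _ = mha(x)   # forward returns (output, attn_weights)
print(out.shape)  # expected: torch.Size([2, 128, 64])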

BTW, there are several related issues in the flash-attn repo as well: https://github.com/Dao-AILab/flash-attention/issues?q=flash_attn_cuda.cpython-38-x86_64-linux-gnu.so%3A+undefined+symbol . Please have a look; hopefully one of the workarounds there helps your case. I also think the official repo hasn't found a universal solution yet; the answers are case-by-case so far.

subercui commented 1 year ago

Hi @bzip2 , do you have any updates?

bzip2 commented 1 year ago

Hi, @subercui

Thank you for asking. I'm able to install flash-attn and run the tests, but scGPT is another matter. With the setup described above, I am only able to install scGPT like this:

This also works if I install flash-attn 1.0.4 the first time and then 1.0.1 the second time.

bzip2 commented 1 year ago

Does scGPT expect R to be installed in a specific way? In this case, I've installed scGPT in a conda environment as a way to cope with all the dependencies and the limitations of pip, but R is definitely on my path. I can import all the Python packages just fine except for scGPT:

# python
Python 3.8.17 
[GCC 11.2.0] :: Anaconda, Inc. on linux
>>> import rpy2, torch, flash_attn
>>> from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
>>> from flash_attn.flash_attention import FlashMHA
>>> torch.cuda.is_available()
True
>>> import scgpt
During startup - Warning message:
package ‘stats’ in options("defaultPackages") was not found
Global seed set to 0
>>>
# R
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
(...)
> library(stats)
>
bzip2 commented 1 year ago

I installed scGPT with pip install. It imports a package it did not install:

# python scGPT/tests/test_scbank.py
Traceback (most recent call last):
  File "scGPT/tests/test_scbank.py", line 5, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'

Successfully installed MarkupSafe-2.1.3 absl-py-1.4.0 aiohttp-3.8.5 aiosignal-1.3.1 anndata-0.9.2 anndata2ri-1.2 async-timeout-4.0.3 attrs-23.1.0 backports.zoneinfo-0.2.1 cached_property-1.5.2 cachetools-5.3.1 certifi-2023.7.22 cffi-1.15.1 charset-normalizer-3.2.0 chex-0.1.7 contourpy-1.1.0 cycler-0.11.0 datasets-2.14.4 deprecated-1.2.14 dill-0.3.7 dm-tree-0.1.8 docrep-0.3.2 et-xmlfile-1.1.0 etils-1.3.0 filelock-3.12.2 flax-0.7.2 fonttools-4.42.0 frozenlist-1.4.0 fsspec-2023.6.0 future-0.18.3 google-auth-2.22.0 google-auth-oauthlib-1.0.0 grpcio-1.57.0 h5py-3.9.0 huggingface-hub-0.16.4 idna-3.4 igraph-0.9.11 importlib-resources-6.0.1 ipywidgets-8.1.0 jax-0.4.13 jaxlib-0.4.13 jinja2-3.1.2 joblib-1.3.2 jupyterlab-widgets-3.0.8 kiwisolver-1.4.4 leidenalg-0.8.10 lightning-utilities-0.9.0 llvmlite-0.38.1 louvain-0.7.2 markdown-3.4.4 markdown-it-py-3.0.0 matplotlib-3.7.2 mdurl-0.1.2 ml-dtypes-0.2.0 msgpack-1.0.5 multidict-6.0.4 multipledispatch-1.0.0 multiprocess-0.70.15 natsort-8.4.0 networkx-3.1 numba-0.55.2 numpy-1.22.4 numpyro-0.12.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 oauthlib-3.2.2 openpyxl-3.1.2 opt-einsum-3.3.0 optax-0.1.7 orbax-checkpoint-0.2.3 pandas-1.3.5 patsy-0.5.3 pillow-10.0.0 protobuf-3.20.1 pyDeprecate-0.3.1 pyarrow-12.0.1 pyasn1-0.5.0 pyasn1-modules-0.3.0 pycparser-2.21 pydot-1.4.2 pynndescent-0.5.10 pyparsing-3.0.9 pyro-api-0.1.2 pyro-ppl-1.8.6 pytorch-lightning-1.5.10 pytz-2023.3 pyyaml-6.0.1 regex-2023.8.8 requests-2.31.0 requests-oauthlib-1.3.1 rich-13.5.2 rpy2-3.5.13 rsa-4.9 safetensors-0.3.2 scanpy-1.9.3 scgpt-0.1.2.post1 scib-1.0.4 scikit-learn-1.3.0 scikit-misc-0.1.4 scipy-1.10.1 scvi-tools-0.16.4 seaborn-0.12.2 session-info-1.0.0 setuptools-59.5.0 statsmodels-0.14.0 stdlib_list-0.9.0 tensorboard-2.14.0 tensorboard-data-server-0.7.1 tensorstore-0.1.41 texttable-1.6.7 threadpoolctl-3.2.0 tokenizers-0.13.3 toolz-0.12.0 torch-1.13.0 torchmetrics-1.0.3 torchtext-0.14.0 tqdm-4.66.1 transformers-4.31.0 tzlocal-5.0.1 umap-learn-0.5.3 urllib3-1.26.16 werkzeug-2.3.7 widgetsnbextension-4.0.8 wrapt-1.15.0 xxhash-3.3.0 yarl-1.9.2

subercui commented 1 year ago

> Does scGPT expect R to be installed in a specific way? In this case, I've installed scGPT in a conda environment as a way to cope with all the dependencies and the limitations of pip, but R is definitely on my path. I can import all the Python packages just fine except for scGPT:
>
> # python
> Python 3.8.17 
> [GCC 11.2.0] :: Anaconda, Inc. on linux
> >>> import rpy2, torch, flash_attn
> >>> from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
> >>> from flash_attn.flash_attention import FlashMHA
> >>> torch.cuda.is_available()
> True
> >>> import scgpt
> During startup - Warning message:
> package ‘stats’ in options("defaultPackages") was not found
> Global seed set to 0
> >>>
> # R
> R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
> (...)
> > library(stats)
> >

A generic R installation is fine; for example, a conda install would be fine. The only reason we need R is that the dependency scib uses rpy2, and rpy2 needs R.

It looks like the package was imported successfully, so most likely you can ignore the warning message.
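If in doubt, rpy2 ships a helper module that reports which R installation it picked up; a quick illustrative check:

import rpy2.situation

print(rpy2.situation.get_r_home())       # path to the R home rpy2 found, or None
for line in rpy2.situation.iter_info():  # full report: R version, C extension status, etc.
    print(line)

The same report is also available from the command line via python -m rpy2.situation.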

subercui commented 1 year ago

> I installed scGPT with pip install. It imports a package it did not install:
>
> # python scGPT/tests/test_scbank.py
> Traceback (most recent call last):
>   File "scGPT/tests/test_scbank.py", line 5, in <module>
>     import pytest
> ModuleNotFoundError: No module named 'pytest'
>
> Successfully installed MarkupSafe-2.1.3 absl-py-1.4.0 (...)

We use pytest as a development dependency and did not include it in the pip release, since it is only used in the tests. Are you trying to run these unit test scripts? If so, just run pip install pytest and then it should work.

subercui commented 1 year ago

It looks like you have everything installed. Just a reminder: the unit tests in the tests folder are for implementation correctness checking; they do not actually test the "algorithm functionalities". You may instead run any of the files in the examples or tutorials folder.

Neustradamus commented 1 year ago

@bzip2: I'd like to contact you privately. What is your email address? Could you post it in a comment here and then delete the comment afterwards? Thanks in advance.

mjstrumillo commented 1 year ago

I think I have a similar problem; I just go in circles about which packages are causing which problems:

root@74c6468c7528:/usr/src/app# pip show flash_attn
Name: flash-attn
Version: 1.0.4
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/HazyResearch/flash-attention
Author: Tri Dao
Author-email: trid@stanford.edu
License: 
Location: /opt/conda/lib/python3.10/site-packages
Requires: einops, packaging, torch
Required-by: scgpt

but then

>>> import flash_attn
>>> import scgpt
/opt/conda/lib/python3.10/site-packages/scgpt/model/model.py:19: UserWarning: flash_attn is not installed
  warnings.warn("flash_attn is not installed")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/scgpt/__init__.py", line 18, in <module>
    from . import model, tokenizer, scbank, utils, tasks
  File "/opt/conda/lib/python3.10/site-packages/scgpt/tokenizer/__init__.py", line 1, in <module>
    from .gene_tokenizer import *
  File "/opt/conda/lib/python3.10/site-packages/scgpt/tokenizer/gene_tokenizer.py", line 11, in <module>
    import torchtext.vocab as torch_vocab
  File "/opt/conda/lib/python3.10/site-packages/torchtext/__init__.py", line 12, in <module>
    from . import data, datasets, prototype, functional, models, nn, transforms, utils, vocab, experimental
  File "/opt/conda/lib/python3.10/site-packages/torchtext/datasets/__init__.py", line 3, in <module>
    from .ag_news import AG_NEWS
  File "/opt/conda/lib/python3.10/site-packages/torchtext/datasets/ag_news.py", line 12, in <module>
    from torchdata.datapipes.iter import FileOpener, IterableWrapper
  File "/opt/conda/lib/python3.10/site-packages/torchdata/__init__.py", line 9, in <module>
    from . import datapipes
  File "/opt/conda/lib/python3.10/site-packages/torchdata/datapipes/__init__.py", line 9, in <module>
    from . import iter, map, utils
  File "/opt/conda/lib/python3.10/site-packages/torchdata/datapipes/iter/__init__.py", line 121, in <module>
    from torchdata.datapipes.iter.util.sharding import (
  File "/opt/conda/lib/python3.10/site-packages/torchdata/datapipes/iter/util/sharding.py", line 9, in <module>
    from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES
ModuleNotFoundError: No module named 'torch.utils.data.datapipes.iter.sharding'
>>> 

although:

root@74c6468c7528:/usr/src/app# pip show torch
Name: torch
Version: 1.13.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.10/site-packages
Requires: nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, typing-extensions
Required-by: flash-attn, pyro-ppl, pytorch-lightning, scgpt, scvi-tools, torchaudio, torchdata, torchelastic, torchmetrics, torchtext, torchvision, triton
root@74c6468c7528:/usr/src/app# pip show torchtext
Name: torchtext
Version: 0.14.0
Summary: Text utilities and datasets for PyTorch
Home-page: https://github.com/pytorch/text
Author: PyTorch core devs and James Bradbury
Author-email: jekbradbury@gmail.com
License: BSD
Location: /opt/conda/lib/python3.10/site-packages
Requires: numpy, requests, torch, tqdm
Required-by: scgpt

subercui commented 1 year ago

Hi @mjstrumillo , it seems to be an issue with torchtext, but the versions you showed should be compatible: https://github.com/pytorch/text#installation . What happens if you just try this in Python?

import torch
import torchtext
print(torch.__version__, torchtext.__version__)
import torchtext.vocab as torch_vocab

jzqin commented 1 year ago

@mjstrumillo I just had your torchtext / torchdata error issue too. I believe your problem is with the "torchdata" package (which was likely already in your package environment before trying to install scGPT) and not torchtext itself. The error No module named 'torch.utils.data.datapipes.iter.sharding' exists because torchdata expects torch>=2.0, while scgpt expects torch=1.13. You should uninstall torchdata (e.g. pip uninstall torchdata). If you try importing scGPT again after, your error about torchdata should be gone.
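A quick way to spot this mismatch (an illustrative check; the version cutoff is approximate):

import torch

print(torch.__version__)  # scgpt pins torch 1.13.x
try:
    import torchdata
    # newer torchdata releases pair with torch >= 2.0 and break on 1.13
    print("torchdata", torchdata.__version__, "installed; consider: pip uninstall torchdata")
except ImportError:
    print("torchdata not installed; nothing to do")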

mjstrumillo commented 1 year ago

OK, I troubleshot this with pip uninstall torchvision torchaudio torchdata torch (I tried them one by one, but all of them needed to go), then pip-installed scgpt (which pulled in the correct torch versions). P.S. I never installed anything torch-related manually; all of this was already in the env.

Now the only remaining issue is that I do in fact have a correct version of flash_attn:

pip show flash_attn
Name: flash-attn
Version: 1.0.4

I can import it in Python, but scgpt claims it's not there:

>>> import flash_attn
>>> import scgpt
/opt/conda/lib/python3.10/site-packages/scgpt/model/model.py:19: UserWarning: flash_attn is not installed
  warnings.warn("flash_attn is not installed")
/opt/conda/lib/python3.10/site-packages/torchmetrics/utilities/imports.py:18: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.version import LooseVersion
Global seed set to 0

Both of the paths seem to match:

>>> print(flash_attn.__file__)
/opt/conda/lib/python3.10/site-packages/flash_attn/__init__.py
>>> print(scgpt.__file__)
/opt/conda/lib/python3.10/site-packages/scgpt/__init__.py

So I honestly don't know what to do now...

mjstrumillo commented 1 year ago

I checked what actually gets imported at model.py:19, and it's from flash_attn.flash_attention import FlashMHA. When running just that line, I get:

>>> from flash_attn.flash_attention import FlashMHA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/flash_attn/flash_attention.py", line 7, in <module>
    from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
  File "/opt/conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 5, in <module>
    import flash_attn_cuda
ImportError: /opt/conda/lib/python3.10/site-packages/flash_attn_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE

So I'm reinstalling flash_attn; it was probably compiled wrong somehow?...

mjstrumillo commented 1 year ago

I realise the last 3 comments are all mine, but that worked: installing flash_attn, then scgpt, then uninstalling all torch dependencies, then installing scgpt, then uninstalling flash_attn, then recompiling flash_attn from scratch. I guess that kind of makes sense, but oh man...

subercui commented 1 year ago

Hi @mjstrumillo , glad to know it finally works. Sorry about the mess in your installation process; admittedly, it was supposed to be easier. Reading the above comments, it looks like there were a few issues during your process. Just to summarize them here and offer some insights:

  1. torchtext is indeed picky about the version of torch. We have therefore set the specific versions in the pyproject.toml file, so in theory pip should handle this well. It is still unclear to me why your early installation encountered conflicts (https://github.com/bowang-lab/scGPT/issues/58#issuecomment-1710854116). Thankfully, after reinstalling, this seems to have worked out fine.

  2. The "undefined symbol: xxx" you mentioned here https://github.com/bowang-lab/scGPT/issues/58#issuecomment-1719872047 indicates incompatibility between flash-attn and the environment. You can find similar issues in the flash-attn repo. Glad to know it worked after reinstallation.

  3. Reading your last comment

    I realise the last 3 comments are all mine, but that worked: installing flash_attn, then scgpt, then uninstalling all torch dependencies, then installing scgpt, then uninstalling flash_attn, then recompiling flash_attn from scratch. I guess that kind of makes sense, but oh man...

    I just realized some issues may be related to the order of the commands. For example, if you start with pip install flash-attn, it will probably select very new versions of flash-attn and pytorch>2.1, and then the later pip install scgpt will change the versions again and cause the incompatibility.

In summary, I think the whole installation could be optimized and completed with just the following two lines:

pip install torch==1.13.0
pip install scgpt "flash-attn<1.0.5"
# As of 2023.09, pip install may not run with new versions of the google orbax package, if you encounter related issues, please use the following command instead:
# pip install scgpt "flash-attn<1.0.5" "orbax<0.1.8"

I've personally tested this and updated the README accordingly. Thank you very much for sharing your case; it greatly helped optimize the process. Feel free to try the above commands whenever you need a fresh installation, and please let me know your feedback.
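After those two commands, a quick smoke test along these lines (a sketch, not an official check) should run without the import errors discussed above:

import torch
import flash_attn
import scgpt  # should load without an ImportError (it also sets a global seed)

# the exact import that fails when flash-attn was built against a mismatched torch:
from flash_attn.flash_attention import FlashMHA

print(torch.__version__, torch.cuda.is_available())  # expect 1.13.0 and True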

We are also working on updating to pytorch>2.0.1, where flash-attn is better integrated into pytorch itself, so we expect the installation to become even simpler. We hope to provide the new updates soon.

mjstrumillo commented 1 year ago

> 3. I just realized some issues may be related to the order of the commands. For example, if you start with pip install flash-attn, it will probably select very new versions of flash-attn and pytorch>2.1, and then the later pip install scgpt will change the versions again and cause the incompatibility.

Just a note on this: I did put the requirement "flash-attn<1.0.5" in every installation, including the original one. So I think it picked the correct pytorch, but newer torchvision etc.

mjstrumillo commented 1 year ago

Also, once I installed faiss, the entire installation broke (I'm trying to follow https://github.com/bowang-lab/scGPT/blob/main/tutorials/Tutorial_Reference_Mapping.ipynb). Once faiss is installed, I can import it, but the numba versioning just gives up:

Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import faiss
>>> import scgpt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/scgpt/__init__.py", line 18, in <module>
    from . import model, tokenizer, scbank, utils, tasks
  File "/opt/conda/lib/python3.10/site-packages/scgpt/model/__init__.py", line 8, in <module>
    from .generation_model import *
  File "/opt/conda/lib/python3.10/site-packages/scgpt/model/generation_model.py", line 21, in <module>
    from ..utils import map_raw_id_to_vocab_id
  File "/opt/conda/lib/python3.10/site-packages/scgpt/utils/__init__.py", line 1, in <module>
    from .util import *
  File "/opt/conda/lib/python3.10/site-packages/scgpt/utils/util.py", line 14, in <module>
    import scib
  File "/opt/conda/lib/python3.10/site-packages/scib/__init__.py", line 8, in <module>
    from . import integration, metrics, preprocessing, utils
  File "/opt/conda/lib/python3.10/site-packages/scib/integration.py", line 7, in <module>
    import scanpy as sc
  File "/opt/conda/lib/python3.10/site-packages/scanpy/__init__.py", line 6, in <module>
    from ._utils import check_versions
  File "/opt/conda/lib/python3.10/site-packages/scanpy/_utils/__init__.py", line 28, in <module>
    from .compute.is_constant import is_constant
  File "/opt/conda/lib/python3.10/site-packages/scanpy/_utils/compute/is_constant.py", line 5, in <module>
    from numba import njit
  File "/opt/conda/lib/python3.10/site-packages/numba/__init__.py", line 42, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "/opt/conda/lib/python3.10/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "/opt/conda/lib/python3.10/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
>>> exit()
root@0280ac1edd8d:/usr/src/app# pip show numba
Name: numba
Version: 0.55.2
Summary: compiling Python code using LLVM
Home-page: https://numba.pydata.org
Author: 
Author-email: 
License: BSD
Location: /opt/conda/lib/python3.10/site-packages
Requires: llvmlite, numpy, setuptools
Required-by: pynndescent, scanpy, scgpt, scib, umap-learn
root@0280ac1edd8d:/usr/src/app#

subercui commented 1 year ago

Hi, do you mean you can still import the packages, and there is only some warning about the numba version? I would think the tutorial may still work if that's the case.

numba is not used in the tutorial applications.

Just to share my experience installing faiss: I personally tried their conda-forge installation option, and it worked fine.

mjstrumillo commented 1 year ago

Yes, I installed via conda install -c pytorch -c nvidia faiss-gpu=1.7.4 mkl=2021 blas=1.0=mkl, as that was the only option that worked (the conda-forge one did not compile a working version). So I have a working faiss. I tried pip uninstalling scgpt, then pip install again, but nothing changed; it's the same numba version as before, which makes me question all the dependent packages. Technically, only these are changing:

The following NEW packages will be INSTALLED:

  cudatoolkit        nvidia/linux-64::cudatoolkit-11.4.1-h8ab8bb3_9 
  faiss-gpu          pytorch/linux-64::faiss-gpu-1.7.4-py3.10_hc0239a3_0_cuda11.4 
  libfaiss           pytorch/linux-64::libfaiss-1.7.4-h13c3c6d_0_cuda11.4 

The following packages will be UPDATED:

  ca-certificates                     2023.01.10-h06a4308_0 --> 2023.08.22-h06a4308_0 
  certifi                          2023.5.7-py310h06a4308_0 --> 2023.7.22-py310h06a4308_0 
  conda                              23.3.1-py310h06a4308_0 --> 23.7.4-py310h06a4308_0 
  openssl                                 1.1.1t-h7f8727e_0 --> 1.1.1w-h7f8727e_0 

The following packages will be DOWNGRADED:

  intel-openmp                      2023.1.0-hdb19cb5_46305 --> 2021.4.0-h06a4308_3561 
  mkl                               2023.1.0-h6d00ec8_46342 --> 2021.4.0-h06a4308_640 
  mkl-service                         2.4.0-py310h5eee18b_1 --> 2.4.0-py310h7f8727e_0 
  mkl_fft                             1.3.6-py310h1128e8f_1 --> 1.3.1-py310hd6ae3a3_0 
  mkl_random                          1.2.2-py310h1128e8f_1 --> 1.2.2-py310h00e6091_0 
  numpy                              1.24.3-py310h5f9d8c6_1 --> 1.24.3-py310hd5efca6_0 
  numpy-base                         1.24.3-py310hb5e798b_1 --> 1.24.3-py310h8e6c178_0 

subercui commented 1 year ago

Based on your updated comment, https://github.com/bowang-lab/scGPT/issues/58#issuecomment-1720267290, it looks like that is related to scanpy instead of faiss.

You can try this in your Python env to check whether it is the cause of the issue:

import scanpy

Which version of scanpy do you use? Can you try pip install scanpy==1.9.1? This is the version we tested with, as indicated in https://github.com/bowang-lab/scGPT/blob/main/pyproject.toml . Usually pip should handle the dependencies and find versions of scanpy compatible with numba, but that appears not to be the case in your installation.
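One way to check whether the stack is consistent (an illustrative sketch; the key constraint is that numba 0.55.x requires numpy < 1.23, so a conda-side numpy 1.24 shadowing the pip one would explain the SystemError above):

import numpy
print("numpy", numpy.__version__)    # numba 0.55.x needs numpy < 1.23

import numba                         # raises SystemError if numpy is too new
print("numba", numba.__version__)

import scanpy                        # pulled in by scib, which scgpt imports
print("scanpy", scanpy.__version__)  # scgpt's pyproject.toml was tested with 1.9.1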

mjstrumillo commented 1 year ago

So I can use scgpt with scanpy 1.9.4 (which was somehow installed automatically in the process; I didn't install it manually). If I then install faiss, the above issue arises. I downgraded to scanpy 1.9.1, and that fails to import both scanpy and scgpt, with the following errors:

>>> import scanpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/scanpy/__init__.py", line 16, in <module>
    from . import plotting as pl
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/__init__.py", line 1, in <module>
    from ._anndata import (
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_anndata.py", line 28, in <module>
    from . import _utils
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_utils.py", line 35, in <module>
    class _AxesSubplot(Axes, axes.SubplotBase, ABC):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
>>> import scgpt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/scgpt/__init__.py", line 18, in <module>
    from . import model, tokenizer, scbank, utils, tasks
  File "/opt/conda/lib/python3.10/site-packages/scgpt/model/__init__.py", line 8, in <module>
    from .generation_model import *
  File "/opt/conda/lib/python3.10/site-packages/scgpt/model/generation_model.py", line 21, in <module>
    from ..utils import map_raw_id_to_vocab_id
  File "/opt/conda/lib/python3.10/site-packages/scgpt/utils/__init__.py", line 1, in <module>
    from .util import *
  File "/opt/conda/lib/python3.10/site-packages/scgpt/utils/util.py", line 14, in <module>
    import scib
  File "/opt/conda/lib/python3.10/site-packages/scib/__init__.py", line 8, in <module>
    from . import integration, metrics, preprocessing, utils
  File "/opt/conda/lib/python3.10/site-packages/scib/integration.py", line 7, in <module>
    import scanpy as sc
  File "/opt/conda/lib/python3.10/site-packages/scanpy/__init__.py", line 16, in <module>
    from . import plotting as pl
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/__init__.py", line 1, in <module>
    from ._anndata import (
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_anndata.py", line 28, in <module>
    from . import _utils
  File "/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_utils.py", line 35, in <module>
    class _AxesSubplot(Axes, axes.SubplotBase, ABC):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

mjstrumillo commented 1 year ago

I checked them all: scanpy 1.9.0, 1.9.1, 1.9.2, 1.9.3 and 1.9.4. After installing faiss, I cannot import any of them due to the above error, so neither can I use scgpt. But I can use faiss, so that's something ;)

Can it come down to the Python version? I'm on 3.10.11, as required by (python = ">=3.7.13,<3.11").

subercui commented 1 year ago

> Yes, I installed via conda install -c pytorch -c nvidia faiss-gpu=1.7.4 mkl=2021 blas=1.0=mkl, as that was the only option that worked (the conda-forge one did not compile a working version). So I have a working faiss. I tried pip uninstalling scgpt, then pip install again, but nothing changed; it's the same numba version as before, which makes me question all the dependent packages. Technically, only these are changing:
>
> The following NEW packages will be INSTALLED:
>
>   cudatoolkit        nvidia/linux-64::cudatoolkit-11.4.1-h8ab8bb3_9 
>   faiss-gpu          pytorch/linux-64::faiss-gpu-1.7.4-py3.10_hc0239a3_0_cuda11.4 
>   libfaiss           pytorch/linux-64::libfaiss-1.7.4-h13c3c6d_0_cuda11.4 
>
> The following packages will be UPDATED:
>
>   ca-certificates                     2023.01.10-h06a4308_0 --> 2023.08.22-h06a4308_0 
>   certifi                          2023.5.7-py310h06a4308_0 --> 2023.7.22-py310h06a4308_0 
>   conda                              23.3.1-py310h06a4308_0 --> 23.7.4-py310h06a4308_0 
>   openssl                                 1.1.1t-h7f8727e_0 --> 1.1.1w-h7f8727e_0 
>
> The following packages will be DOWNGRADED:
>
>   intel-openmp                      2023.1.0-hdb19cb5_46305 --> 2021.4.0-h06a4308_3561 
>   mkl                               2023.1.0-h6d00ec8_46342 --> 2021.4.0-h06a4308_640 
>   mkl-service                         2.4.0-py310h5eee18b_1 --> 2.4.0-py310h7f8727e_0 
>   mkl_fft                             1.3.6-py310h1128e8f_1 --> 1.3.1-py310hd6ae3a3_0 
>   mkl_random                          1.2.2-py310h1128e8f_1 --> 1.2.2-py310h00e6091_0 
>   numpy                              1.24.3-py310h5f9d8c6_1 --> 1.24.3-py310hd5efca6_0 
>   numpy-base                         1.24.3-py310hb5e798b_1 --> 1.24.3-py310h8e6c178_0 

If you could import successfully before, but not after these changes, I think the changes to mkl and numpy caused the errors with numba or scanpy.

So I have the following option in mind: you probably want to relax the requirements of the faiss installation. Even if conda-forge didn't work, I think faiss-gpu can be installed via

conda install -c pytorch faiss-gpu

This way you give conda more freedom to choose the versions.

Also, I noticed the cudatoolkit changed after the installation? That may affect running flash-attn. Actually, if you only need to run reference mapping for fewer than tens of thousands of cells, as in the tutorial notebook, faiss-cpu can be fast enough.
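For scale reference, the nearest-neighbour search itself is only a few lines of faiss; a minimal illustrative sketch with random embeddings (the dimensions are hypothetical, and it works the same with faiss-cpu or faiss-gpu):

import numpy as np
import faiss

d = 64                                           # embedding dimension (hypothetical)
xb = np.random.rand(10000, d).astype("float32")  # database vectors
index = faiss.IndexFlatL2(d)                     # exact L2 nearest-neighbour index
index.add(xb)
distances, ids = index.search(xb[:5], 3)         # 3 nearest neighbours for 5 queries
print(ids.shape)                                 # (5, 3)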

mjstrumillo commented 1 year ago

The conda install -c pytorch faiss-gpu fails because my Python is too old for it:

root@fdc8cbb72558:/usr/src/app# conda install -c pytorch faiss-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: / 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                                                                         

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - faiss-gpu -> python[version='>=3.11,<3.12.0a0']

Your python: python=3.10

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.31=0
  - feature:|@/linux-64::__glibc==2.31=0
  - faiss-gpu -> __glibc[version='>=2.17,<3.0.a0']
  - faiss-gpu -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.31

I'm so sorry about this :D

mjstrumillo commented 1 year ago

So I guess it comes down to finding a version of faiss that is not too new for Python 3.10, and then not too old for scanpy?

subercui commented 1 year ago

I see. I will test further with Python 3.10 and let you know. I think there should be a solution, probably with a fresh installation.

mjstrumillo commented 1 year ago

I installed faiss-cpu both from conda-forge and via pip install, just to try. Neither solved the aforementioned issue; it comes back to that numba error. I will try to build from source manually tomorrow.

mjstrumillo commented 1 year ago

So I finally got to the bottom of it: faiss has a Python requirement (faiss-gpu=1.7.4 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']), and I'm using Python 3.10... I tried 5 older faiss versions, and they all have a constraint excluding 3.10... I'm starting a new docker with Python 3.8.

mjstrumillo commented 1 year ago

Here's the only combination that worked for me:

- Ubuntu 20 (devel image: nvidia/cuda:11.4.3-devel-ubuntu20.04)
- Python 3.8.18
- CUDA 11.4
- faiss 1.7.4 (conda install -c conda-forge faiss-gpu)

requirements.txt: packaging wandb gseapy

Maybe that will save someone days of playing with the combinations. The bottleneck was faiss.