aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

Unspecified Numpy version causing installation issue #477

Open RJ3 opened 1 month ago

RJ3 commented 1 month ago

Because `environment.yml` does not pin a NumPy version, it currently pulls in NumPy 2.1.0, which causes installation of the specified DeepSpeed version to fail.

Pinning NumPy to 1.26 appears to get past the installation error, but I cannot confirm whether it causes any other regressions.

https://github.com/aqlaboratory/openfold/blob/3bec3e9b2d1e8bdb83887899102eff7d42dc2ba9/environment.yml#L16
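A quick pre-flight check can confirm that the environment resolved to NumPy 1.x before DeepSpeed is built (the error output itself recommends `numpy<2`). This is a minimal sketch; the helper names `numpy_major_version` and `check_numpy_1x` are hypothetical and not part of OpenFold or DeepSpeed:

```python
import importlib.metadata
from typing import Optional


def numpy_major_version(version: str) -> int:
    """Return the major component of a version string such as '2.1.0'."""
    return int(version.split(".", 1)[0])


def check_numpy_1x(version: Optional[str] = None) -> bool:
    """Return True if the given (or installed) NumPy version is 1.x.

    When no version is passed, the installed version is read from package
    metadata, so numpy itself is never imported.
    """
    if version is None:
        version = importlib.metadata.version("numpy")
    return numpy_major_version(version) < 2


if __name__ == "__main__":
    try:
        installed = importlib.metadata.version("numpy")
    except importlib.metadata.PackageNotFoundError:
        print("NumPy is not installed")
    else:
        if check_numpy_1x(installed):
            print(f"NumPy {installed}: OK for modules compiled against NumPy 1.x")
        else:
            print(f"NumPy {installed}: modules compiled against NumPy 1.x may fail; consider 'numpy<2'")
```

Reading the version through `importlib.metadata` avoids importing NumPy, so the check itself cannot trigger the `_ARRAY_API` warning seen in the log.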


```
  Using cached deepspeed-0.12.4.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [34 lines of output]

      A module that was compiled using NumPy 1.x cannot be run in
      NumPy 2.1.0 as it may crash. To support both 1.x and 2.x
      versions of NumPy, modules must be compiled with NumPy 2.0.
      Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

      If you are a user of the module, the easiest solution will be to
      downgrade to 'numpy<2' or try to upgrade the affected module.
      We expect that some modules will need time to support NumPy 2.

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-ukaj6i9k/deepspeed_2c7096c05c1647b692f817716c6fb3f3/setup.py", line 31, in <module>
          import torch
        File "/home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/__init__.py", line 1382, in <module>
          from .functional import *  # noqa: F403
        File "/home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/functional.py", line 7, in <module>
          import torch.nn.functional as F
        File "/home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
          from .modules import *  # noqa: F403
        File "/home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
          from .transformer import TransformerEncoder, TransformerDecoder, \
        File "/home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
          device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      /home/ra29435/micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /opt/conda/conda-bld/pytorch_1702400410390/work/torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-ukaj6i9k/deepspeed_2c7096c05c1647b692f817716c6fb3f3/setup.py", line 100, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-ukaj6i9k/deepspeed_2c7096c05c1647b692f817716c6fb3f3/op_builder/builder.py", line 50, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
critical libmamba pip failed to install packages
```
ahmedselim2017 commented 2 weeks ago

I have been facing the same problem. Did you see any effect of pinning the NumPy version on training or inference?