alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.
https://alpa.ai
Apache License 2.0

cupy package mismatches with CUDA version in the docs #950

Open (serach24 opened this issue 1 year ago)

serach24 commented 1 year ago

Please describe the bug

Hi, according to the Alpa installation docs, CuPy should be installed with pip3 install cupy-cuda11x. However, when the CUDA version is 11.1, the linked CuPy package pages indicate that the correct command is pip3 install cupy-cuda111. Using cupy-cuda11x results in an error in the next step (python3 -c "from cupy.cuda import nccl").
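
To make the version-to-package mapping explicit, here is a minimal Python sketch that picks a CuPy wheel name from a CUDA toolkit version. The mapping is an assumption based on the package naming described above and in CuPy's installation docs: dedicated cupy-cuda110 / cupy-cuda111 wheels for CUDA 11.0 / 11.1, cupy-cuda11x for CUDA 11.2 and later 11.x releases, and cupy-cuda12x for 12.x. Newer CuPy releases may no longer ship the 11.0/11.1 wheels, so check the CuPy docs for the release you install. The names cupy_package_for_cuda and cuda_version_from_nvcc are hypothetical helpers, not part of Alpa or CuPy.

```python
import re


def cupy_package_for_cuda(cuda_version: str) -> str:
    """Return the CuPy wheel name assumed to match a CUDA toolkit version (hypothetical helper)."""
    match = re.match(r"^(\d+)\.(\d+)", cuda_version.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA version: {cuda_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 11 and minor in (0, 1):
        # Assumption: CUDA 11.0 and 11.1 need per-minor-version wheels
        # (cupy-cuda110, cupy-cuda111); cupy-cuda11x does not cover them.
        return f"cupy-cuda11{minor}"
    if major == 11:
        # Assumption: CUDA 11.2 and later 11.x releases share one wheel.
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"no mapping for CUDA {major}.{minor} in this sketch")


def cuda_version_from_nvcc(nvcc_output: str) -> str:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return match.group(1)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.1, V11.1.105"
    version = cuda_version_from_nvcc(sample)
    print(f"pip3 install {cupy_package_for_cuda(version)}")
```

For example, feeding it the nvcc --version output from a CUDA 11.1 machine suggests pip3 install cupy-cuda111 rather than cupy-cuda11x, which is the mismatch this issue reports.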

Please describe the expected behavior

python3 -c "from cupy.cuda import nccl" runs without printing any error.
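
A slightly more verbose version of that one-line check, as a hedged Python sketch: it attempts the same import and reports the exception in one line instead of a bare traceback, and exits non-zero on failure. The function name check_cupy_nccl is hypothetical. It catches Exception broadly on the assumption that a package/CUDA mismatch may surface either as an ImportError or as another error raised while loading CuPy's native libraries.

```python
import sys


def check_cupy_nccl() -> bool:
    """Return True if `from cupy.cuda import nccl` succeeds (hypothetical helper)."""
    try:
        import cupy
        from cupy.cuda import nccl  # noqa: F401  (the import itself is the check)
    except Exception as exc:  # a mismatched wheel may fail while loading native libs
        print(f"CuPy NCCL check failed: {exc!r}", file=sys.stderr)
        return False
    print(f"CuPy {cupy.__version__}: cupy.cuda.nccl imported successfully")
    return True


if __name__ == "__main__":
    # Non-zero exit status on failure, so the check can gate an install script.
    sys.exit(0 if check_cupy_nccl() else 1)
```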

System information and environment

To Reproduce

Steps to reproduce the behavior: follow the installation guide using CUDA 11.1 and cuDNN 8.0.5.

WelY1 commented 1 year ago

Hi, I installed cupy-cuda11x with pip, then installed alpa and jaxlib with pip3. But when I checked the installation with python3 -m alpa.test_install, an error occurred:

ERROR: test_2_pipeline_parallel (__main__.InstallationTest)
-------------------------------------------------------------------
......
self._sender_tasks[sender_worker].append(
jax._src.traceback_util.UnfilteredStackTrace: KeyError: Actor(MeshHostWorker, d022bdfe8bc5c769e2ce7fa6302000000)

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

Have you ever encountered this problem? Thanks!

serach24 commented 1 year ago


I was never able to pass installation test 2; I hit different errors with different versions. You could file a new issue about the specific error you ran into, but my impression is that the project is no longer actively maintained.