Fonsifa opened this issue 1 year ago.
This bug was caused by the wrong version of libnccl. I solved it by reinstalling the right version of libnccl and recreating a new Python environment based on that libnccl.
May I ask which exact versions of Python and libnccl you used? Thanks.
Yeah: python == 3.8.13, gcc == 7.5.0, nccl == libnccl.so.2.8.4
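To compare your setup against the versions listed above, it helps to know which libnccl the dynamic loader actually resolves inside the active Python environment. The following is a minimal sketch, assuming Linux and the soname `libnccl.so.2`. It loads the library with `ctypes` and calls NCCL's `ncclGetVersion`, which writes an integer version code. NCCL 2.9 and later encode this code as major*10000 + minor*100 + patch, and older releases such as 2.8.4 use major*1000 + minor*100 + patch. The helper names `decode_nccl_version` and `loaded_nccl_version` are hypothetical. Also, 2.8.4 is the version that worked for the user above, not necessarily a requirement for every CUDA setup.

```python
import ctypes
from typing import Optional, Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Split an NCCL version code into (major, minor, patch).

    NCCL >= 2.9 encodes major*10000 + minor*100 + patch;
    older releases (e.g. 2.8.4 -> 2804) encode major*1000 + minor*100 + patch.
    """
    if code >= 10000:
        return code // 10000, (code % 10000) // 100, code % 100
    return code // 1000, (code % 1000) // 100, code % 100


def loaded_nccl_version(libname: str = "libnccl.so.2") -> Optional[Tuple[int, int, int]]:
    """Return the version of the NCCL library the loader resolves, or None."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None  # library not found on the loader search path
    get_version = getattr(lib, "ncclGetVersion", None)
    if get_version is None:
        return None  # symbol missing: not a usable NCCL build
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_version.restype = ctypes.c_int  # ncclResult_t; 0 means ncclSuccess
    code = ctypes.c_int(0)
    if get_version(ctypes.byref(code)) != 0:
        return None
    return decode_nccl_version(code.value)


if __name__ == "__main__":
    # Run this inside the Python environment you use to build alpa.
    print("libnccl.so.2 version:", loaded_nccl_version())
```

If this prints a version other than the one you installed, an older or newer libnccl earlier on the loader search path is probably being picked up instead.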
Hi, I am running into the same issue when building from source. I don't understand how the libnccl version affects the FileNotFoundError. Is there any other solution to this?
The mirror URL is written in a WORKSPACE file. It seems the file-not-found problem is not the real reason for the error; the incorrect libnccl version is the main cause.
Please describe the bug
Please describe the expected behavior
System information and environment
To Reproduce
When I try to install alpa from source and execute
python3 build/build.py --enable_cuda --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa
some warnings appear, and I don't know whether they are related to the error shown in the second picture.
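The build command above relies on the shell to expand `$(pwd)` into the override path. The following is a minimal sketch, assuming it is run from the same directory as the original command. It rebuilds the same argument list in Python and swaps `python3` for `sys.executable`, so the build runs under the interpreter of the currently active environment. That matters if, as described above, you recreate the environment after reinstalling libnccl. It also checks that the `tensorflow-alpa` override directory exists before Bazel starts. `build_command` and `preflight` are hypothetical helpers, not part of alpa, and the flags are copied verbatim from the command above.

```python
import os
import subprocess
import sys
from typing import List


def build_command(tf_alpa_dir: str) -> List[str]:
    """Assemble the build.py invocation from the issue with an absolute override path."""
    tf_alpa_dir = os.path.abspath(tf_alpa_dir)  # also normalizes any ".." segments
    return [
        sys.executable,  # interpreter of the active environment instead of bare python3
        "build/build.py",
        "--enable_cuda",
        "--dev_install",
        f"--bazel_options=--override_repository=org_tensorflow={tf_alpa_dir}",
    ]


def preflight(tf_alpa_dir: str) -> None:
    """Fail early with a clear message if the override checkout is missing."""
    if not os.path.isdir(tf_alpa_dir):
        raise FileNotFoundError(f"override repository not found: {tf_alpa_dir}")


if __name__ == "__main__":
    # Mirrors $(pwd)/../third_party/tensorflow-alpa from the original command.
    tf_dir = os.path.join(os.getcwd(), "..", "third_party", "tensorflow-alpa")
    preflight(tf_dir)
    sys.exit(subprocess.call(build_command(tf_dir)))
```

This only catches a missing override checkout. It says nothing about NCCL, which the thread identifies as the actual cause here.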