Closed lixiangMindSpore closed 3 years ago
Thanks for opening the issue. In this case, Bagua cannot find an NCCL installation on your system. Have you tried following the instruction in the error message by running import bagua_core; bagua_core.install_deps()
in your Python interpreter? It will help install the needed system libraries.
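Before or after running the installer, it can help to check whether the dynamic loader can actually open the NCCL library. The following is a minimal sketch, assuming a Linux system and using only the standard-library `ctypes` module; the helper name `can_load_shared_library` is hypothetical and not part of Bagua.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # libnccl.so.2 is the library named in the ImportError reported in this issue.
    print("libnccl.so.2 loadable:", can_load_shared_library("libnccl.so.2"))
```

A `False` result only means the loader cannot find the library on its default search path; Bagua may preload libraries from its own location, so treat this as a rough diagnostic rather than a definitive check.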
I ran bagua_install_deps.py and it solved the problem. Thank you so much!
You're welcome :)
Python 3.8.0 (default, Feb 25 2021, 22:10:10)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import bagua
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-b6bb5bf6d045> in <module>
----> 1 import bagua
~/python38/lib/python3.8/site-packages/bagua/__init__.py in <module>
10 """
11
---> 12 import bagua_core # noqa: F401
13 from .version import __version__ # noqa: F401
~/python38/lib/python3.8/site-packages/bagua_core/__init__.py in <module>
2
3 _environment._preload_libraries()
----> 4 from .bagua_core import * # noqa: F401,E402,F403
5 from .bagua_install_deps import install_deps # noqa: F401,E402,F403
ImportError: libnccl.so.2: cannot open shared object file: No such file or directory
I got the same error with bagua-cuda116 in a virtualenv. Running bagua_install_deps.py failed for me.
bagua_install_deps.py
import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `platform' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `shutil' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `tempfile' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `pathlib' @ error/constitute.c/WriteImage/1037.
from: too many arguments
/home/xxx/python38/bin/bagua_install_deps.py: line 10: _nccl_records: command not found
/home/xxx/python38/bin/bagua_install_deps.py: line 11: library_records: command not found
/home/xxx/python38/bin/bagua_install_deps.py: line 14: syntax error near unexpected token `('
/home/xxx/python38/bin/bagua_install_deps.py: line 14: `class DownloadProgressBar(tqdm):'
bagua-cuda116 was built differently from the other CUDA releases.
bagua-cuda116 0.8.3.dev215
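To confirm which Bagua packages and versions are installed in the active environment, the standard-library `importlib.metadata` module (available since Python 3.8) can be queried. This is a minimal sketch; the helper name `installed_distributions` is hypothetical, and the `"bagua"` prefix is taken from the package names mentioned in this thread.

```python
from importlib import metadata


def installed_distributions(prefix: str) -> dict:
    """Map name -> version for installed distributions whose name starts with `prefix`."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # Lists every installed distribution whose name begins with "bagua".
    for name, version in sorted(installed_distributions("bagua").items()):
        print(name, version)
```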
@Godricly Which Python version did you use to run bagua_install_deps.py? Maybe you can try: python3 bagua_install_deps.py
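The `import-im6.q16` lines in the output above suggest that the script was interpreted by the shell rather than by Python: in a shell, `import` resolves to ImageMagick's screen-capture tool. That would be consistent with the installed script lacking a Python shebang line, although this is an assumption about the cause. The sketch below checks a script's first line and runs it explicitly with the current interpreter; both helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def has_python_shebang(script: Path) -> bool:
    """Return True if the first line of `script` is a shebang that mentions python."""
    with script.open("rb") as f:
        first_line = f.readline()
    return first_line.startswith(b"#!") and b"python" in first_line


def run_with_current_python(script: Path) -> int:
    """Run `script` with the interpreter executing this code and return its exit code."""
    return subprocess.run([sys.executable, str(script)]).returncode
```

Running the script through `sys.executable` has the same effect as the suggested `python3 bagua_install_deps.py`, but it guarantees that the interpreter of the active virtualenv is the one used.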
I tried on another machine with cuda113 and NCCL installed, and it works well for me. I think the problem is that NCCL is not installed. Also, the bagua-cuda116 version should be updated.
Describe the bug
A clear and concise description of what the bug is.
Environment
python3 -m pip install git+https://github.com/BaguaSys/bagua.git -f https://repo.arrayfire.com/python/wheels/3.8.0/
)?: I use 0.8.1.post1
Reproducing
Please provide a minimal working example. This means the runnable code.
Please also write what exact commands are required to reproduce your results.
Additional context
Add any other context about the problem here.