lkzs closed this issue 2 years ago
Hi,
Thanks for trying out our package. This looks related to the NCCL installation. Please let us know which NCCL version is installed and where it is installed. You can refer to the official NCCL guide for installation. We suggest an NCCL version above 2.7.
In addition, please let us know which steps you followed to install BlueFog. We recommend the steps on this page.
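One way to report the version and path asked for above is to read them from the NCCL header. This is only a sketch: it assumes the usual NCCL layout in which nccl.h defines NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH, and the default header location below is a hypothetical placeholder, not a path BlueFog requires.

```shell
#!/bin/sh
# nccl_version HEADER: print MAJOR.MINOR.PATCH read from an nccl.h file.
nccl_version() {
    awk '
        /^#define NCCL_(MAJOR|MINOR|PATCH) / { v[$2] = $3 }
        END { print v["NCCL_MAJOR"] "." v["NCCL_MINOR"] "." v["NCCL_PATCH"] }
    ' "$1"
}

# Hypothetical location (assumption); point NCCL_HEADER at your own nccl.h.
NCCL_HEADER="${NCCL_HEADER:-$HOME/software/nccl/include/nccl.h}"

if [ -f "$NCCL_HEADER" ]; then
    # The install prefix is the directory that contains include/.
    echo "NCCL path:    $(dirname "$(dirname "$NCCL_HEADER")")"
    echo "NCCL version: $(nccl_version "$NCCL_HEADER")"
else
    echo "nccl.h not found at $NCCL_HEADER" >&2
fi
```

Pasting the two printed lines into the issue gives the information requested here.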
The NCCL version is 2.12. Because I don't have root access, I installed NCCL with conda. I also tried building NCCL from the GitHub source, but the command 'make -j12 src.build BUILDDIR=/home/chenz/software/nccl CUDA_HOME=/usr/local/cuda-10.2 NVCC_GENCODE="-gencode=arch=compute_35,code=sm_35"' failed with an error. Thanks for the reply.
In this case, you have to add BLUEFOG_NCCL_HOME=<the path you installed> before the pip install command to tell BlueFog where to find the NCCL library. Please try it. If it still doesn't work, let us know.
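A rough sketch of that step follows. It checks that the chosen directory looks like an NCCL install before printing the install command. The include/nccl.h and lib/libnccl.so* layout is the usual NCCL layout and is assumed here, the default prefix is a hypothetical placeholder, and the BLUEFOG_WITH_NCCL=1 flag is taken from the command reported later in this thread.

```shell
#!/bin/sh
# check_nccl_prefix DIR: succeed if DIR looks like an NCCL install prefix,
# i.e. it holds include/nccl.h and at least one lib/libnccl.so* file.
check_nccl_prefix() {
    [ -f "$1/include/nccl.h" ] && ls "$1"/lib/libnccl.so* >/dev/null 2>&1
}

# Hypothetical default (assumption); set NCCL_PREFIX to your own install path.
NCCL_PREFIX="${NCCL_PREFIX:-$HOME/software/nccl}"

if check_nccl_prefix "$NCCL_PREFIX"; then
    echo "NCCL found. Install BlueFog with:"
    echo "BLUEFOG_NCCL_HOME=$NCCL_PREFIX BLUEFOG_WITH_NCCL=1 pip install --no-cache-dir bluefog"
else
    echo "No include/nccl.h or lib/libnccl.so* under $NCCL_PREFIX" >&2
fi
```

The script only prints the command, so nothing is installed until you run that line yourself.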
I installed NCCL with the command 'conda install nccl', but I can't find the nccl_include or nccl_lib files. Maybe this approach is wrong. I don't have root access, so how can I install NCCL and BlueFog?
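For a conda install without root, one thing to try is searching the environment for nccl.h and using the directory above include/ as the NCCL home. This is a sketch under assumptions: it presumes the conda package places the header in an include/ directory inside the active environment ($CONDA_PREFIX), which may not hold for every package build.

```shell
#!/bin/sh
# find_nccl_prefix ROOT: print the directory whose include/ holds nccl.h,
# searching below ROOT; return non-zero if no header is found.
find_nccl_prefix() {
    header=$(find "$1" -type f -name nccl.h -path '*/include/*' 2>/dev/null | head -n 1)
    [ -n "$header" ] || return 1
    dirname "$(dirname "$header")"
}

# CONDA_PREFIX is set by conda for the active environment; fall back to $HOME.
SEARCH_ROOT="${CONDA_PREFIX:-$HOME}"

if prefix=$(find_nccl_prefix "$SEARCH_ROOT"); then
    echo "Candidate for BLUEFOG_NCCL_HOME: $prefix"
else
    echo "No nccl.h found under $SEARCH_ROOT" >&2
fi
```

If a candidate is printed, it can be passed as BLUEFOG_NCCL_HOME in the pip command.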
This problem has been solved. I successfully installed bluefog-0.3.0 with the command BLUEFOG_NCCL_HOME=/home/zhenfeng/software/nccl BLUEFOG_WITH_NCCL=1 pip install --no-cache-dir bluefog. But when I run the [bluefog-tutorial] notebook "Applying BlueFog on Deep Learning problem(High Level API Introduction).ipynb" with the command 'ibfrun start -np 4', an error occurs when running the 'Start decentralized trainning' cell. The error follows:
My environment is Ubuntu 18.04, NCCL 2.12, Open MPI 4.0.7, BlueFog 0.3.0. Why does this error happen? Is it because the NCCL version is inappropriate? Can anyone help me? Thanks very much!
Same comment as in #108. We discussed this offline: it is more likely a CUDA and NCCL installation issue than a BlueFog one. Feel free to re-open if necessary.
During handling of the above exception, another exception occurred:
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> bluefog
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.