dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

PowerPC (Power9) Source Compilation #2661

Closed · amorehead closed this issue 3 years ago

amorehead commented 3 years ago

❓ PowerPC (Power9) Source Compilation

Hello. I have recently been trying to compile DGL from source on a Power9 (PowerPC) Linux-based cluster, and I am not having much luck. The steps I have taken to try to compile it from source are as follows:

  1. (Referencing https://docs.dgl.ai/en/latest/install/index.html) git clone https://github.com/dmlc/dgl.git
  2. git submodule update --init --recursive
  3. cd dgl
  4. mkdir build
  5. cp cmake/config.cmake build/
  6. cd build
  7. (Cluster uses modules for loading low-level packages - run outside of a Conda/pip environment) module load cuda/11.2.0
  8. (Cluster uses modules for loading low-level packages - run outside of a Conda/pip environment) module load metis/5.1.0
  9. (Cluster uses modules for loading low-level packages - run outside of a Conda/pip environment) module load cmake/3.18.2
  10. cmake -DUSE_CUDA=ON ..

Running step 10 (the cmake command) results in:

    -- The C compiler identification is XLClang 16.1.1.5
    -- The CXX compiler identification is XLClang 16.1.1.5
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/xlC - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Start configuring project dgl
    -- Performing Test SUPPORT_CXX11
    -- Performing Test SUPPORT_CXX11 - Success
    -- Found OpenMP_C: -qsmp=omp (found version "4.5")
    -- Found OpenMP_CXX: -qsmp=omp (found version "4.5")
    -- Found OpenMP: TRUE (found version "4.5")
    -- Build with OpenMP.
    -- Build with AVX optimization.
    CMake Warning (dev) at third_party/dmlc-core/cmake/Utils.cmake:196 (option):
      Policy CMP0077 is not set: option() honors normal variables. Run "cmake
      --help-policy CMP0077" for policy details. Use the cmake_policy command to
      set the policy and suppress this warning.

      For compatibility with older versions of CMake, option is clearing the
      normal variable 'USE_OPENMP'.
    Call Stack (most recent call first):
      third_party/dmlc-core/CMakeLists.txt:20 (dmlccore_option)
    This warning is for project developers. Use -Wno-dev to suppress it.

    -- Found OpenMP_C: -qsmp=omp (found version "4.5")
    -- Found OpenMP_CXX: -qsmp=omp (found version "4.5")
    -- Looking for clock_gettime in rt
    -- Looking for clock_gettime in rt - found
    -- Looking for fopen64
    -- Looking for fopen64 - not found
    -- Looking for C++ include cxxabi.h
    -- Looking for C++ include cxxabi.h - found
    -- Looking for nanosleep
    -- Looking for nanosleep - found
    -- Looking for backtrace
    -- Looking for backtrace - found
    -- backtrace facility detected in default set of libraries
    -- Found Backtrace: /usr/include
    -- Check if the system is big endian
    -- Searching 16 bit integer
    -- Looking for sys/types.h
    -- Looking for sys/types.h - found
    -- Looking for stdint.h
    -- Looking for stdint.h - found
    -- Looking for stddef.h
    -- Looking for stddef.h - found
    -- Check size of unsigned short
    -- Check size of unsigned short - done
    -- Searching 16 bit integer - Using unsigned short
    -- Check if the system is big endian - little endian
    -- /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/dgl/third_party/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
    -- Performing Test SUPPORT_MSSE2
    -- Performing Test SUPPORT_MSSE2 - Failed
    -- Looking for execinfo.h
    -- Looking for execinfo.h - found
    -- Looking for getline
    -- Looking for getline - found
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/dgl/build

It looks like the build script fails when testing for MSSE2 support, and I am not sure where to go from here. I would appreciate any advice you might have to offer!

amorehead commented 3 years ago

I should also mention that following the cmake command listed above with "make -j4" generates the errors below:

    ... Build steps leading up to 39% ...
    (default at OPT(3)) has the potential to alter the semantics of a program.
    Please refer to documentation on the STRICT/NOSTRICT option for more information.
    [ 39%] Linking CXX static library libdmlc.a
    [ 39%] Built target dmlc
    make: *** [all] Error 2

fxd24 commented 3 years ago

Hi @amorehead I am encountering a similar problem in the build process, but I am using a different compiler, namely GCC 7.3. In my case, I initially get the same result as you, with the SSE2 test failing because SSE2 is specific to the Intel x86 architecture. In the DGL git repository there is an include folder that contains Intel-specific files, and I am wondering if it is even possible to build the dgl library on PowerPC. Maybe some source code has to be adapted to make it work... did you find a solution in the meantime?

amorehead commented 3 years ago

Hi, @fxd24 . I have not been able to get this to build yet, no. I am hoping someone else who has figured this out before will notice this issue before too long.

VoVAllen commented 3 years ago

You can change the option USE_AVX to OFF when building the code, with cmake -DUSE_CUDA=ON -DUSE_AVX=OFF .., or by changing the option inside cmake/config.cmake.

amorehead commented 3 years ago

@VoVAllen After running cmake in the "build" directory with the command you suggested above (i.e. cmake -DUSE_CUDA=ON -DUSE_AVX=OFF ..), I was presented with the following error after running its corresponding make command (i.e. make -j4):

    [ 37%] Building C object third_party/METIS/libmetis/CMakeFiles/metis.dir/timing.c.o
    [ 37%] Building C object third_party/METIS/libmetis/CMakeFiles/metis.dir/util.c.o
    [ 38%] Building C object third_party/METIS/libmetis/CMakeFiles/metis.dir/wspace.c.o
    [ 39%] Linking C static library libmetis.a
    [ 39%] Built target metis
    1500-036: (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program. Please refer to documentation on the STRICT/NOSTRICT option for more information.
    [ 39%] Linking CXX static library libdmlc.a
    [ 39%] Built target dmlc
    make: *** [all] Error 2

amorehead commented 3 years ago

After running the same make command above with VERBOSE=1, I see the following errors:

    make[2]: Leaving directory `/gpfs/alpine/bip198/scratch/acmwhb/Repositories/Lab_Repositories/dgl/build'
    Re-run cmake no build system arguments
    [ 39%] Built target metis
    -- The C compiler identification is XLClang 16.1.1.5
    -- The CXX compiler identification is XLClang 16.1.1.5
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/xlC - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Using Python interpreter: python
    Traceback (most recent call last):
      File "/gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/dgl/tensoradapter/pytorch/find_cmake.py", line 1, in <module>
        import torch
    ImportError: No module named torch
    -- find_cmake.py output:
    CMake Error at CMakeLists.txt:16 (list):
      list GET given empty list

    CMake Error at CMakeLists.txt:17 (list):
      list GET given empty list

    -- Configuring for PyTorch
    -- Setting directory to /Torch
    CMake Error at CMakeLists.txt:22 (find_package):
      By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
      asked CMake to find a package configuration file provided by "Torch", but
      CMake did not find one.

      Could not find a package configuration file provided by "Torch" with any of
      the following names:

        TorchConfig.cmake
        torch-config.cmake

      Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
      "Torch_DIR" to a directory containing one of the above files. If "Torch"
      provides a separate development package or SDK, be sure it has been
      installed.

    -- Configuring incomplete, errors occurred!
    See also "/gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/dgl/tensoradapter/pytorch/build/CMakeFiles/CMakeOutput.log".
    make[2]: *** [CMakeFiles/tensoradapter_pytorch] Error 1
    make[2]: Leaving directory `/gpfs/alpine/bip198/scratch/acmwhb/Repositories/Lab_Repositories/dgl/build'
    make[1]: *** [CMakeFiles/tensoradapter_pytorch.dir/all] Error 2
    make[1]: Leaving directory `/gpfs/alpine/bip198/scratch/acmwhb/Repositories/Lab_Repositories/dgl/build'
    make: *** [all] Error 2

amorehead commented 3 years ago

I believe the reason for the above error is that I am not compiling DGL in a Conda environment. Outside of any Conda or pip environment (that is, globally), only Python 2 is installed by default.

VoVAllen commented 3 years ago

Torch is optional; it is used to accelerate memory allocation in DGL. To build without torch, you can try cmake -DBUILD_TORCH=OFF -DUSE_CUDA=ON -DUSE_AVX=OFF ..

fxd24 commented 3 years ago

Thanks @VoVAllen! The AVX flag did it! Here are my steps, @amorehead:

Installing DGL from source for PowerPC 64bit

Before you begin, create a conda environment: conda create -n ENV_NAME python=3.7. Installing DGL requires a few dependencies that may add some overhead to the installation process. We require a GCC compiler of version >= 5.x.x. To install a newer GCC compiler within the Conda environment, type the following:

conda install cudatoolkit-dev gxx_linux-ppc64le=7

Then clone the dgl git repo; see https://docs.dgl.ai/install/index.html. (I didn't use the config.cmake.) After the cmake files are created, we are not yet ready to build with make, because we first have to point the compiler to the newly installed one by changing the following in the build folder's CMakeCache.txt file (note: the file reverts to the defaults after executing cmake again):

//CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
CMAKE_CXX_COMPILER:FILEPATH=.conda/envs/<ENV_NAME>/bin/powerpc64le-conda_cos7-linux-gnu-c++
...
//C compiler
CMAKE_C_COMPILER:FILEPATH=.conda/envs/<ENV_NAME>/bin/powerpc64le-conda_cos7-linux-gnu-cc

We also have to disable AVX optimizations, as they only work on the x86 architecture unless some kind of emulation or mapping is used. Therefore, set the following option to OFF:

//Build with AVX optimization
USE_AVX:STRING=OFF 

Note that the file path may be different on your cluster. Then we have another problem: the compilation process uses -march=native, which is not supported on PowerPC and has to be switched to -mcpu=native. Therefore, we have to change the flags for each of the generated files causing the problem, as sketched below.
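For example, one way to do the replacement in bulk (a rough sketch; I assume the offending flags end up in the generated flags.make files under build/, and paths may differ on your system):

    # Find the generated flags.make files that still request -march=native and
    # switch them to -mcpu=native (GNU sed; re-running cmake regenerates these
    # files, so the edit has to be repeated after every configure).
    # Run from the dgl source root.
    grep -rl --include='flags.make' -e '-march=native' build/ \
        | xargs -r sed -i 's/-march=native/-mcpu=native/g'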

Finally, you can type make -j4

amorehead commented 3 years ago

Thank you for sharing, @fxd24! Since I do not have permission on the cluster I am using to install CUDA in a Conda environment, I have been running the instructions from https://docs.dgl.ai/install/index.html with the modifications I've listed above. This time, I also tried editing the METIS package's flags.make file you mentioned so that the "-mcpu=native" flag is appended to the "C_FLAGS = ..." variable, and I am still encountering the following errors during the build:

    [ 39%] Built target metis
    -- The C compiler identification is XLClang 16.1.1.5
    -- The CXX compiler identification is XLClang 16.1.1.5
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /sw/summit/xl/16.1.1-5/xlC/16.1.1/bin/xlC - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Using Python interpreter: python
    -- find_cmake.py output: /gpfs/alpine/bip198/scratch/acmwhb/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake;1.6.0a0
    -- Configuring for PyTorch 1.6.0a0
    -- Setting directory to /gpfs/alpine/bip198/scratch/acmwhb/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Torch
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
    -- Found Threads: TRUE
    CMake Warning at /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Caffe2/public/protobuf.cmake:88 (message):
      Protobuf cannot be found. Depending on whether you are building Caffe2 or a
      Caffe2 dependent library, the next warning / error will give you more info.
    Call Stack (most recent call first):
      /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:56 (include)
      /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
      CMakeLists.txt:22 (find_package)

    CMake Error at /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:58 (message):
      Your installed Caffe2 version uses protobuf but the protobuf library cannot
      be found. Did you accidentally remove it, or have you set the right
      CMAKE_PREFIX_PATH? If you do not have protobuf, you will need to install
      protobuf and set the library path accordingly.
    Call Stack (most recent call first):
      /gpfs/alpine/scratch/acmwhb/bip198/Repositories/Lab_Repositories/RGSET/venv/lib/python3.6/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:40 (find_package)
      CMakeLists.txt:22 (find_package)

VoVAllen commented 3 years ago

@amorehead You can add -DBUILD_TORCH=OFF to your build options, which will skip the Torch/Caffe2 checking. Or you can specify the conda Python path that contains torch by adding -DPYTHON_INTERP=<your conda python path>.
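For example (a rough sketch; the conda path is just a placeholder for your own environment):

    # Skip the tensoradapter's Torch/Caffe2 checks entirely
    cmake -DBUILD_TORCH=OFF -DUSE_CUDA=ON -DUSE_AVX=OFF ..

    # Or point the build at a conda Python that actually has torch installed
    cmake -DUSE_CUDA=ON -DUSE_AVX=OFF \
          -DPYTHON_INTERP=$HOME/.conda/envs/<ENV_NAME>/bin/python ..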

amorehead commented 3 years ago

Through a combination of collaborative efforts, I was finally able to get DGL compiled properly on my PowerPC cluster! Thank you all for your help and guidance. Here are the exact steps I followed to get it installed as a Python dependency in a Conda environment:

(On my cluster's login node - with no Conda/venv environments activated)

  1. module load open-ce/1.1.3-py38-0 (for distributed deep learning support on Summit)
  2. conda activate $MY_CONDA_ENV (the name of my Conda environment)
  3. module load cmake/3.20.2
  4. module load gcc/9.3.0
  5. module load cuda/10.2.89
  6. git clone --depth 1 --branch 0.6.x https://github.com/dmlc/dgl.git (for installing DGL 0.6) or git clone https://github.com/dmlc/dgl.git (for installing the latest release of DGL)
  7. cd dgl/
  8. git submodule update --init --recursive
  9. mkdir build
  10. cd build/
  11. export CUDA_HOME=/sw/summit/cuda/10.2.89
  12. export CMAKE_C_COMPILER=/sw/summit/gcc/9.3.0-2/bin/gcc
  13. export CMAKE_CXX_COMPILER=/sw/summit/gcc/9.3.0-2/bin/g++
  14. export USE_LIBXSMM=OFF
  15. cmake -DUSE_AVX=OFF -DUSE_CUDA=ON -DCMAKE_C_COMPILER=/sw/summit/gcc/9.3.0-2/bin/gcc -DCMAKE_CXX_COMPILER=/sw/summit/gcc/9.3.0-2/bin/g++ -DUSE_LIBXSMM=OFF ..
  16. vim third_party/METIS/libmetis/CMakeFiles/metis.dir/flags.make (to replace '-march=native' with '-mcpu=native', then save the changes and exit the file)
  17. make -j4
  18. cd ../python
  19. conda activate $MY_CONDA_ENV (activate whichever Python/Conda environment, if any, you want the Python DGL bindings installed into)
  20. python3 setup.py install (DGL should now be installed in your chosen Conda environment as a local pip package)
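As a quick sanity check after step 20 (a rough sketch, run inside the same Conda environment; the toy graph is just an arbitrary example):

    # Confirm the freshly built bindings import and can construct a small graph
    python3 -c "import dgl; print(dgl.__version__)"
    python3 -c "import dgl; g = dgl.graph(([0, 1], [1, 2])); print(g.num_nodes(), g.num_edges())"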

amorehead commented 3 years ago

Hey, everyone. I am encountering a new error when I try to run "make -j4" for DGL 0.7 on a Power9 (PowerPC) architecture (e.g., ORNL's Summit system). In the DGL Slack channel, @BarclayII suggested I use the USE_LIBXSMM=OFF flag for CMake to ignore LIBXSMM since it does not (currently) support the PowerPC architecture. I have updated my commands above to reflect this approach.

Screenshot from 2021-08-23 15-14-34

However, even though I can get around the above error with the USE_LIBXSMM=OFF flag, I now encounter another error when I run a Python script that simply imports DGL.

Screenshot from 2021-08-23 16-03-18

Any ideas as to what's missing in my build script to get this C library showing up on my path?

VoVAllen commented 3 years ago

@amorehead This does not seem related to DGL. This error usually means you built on a system with a higher glibc version and ran the result on a machine with a lower glibc version. Could you provide more details about your build environment?

amorehead commented 3 years ago

@VoVAllen, this error seems strange, because I am building DGL on Summit (a Power9/PowerPC GPU server) and then immediately testing it in a Python script on Summit (the same environment as the build environment, with the same HPC modules loaded). I am building DGL exactly as I have outlined above, and once the Python bindings are installed in my local Conda environment, I go to test DGL in a Python script and am greeted with the above "version GLIBCXX_3.4.26 not found" error.

To see which version of GLIB is available on Summit, I ran "module spider GLIB" and got these results. Screenshot from 2021-08-25 10-33-47

It looks like the platform only has version 2.66.2 of GLIB available to users. Do you know if version 3 of GLIB became the default in versions 0.7 and 0.8 of DGL? I had DGL working just fine on Summit with version 0.6.

VoVAllen commented 3 years ago

Could you try ldd libdgl.so to see the libc.so? Do you have any conda environment activated? Conda might change the RPATH of the dynamic library and other environment variables. Could you try compiling and running in the conda env?
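For example, something along these lines (a rough sketch; the libdgl.so path is a placeholder for wherever your build installed it):

    # See which libstdc++/libc the DGL shared library resolves to at runtime
    ldd /path/to/site-packages/dgl/libdgl.so | grep -E 'libstdc\+\+|libc\.so'

    # List the GLIBCXX symbol versions provided by the libstdc++ your GCC module ships
    strings "$(g++ -print-file-name=libstdc++.so.6)" | grep GLIBCXX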

amorehead commented 3 years ago

@VoVAllen, the results of my "ldd libdgl.so" are as follows:

(DeepInteract)[acmwhb@login1.summit DeepInteract]$ ldd /ccs/home/acmwhb/.conda/envs/DeepInteract/lib/python3.8/site-packages/dgl-0.7.0-py3.8-linux-ppc64le.egg/dgl/libdgl.so
    linux-vdso64.so.1 (0x00007fffb9510000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fffb4690000)
    librt.so.1 => /lib64/power9/librt.so.1 (0x00007fffb4660000)
    libcublas.so.11 => /sw/summit/cuda/11.0.3/lib64/libcublas.so.11 (0x00007fffae840000)
    libcusparse.so.11 => /sw/summit/cuda/11.0.3/lib64/libcusparse.so.11 (0x00007fffa4f40000)
    libcurand.so.10 => /sw/summit/cuda/11.0.3/lib64/libcurand.so.10 (0x00007fffa05a0000)
    libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x00007fffa0550000)
    libgomp.so.1 => /sw/summit/gcc/9.3.0-2/lib64/libgomp.so.1 (0x00007fffa04e0000)
    libstdc++.so.6 => /sw/summit/gcc/9.3.0-2/lib64/libstdc++.so.6 (0x00007fffa0250000)
    libm.so.6 => /lib64/power9/libm.so.6 (0x00007fffa0120000)
    libgcc_s.so.1 => /sw/summit/gcc/9.3.0-2/lib64/libgcc_s.so.1 (0x00007fffa00e0000)
    libc.so.6 => /lib64/power9/libc.so.6 (0x00007fff9fed0000)
    /lib64/ld64.so.2 (0x00007fffb9530000)
    libcublasLt.so.11 => /sw/summit/cuda/11.0.3/lib64/libcublasLt.so.11 (0x00007fff95180000)

I also tried to compile DGL from source inside the Conda environment in which DGL is ultimately being installed. It did not seem to affect the overall result: when I run a simple Python script inside my Conda environment and try to import dgl, it still complains that "version GLIBCXX_3.4.26 not found".

I tried multiple versions of DGL as well as of GCC. It looks like DGL only recommends using up to GCC 9 for newer builds of the library. Another thought that came to mind: there are some references to Power8 in the METIS flags.make file that CMake generates.

    # CMAKE generated file: DO NOT EDIT!
    # Generated by "Unix Makefiles" Generator, CMake Version 3.20

    # compile C with /sw/summit/gcc/9.3.0-2/bin/gcc
    C_DEFINES = -DDGL_USE_CUDA -DENABLE_PARTIAL_FRONTIER=0

    C_INCLUDES = -I/sw/summit/cuda/11.0.3/include -I/gpfs/alpine/scratch/acmwhb/bif132/Repositories/Intermediate_Repositories/dgl0.7/third_party/METIS/GKlib -I/gpfs/alpine/scratch/acmwhb/bif132/Repositories/Intermediate_Repositories/dgl0.7/third_party/METIS/include -I/gpfs/alpine/scratch/acmwhb/bif132/Repositories/Intermediate_Repositories/dgl0.7/third_party/METIS/libmetis/.

    C_FLAGS = -fopenmp -O2 -Wall -fPIC -mcpu=power8 -mtune=power8 -mpower8-fusion -mpower8-vector -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -DIDXTYPEWIDTH=64 -DREALTYPEWIDTH=32 -DLINUX -D_FILE_OFFSET_BITS=64 -std=c99 -fno-strict-aliasing -march=native -fPIC -Werror -Wall -pedantic -Wno-unused-function -Wno-unused-but-set-variable -Wno-unused-variable -Wno-unknown-pragmas -DNDEBUG -DNDEBUG2 -DHAVE_EXECINFO_H -DHAVE_GETLINE -O3

Before Summit had a large OS upgrade (from RHEL 7 to RHEL 8), and with version 0.6 of DGL, I only had to replace the -march=native flag with the -mcpu=native flag to get it to work on Summit. However, after the Summit upgrade and with newer DGL versions, it looks like CMake is now populating these new -mtune and -mpower flags with power8 values. Do you think these may cause any issues (since we are technically working with a Power9 architecture)? When these started showing up, I defaulted to removing "-march=native" and leaving "-mcpu=power8".
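One way to check what the compiler actually resolves "native" to on a Summit node (a rough sketch using the GCC module path from my build steps above):

    # The cc1 command line printed by -v shows the concrete -mcpu/-mtune values
    # the driver substitutes for "native" on this machine.
    /sw/summit/gcc/9.3.0-2/bin/gcc -mcpu=native -E -v - </dev/null 2>&1 | grep cc1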

VoVAllen commented 3 years ago

@amorehead Sorry for the late reply. Basically, we don't have any restriction on glibc. Could you try to find out which library exactly depends on the higher version of glibc? Something like checking the runtime library dependencies?
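For instance, a rough sketch of one way to check (the egg path is copied from the ldd output above; adjust it for your environment):

    # Which shared objects under the dgl install actually request GLIBCXX_3.4.26?
    DGL_DIR=~/.conda/envs/DeepInteract/lib/python3.8/site-packages/dgl-0.7.0-py3.8-linux-ppc64le.egg/dgl
    for so in "$DGL_DIR"/*.so; do
        readelf -V "$so" 2>/dev/null | grep -q 'GLIBCXX_3.4.26' && echo "$so requires GLIBCXX_3.4.26"
    done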

BarclayII commented 3 years ago

I noticed that you were using an HPC. What are the modules you have loaded? I believe one of the modules you loaded has a higher GLIBC.

amorehead commented 3 years ago

@VoVAllen and @BarclayII, thank you for your thoughtful replies. @BarclayII, below is the output of running "module list" on the HPC cluster of interest:

(DeepInteract)[acmwhb@login3.summit DeepInteract]$ module list

Currently Loaded Modules: 1) lsf-tools/2.0 2) hsi/5.0.2.p5 3) darshan-runtime/3.3.0-lite 4) xalt/1.2.1 5) DefApps 6) open-ce/1.2.0-py38-0 7) gcc/9.3.0 8) cmake/3.20.2 9) spectrum-mpi/10.4.0.3-20210112 10) cuda/11.0.3

This follows my build instructions up above, where I load in CMake, CUDA, GCC, and open-ce (a distributed deep learning module specific to the cluster I am running on - https://github.com/open-ce/open-ce).

Is there any more information you would find relevant for troubleshooting which library requires a higher version of glibc?

BarclayII commented 3 years ago

Could you take a look into these environments and see if they introduce a newer GLIBC? Probably one of the modules changes RPATH or LIBRARY_PATH, so the paths are getting messed up.
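For example, a rough sketch of how to inspect what each loaded module prepends (module names taken from your module list above):

    # Look for LD_LIBRARY_PATH / LIBRARY_PATH / RPATH entries a module introduces
    module show gcc/9.3.0 2>&1 | grep -iE 'library_path|rpath'
    module show open-ce/1.2.0-py38-0 2>&1 | grep -iE 'library_path|rpath'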

VoVAllen commented 3 years ago

LIBRARY_PATH is used for linking at compile time; LD_LIBRARY_PATH is used for resolving shared libraries at runtime.
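A small illustration of the difference (a rough sketch; the GCC lib64 path is the one from your ldd output, and prepending it is only one thing to try):

    # LIBRARY_PATH only influences where the compiler/linker searches at build time
    echo "$LIBRARY_PATH"

    # LD_LIBRARY_PATH is what the dynamic loader searches when the program runs,
    # so a libstdc++ that provides GLIBCXX_3.4.26 must be findable here at runtime
    echo "$LD_LIBRARY_PATH"
    export LD_LIBRARY_PATH=/sw/summit/gcc/9.3.0-2/lib64:$LD_LIBRARY_PATH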

mufeili commented 3 years ago

Closing the issue for now. Feel free to reopen.

doloresgarcia commented 1 year ago

Hi all, I seem to be running into similar errors. However, none of the suggestions above have worked so far.

I am running

  1. conda activate ENV

  2. module load profile/deeplrn autoload hpc/2.2.0 (loads pytorch/cuda 11.0 and other deep-learning-related packages)

  3. module load cmake

  4. module load gnu

  5. git clone --depth 1 --branch 0.6.x https://github.com/dmlc/dgl.git (for installing DGL 0.6) or git clone https://github.com/dmlc/dgl.git (for installing the latest release of DGL)

  6. cd dgl/

  7. git submodule update --init --recursive

  8. mkdir build

  9. cp cmake/config.cmake build/

  10. cd build/

  11. cmake -DUSE_AVX=OFF -DUSE_CUDA=ON -DUSE_LIBXSMM=OFF -DBUILD_TORCH=OFF ..

  12. nano third_party/METIS/libmetis/CMakeFiles/metis.dir/flags.make (to replace '-march=native' with '-mcpu=native', then save the changes and exit the file)

  13. make -j4

I get the following error:

    CMake Error at /hpc/prod/opt/libraries/hpc-ai/2.2.0/none/hpc-ai-conda-env-py3.8-cuda-openmpi-11.0/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:58 (message):
      Your installed Caffe2 version uses protobuf but the protobuf library cannot
      be found. Did you accidentally remove it, or have you set the right
      CMAKE_PREFIX_PATH? If you do not have protobuf, you will need to install
      protobuf and set the library path accordingly.
    Call Stack (most recent call first):
      /hpc/prod/opt/libraries/hpc-ai/2.2.0/none/hpc-ai-conda-env-py3.8-cuda-openmpi-11.0/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      CMakeLists.txt:22 (find_package)

If I don't copy the cmake config when creating the build directory, I instead get: CMake Error at dgl_generated_array_nonzero.cu.o.cmake:276 (message): Error generating file /m100/home/[username]/dgl/build/CMakeFiles/dgl.dir/src/array/cuda/./dgl_generated_array_nonzero.cu.o

Any ideas?