Closed Paul-St-Young closed 3 years ago
It would be helpful also for me.
For now, you may refer to the instructions here (https://github.com/CSIprinceton/CSI-hacks-and-tricks/tree/master/Compilation/DeePMD#traverse) for compilation. We will try to resolve the conda issue soon.
I tried the protocol you suggested with no success, but after trying for a while (and also asking around), the following one worked for me:
$ module load autoload profile/deeplrn tensorflow/2.3.0--cuda--10.1 cmake
$ python3 -m venv deepmd
$ source deepmd/bin/activate
$ pip3 install scikit-build
$ pip3 install setuptools_scm
$ pip3 install --no-use-pep517 deepmd-kit
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/deepmd/lib/python3.8/site-packages/deepmd
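A side note on the last export: an assignment of the form VAR=$VAR:dir leaves an empty leading entry whenever the variable on the right expands to nothing, and the dynamic loader treats an empty entry as the current directory. Here is a minimal sketch of a safer append. The helper name append_ld_library_path is my own, not part of deepmd-kit.

```shell
#!/bin/bash
# Append a directory to LD_LIBRARY_PATH without leaving an empty entry
# and without adding the same directory twice.
# append_ld_library_path is a hypothetical helper name used for illustration.
append_ld_library_path() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}" ;;
    esac
}

# Usage (path taken from the steps above; adjust to your own venv):
# append_ld_library_path "$HOME/deepmd/lib/python3.8/site-packages/deepmd"
```

Calling it twice with the same directory leaves the variable unchanged, so it should be safe to put in a shell startup file.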
For those working on Summit: it provides a precompiled TensorFlow C++ interface that you can use to compile deepmd and lammps. Here are the brief compilation instructions I used. Thanks to @marcoscaa for figuring this out.
First, make sure you load the required compiling tools:
module load cuda/10.1.243
module load gcc/7.4.0
module load cmake/3.18.2
module load ibm-wml-ce/1.6.2-3
# ^^^ this gives you the tensorflow library.
# You may need this specific version. I tried 1.7.0, but deepmd failed to compile
# due to a compiler incompatibility error.
After running module list, you should see something like this:
Currently Loaded Modules:
1) hsi/5.0.2.p5 3) lsf-tools/2.0 5) DefApps 7) cuda/10.1.243 9) gcc/7.4.0
2) xalt/1.2.1 4) darshan-runtime/3.1.7 6) cmake/3.18.2 8) ibm-wml-ce/1.6.2-3 10) spectrum-mpi/10.3.1.2-20200121
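If you want to check the loaded modules from a script rather than by eye, something along these lines might work. It is only a sketch: missing_modules is a made-up helper name, and it does a plain substring match on whatever listing you pipe in.

```shell
#!/bin/bash
# Print every required module that does not appear in the given listing.
# The listing (for example the output of `module list`) is read from stdin.
# missing_modules is a hypothetical helper name used for illustration.
missing_modules() {
    local listing mod
    listing="$(cat)"
    for mod in "$@"; do
        # -F: fixed-string match, -q: quiet; print the module only if it is absent
        printf '%s\n' "$listing" | grep -qF -- "$mod" || printf '%s\n' "$mod"
    done
}

# Usage (module list typically writes to stderr, hence the 2>&1):
# module list 2>&1 | missing_modules cuda/10.1.243 gcc/7.4.0 cmake/3.18.2 ibm-wml-ce/1.6.2-3
```

An empty output means all the listed modules were found.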
Now you can follow the standard procedure to compile deepmd; just make sure you link against the TensorFlow library provided by Summit.
dpmd_root='the/path/you/want/to/install/dpmd'
cd ~
git clone --recursive https://github.com/deepmodeling/deepmd-kit.git deepmd-kit
cd deepmd-kit
# api is the branch I tested. master may also work
# git checkout api
cd source
mkdir build
cd build
###Trick - get rid of dp_ipi
# open ../CMakeLists.txt and comment out the lines that mention ipi
CC=gcc cmake -DUSE_CUDA_TOOLKIT=true -DCMAKE_INSTALL_PREFIX=$dpmd_root -DTENSORFLOW_ROOT=/sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib/python3.6/site-packages/tensorflow_core/ ..
make -j 4
make install
make lammps
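If you would rather script the dp_ipi trick than edit the file by hand, a minimal sketch could look like this. comment_out_ipi is a hypothetical helper name, not part of deepmd-kit, and it assumes that every line mentioning ipi in the CMake file belongs to dp_ipi, so compare the result with the .bak backup before running cmake.

```shell
#!/bin/bash
# Comment out every not-yet-commented line that mentions "ipi" in a CMake file.
# A backup of the original file is kept next to it with a .bak suffix.
# comment_out_ipi is a hypothetical helper name used for illustration.
comment_out_ipi() {
    local cmake_file="$1"
    # Skip lines that already start with '#', then prefix matching lines with '# '.
    sed -i.bak -e '/^[[:space:]]*#/!{/ipi/s/^/# /;}' "$cmake_file"
}

# Usage, from source/build as in the steps above:
# comment_out_ipi ../CMakeLists.txt
```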
Now you can install lammps. There are some tricks you may need to use.
cd ~
git clone --recursive https://github.com/lammps/lammps.git -b stable
cd lammps/src
cp -r ~/deepmd-kit/source/build/USER-DEEPMD ./
###TRICK - skip the dplr files if you do not need them
rm USER-DEEPMD/{fix_dplr*,pppm*}
###END TRICK
make yes-user-deepmd
###Second TRICK - add library path to Lammps
# open Makefile.package
# add the following at the end of PKG_LIB
,-rpath=/sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib
###Done with Second TRICK
make -j 8 mpi
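The second trick (appending the rpath to PKG_LIB before building) could also be scripted. The sketch below assumes that PKG_LIB sits on a single line of Makefile.package and already ends in a -Wl,... linker option, which is what the leading comma in the suffix suggests. append_to_pkg_lib is a name I made up for illustration.

```shell
#!/bin/bash
# Append a suffix to the end of the PKG_LIB line of a LAMMPS Makefile.package.
# Assumption: PKG_LIB is defined on one line that ends in a -Wl,... option.
# append_to_pkg_lib is a hypothetical helper name used for illustration.
append_to_pkg_lib() {
    local makefile="$1" suffix="$2"
    # Trailing whitespace is dropped first so the suffix joins the last option.
    # '|' is the sed delimiter, so the suffix must not contain '|' or '&'.
    sed -i.bak -e "/^PKG_LIB[[:space:]]*=/s|[[:space:]]*\$|${suffix}|" "$makefile"
}

# Usage, from lammps/src before running make:
# append_to_pkg_lib Makefile.package ",-rpath=/sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib"
```

The original file is kept as Makefile.package.bak, so the change is easy to undo.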
Finally, you will need a hack for the location of the TensorFlow library.
cd $dpmd_root/lib
ln -s /sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib/python3.6/site-packages/tensorflow_core/libtensorflow_cc.so ./libtensorflow_cc.so.1
cd ~
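A slightly more defensive variant of the symlink step is sketched below. It refuses to create the link if the target is missing, so you do not end up with a dangling libtensorflow_cc.so.1. link_library is a hypothetical helper name.

```shell
#!/bin/bash
# Create (or replace) a symlink lib_dir/link_name that points at target.
# Fails without touching anything if the target does not exist.
# link_library is a hypothetical helper name used for illustration.
link_library() {
    local target="$1" lib_dir="$2" link_name="$3"
    [ -e "$target" ] || { echo "missing: $target" >&2; return 1; }
    ln -sfn "$target" "$lib_dir/$link_name"
}

# Usage with the paths from the steps above:
# link_library /sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib/python3.6/site-packages/tensorflow_core/libtensorflow_cc.so "$dpmd_root/lib" libtensorflow_cc.so.1
```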
Now the compiled lammps should be able to run. (For the MPI version, due to a bug on Summit, you may not be able to run it on a login node. Submitting a job and running it with jsrun should work.)
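For the jsrun route, a job script could be generated roughly as follows. This is a sketch under assumptions: the project ID, walltime, job name, executable and input file are placeholders, make_lsf_script is a made-up helper, and a real script would also need the same module load lines used for compilation.

```shell
#!/bin/bash
# Print a minimal LSF batch script that launches LAMMPS through jsrun.
# Project, executable and input file are placeholders supplied by the caller.
# make_lsf_script is a hypothetical helper name used for illustration.
make_lsf_script() {
    local project="$1" lmp_exe="$2" input="$3"
    cat <<EOF
#!/bin/bash
#BSUB -P ${project}
#BSUB -W 0:30
#BSUB -nnodes 1
#BSUB -J lammps-deepmd

# one resource set with 1 MPI task, 1 core and 1 GPU
jsrun -n 1 -a 1 -c 1 -g 1 ${lmp_exe} -in ${input}
EOF
}

# Usage (ABC123 is a placeholder project ID):
# make_lsf_script ABC123 ./lmp_mpi in.lammps > job.lsf && bsub job.lsf
```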
@y1xiaoc @marcoscaa Many thanks for posting these instructions!
I was able to successfully install version 1.3.3. I followed all tricks you listed.
I think confusion between GCC binaries and libraries is a key problem, because some of the conda packages come with their own C and C++ compilers. Consistently using gcc/7.4.0 makes things easier.
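One way to spot this kind of confusion early is to check where the compiler on your PATH actually lives. The sketch below assumes that the active conda environment exports CONDA_PREFIX, which I have not verified for the ibm-wml-ce module. is_under_prefix is a hypothetical helper name.

```shell
#!/bin/bash
# Succeed if path lies inside prefix (plain string comparison, no symlink resolution).
# is_under_prefix is a hypothetical helper name used for illustration.
is_under_prefix() {
    local path="$1" prefix="${2%/}"
    [ -n "$prefix" ] || return 1   # no prefix given: nothing to compare against
    case "$path" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: warn if the gcc found first on PATH comes from the conda environment.
# if is_under_prefix "$(command -v gcc)" "${CONDA_PREFIX:-}"; then
#     echo "gcc resolves inside conda: $(command -v gcc)" >&2
# fi
```

If the warning fires, loading gcc/7.4.0 after the conda module, or setting CC and CXX explicitly, may be enough to get a consistent toolchain.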
Here is my script for installing the Python interface for training:
#!/bin/bash
# WARNING: do NOT load wml 1.7.0
#module load ibm-wml-ce/1.7.0-3
## !!!! conda comes with its own gcc and g++, which don't work?
module load ibm-wml-ce/1.6.2-3
module load gcc/7.4.0
module load cmake/3.18.2
pip install --user setuptools_scm scikit-build
#export CC=powerpc64le-none-linux-gnu-gcc
#export CXX=powerpc64le-none-linux-gnu-c++
## incompatible libstdc++?
export CC=gcc
export CXX=g++
python setup.py build
python setup.py install --user
## clean
#rm -rf deepmd_kit.egg-info dist _skbuild
I am going to compile conda packages for ppc64le because I need to use it as well. Hope it will be finished ASAP.
An offline installer for ppc64le has been released in https://github.com/deepmodeling/deepmd-kit/releases/tag/v2.0.0.b4. cc @Paul-St-Young @omaraek @jameswind @y1xiaoc
Given the emerging prevalence of IBM's AC922 nodes (a la Summit) in the ML space, would you consider providing compilation instructions or releasing a conda package for the ppc64le architecture?
From talking to a few collaborators, compiling a customizable version of DeePMD with TensorFlow's C++ interface is a major roadblock for in-depth research projects.