deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

compile or conda on IBM Power9 AC922 powerpc system #269

Closed Paul-St-Young closed 3 years ago

Paul-St-Young commented 4 years ago

Given the growing prevalence of IBM's AC922 nodes (à la Summit) in the ML space, would you consider providing compilation instructions or releasing a conda package for the ppc64le architecture?

From what a few collaborators tell me, compiling a customizable version of DeePMD with TensorFlow's C++ interface is a major roadblock for in-depth research projects.

omaraek commented 4 years ago

It would be helpful also for me.

jameswind commented 4 years ago

For now, you may consider the compilation instructions here (https://github.com/CSIprinceton/CSI-hacks-and-tricks/tree/master/Compilation/DeePMD#traverse). We will try to resolve the conda issue soon.

omaraek commented 4 years ago

I tried the protocol you suggested without success, but after some trial and error (and asking around), the following worked for me:

$ module load autoload profile/deeplrn tensorflow/2.3.0--cuda--10.1 cmake
$ python3 -m venv deepmd
$ source deepmd/bin/activate
$ pip3 install scikit-build
$ pip3 install setuptools_scm
$ pip3 install --no-use-pep517 deepmd-kit
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/deepmd/lib/python3.8/site-packages/deepmd
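
If LD_LIBRARY_PATH is empty or unset when the export line runs, the result starts with a stray colon, and re-sourcing the line keeps adding the same entry. A minimal sketch that avoids both, assuming a POSIX shell; add_lib_dir is a hypothetical helper name, not part of deepmd-kit:

```shell
# Append a directory to LD_LIBRARY_PATH unless it is already listed.
# add_lib_dir is a hypothetical helper name, not part of deepmd-kit.
add_lib_dir() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example call (adjust the venv location and Python version to your setup):
# add_lib_dir "$HOME/deepmd/lib/python3.8/site-packages/deepmd"
```

The `${VAR:+...}` expansion only inserts the separating colon when the variable already has content.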

y1xiaoc commented 3 years ago

For those working on Summit: it provides a precompiled TensorFlow C++ interface that you can use to compile deepmd and LAMMPS. Here are the brief compilation instructions I used. Thanks to @marcoscaa for figuring this out.

First, make sure you load the required compiling tools:

module load cuda/10.1.243
module load gcc/7.4.0
module load cmake/3.18.2
module load ibm-wml-ce/1.6.2-3
# ^^^ this gives you the tensorflow library.
# You may need this specific version. I tried 1.7.0, but deepmd failed to compile
# because of a compiler incompatibility error.

After running module list, you should see something like this:

Currently Loaded Modules:
  1) hsi/5.0.2.p5   3) lsf-tools/2.0           5) DefApps        7) cuda/10.1.243        9) gcc/7.4.0
  2) xalt/1.2.1     4) darshan-runtime/3.1.7   6) cmake/3.18.2   8) ibm-wml-ce/1.6.2-3  10) spectrum-mpi/10.3.1.2-20200121
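
Before starting the build, it may be worth confirming that the loaded modules actually put the tools on your PATH. A rough sketch; require_tools is a hypothetical helper, not a Summit or deepmd-kit command, and the tool names in the example call are my assumption about what these modules provide:

```shell
# Report any required build tool that is missing from PATH.
# require_tools is a hypothetical helper, not a Summit or deepmd-kit command.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (tool names are assumptions about what the modules provide):
# require_tools gcc g++ cmake nvcc || echo "check your module list"
```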

Now you can follow the standard procedure to compile deepmd. Just make sure you link against the TensorFlow library provided by Summit.

dpmd_root='the/path/you/want/to/install/dpmd'

cd ~
git clone --recursive https://github.com/deepmodeling/deepmd-kit.git deepmd-kit
cd deepmd-kit
# api is the branch I tested. master may also work
# git checkout api 

cd source
mkdir build
cd build

###Trick - get rid of dp_ipi
# open ../CMakeLists.txt and comment lines with ipi

CC=gcc cmake -DUSE_CUDA_TOOLKIT=true -DCMAKE_INSTALL_PREFIX=$dpmd_root -DTENSORFLOW_ROOT=/sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib/python3.6/site-packages/tensorflow_core/ ..
make -j 4
make install
make lammps
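
The dp_ipi trick above can also be scripted instead of edited by hand. This is only a sketch: comment_ipi is a hypothetical helper, and it assumes every line mentioning "ipi" is a standalone CMake command (a command spread over several lines would still need hand editing). Review the result against the .bak copy before running cmake.

```shell
# Comment out every line of a CMake file that mentions "ipi".
# comment_ipi is a hypothetical helper; a .bak copy of the file is kept.
comment_ipi() {
    file=$1
    cp "$file" "$file.bak"
    sed '/ipi/ s/^/# /' "$file.bak" > "$file"
}

# Example call from source/build:
# comment_ipi ../CMakeLists.txt
```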

Now you can install LAMMPS. You may need a few tricks here as well.

cd ~
git clone --recursive https://github.com/lammps/lammps.git -b stable
cd lammps/src
cp -r ~/deepmd-kit/source/build/USER-DEEPMD ./

###TRICK - remove the dplr sources if you do not need them
rm USER-DEEPMD/{fix_dplr*,pppm*}
###END TRICK

make yes-user-deepmd

###Second TRICK - add library path to Lammps
# open Makefile.package
# add the following at the end of PKG_LIB
# ,-rpath=/sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib
###Done with Second TRICK

make -j 8 mpi
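
The second trick can be scripted too if you rebuild often. A sketch under assumptions: append_rpath is a hypothetical helper, and it assumes PKG_LIB sits on a single line that already ends in a -Wl,... linker option (which is what the leading comma in the trick suggests). Check your Makefile.package first, and run this before the final make.

```shell
# Append ",-rpath=<dir>" to the end of the PKG_LIB line of a Makefile.package.
# append_rpath is a hypothetical helper; a .bak copy of the file is kept.
# Assumes PKG_LIB is a single line ending in a -Wl,... linker option.
append_rpath() {
    file=$1
    rpath=$2
    cp "$file" "$file.bak"
    sed "/^PKG_LIB[[:space:]]*=/ s|[[:space:]]*\$|,-rpath=$rpath|" "$file.bak" > "$file"
}

# Example call from lammps/src:
# append_rpath Makefile.package /sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib
```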

Finally, you will need a workaround for the TensorFlow library name: link the library into $dpmd_root/lib as libtensorflow_cc.so.1.

cd $dpmd_root/lib
ln -s /sw/summit/ibm-wml-ce/anaconda-base/envs/ibm-wml-ce-1.6.2-3/lib/python3.6/site-packages/tensorflow_core/libtensorflow_cc.so ./libtensorflow_cc.so.1
cd ~
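
A mistyped target path produces a dangling symlink that ln will not complain about. A small check, assuming a POSIX shell; link_ok is a hypothetical helper name:

```shell
# Succeed only if the path is a symlink whose target exists.
# link_ok is a hypothetical helper name.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

# Example call:
# link_ok "$dpmd_root/lib/libtensorflow_cc.so.1" || echo "dangling or missing link"
```

`-L` tests the link itself, while `-e` follows it, so the pair fails exactly when the link is missing or points nowhere.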

Now the compiled LAMMPS should be able to run. (For the MPI version, however, a bug on Summit may prevent you from running it on a login node. Submitting a job and running it with jsrun should work.)

Paul-St-Young commented 3 years ago

@y1xiaoc @marcoscaa Many thanks for posting these instructions!

I was able to successfully install version 1.3.3. I followed all the tricks you listed. I think confusion between GCC binaries and libraries is a key problem, because some of the conda packages come with their own C and C++ compilers. Consistently using gcc/7.4.0 makes things easier.

Here is my script for installing the Python interface for training:

#!/bin/bash

# WARNING: do NOT load wml 1.7.0
#module load ibm-wml-ce/1.7.0-3
## !!!! conda comes with its own gcc and g++, which don't work?

module load ibm-wml-ce/1.6.2-3
module load gcc/7.4.0
module load cmake/3.18.2

pip install --user setuptools_scm scikit-build

#export CC=powerpc64le-none-linux-gnu-gcc
#export CXX=powerpc64le-none-linux-gnu-c++
## incompatible libstdc++?

export CC=gcc
export CXX=g++

python setup.py build
python setup.py install --user

## clean
#rm -rf deepmd_kit.egg-info dist _skbuild
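
To see whether a compiler is being picked up from the conda environment rather than from the gcc module, a check along these lines may help. It is a sketch only: tool_under is a hypothetical helper, and using $CONDA_PREFIX as the environment root is an assumption about how the wml module sets things up.

```shell
# Succeed if the given command resolves to a path under the given prefix.
# tool_under is a hypothetical helper name.
tool_under() {
    path=$(command -v "$1") || return 1
    case "$path" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call, assuming the conda environment root is in $CONDA_PREFIX:
# tool_under gcc "$CONDA_PREFIX" && echo "gcc resolves inside the conda environment"
```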

njzjz commented 3 years ago

I am going to build conda packages for ppc64le because I need them as well. I hope to finish ASAP.

njzjz commented 3 years ago

An offline installer for ppc64le has been released at https://github.com/deepmodeling/deepmd-kit/releases/tag/v2.0.0.b4 (cc @Paul-St-Young @omaraek @jameswind @y1xiaoc)