ECP-WarpX / WarpX

WarpX is an advanced, time-based electromagnetic & electrostatic Particle-In-Cell code.
https://ecp-warpx.github.io

Installing WarpX on Texas Longhorn/Frontera HPC #3531

Open ycaophysics opened 1 year ago

ycaophysics commented 1 year ago

Hi. I want to install WarpX on the Texas Longhorn HPC system, which uses IBM-supported nodes. The details of the system are here: https://portal.tacc.utexas.edu/user-guides/longhorn It may be similar to OLCF's Summit, but the installation details are different. I tried installing WarpX but had no luck without Anaconda support (for GPUs).

Thank you

Here are their available modules:

----------------- /opt/apps/xl16/spectrum_mpi10_3/modulefiles ------------------
   fftw3/3.3.10                  petsc/3.13-i64debug
   petsc/3.13-complex            petsc/3.13-i64
   petsc/3.13-complexdebug       petsc/3.13-nohdf5
   petsc/3.13-complexi64debug    petsc/3.13-single
   petsc/3.13-complexi64         petsc/3.13-singledebug
   petsc/3.13-cuda               petsc/3.13-uni
   petsc/3.13-cudadebug          petsc/3.13-unidebug
   petsc/3.13-debug              petsc/3.13             (D)
   petsc/3.13-hyprefei

-------------------------- /opt/apps/xl16/modulefiles --------------------------
   hdf5/1.10.4           mvapich2-gdr/2.3.6        netcdf/4.7.4
   mvapich2-gdr/2.3.4    mvapich2-gdr/2.3.7 (D)    spectrum_mpi/10.3.0 (L)

---------------------------- /opt/apps/modulefiles -----------------------------
   TACC                  (L)      python3/powerai_1.6.2
   autotools/1.2         (L)      python3/powerai_1.7.0  (D)
   cmake/3.16.1          (L)      pytorch-py2/1.0.1
   conda/4.8.3                    pytorch-py2/1.1.0      (D)
   cuda/10.0             (g)      pytorch-py3/1.0.1
   cuda/10.1             (g)      pytorch-py3/1.1.0
   cuda/10.2             (g,D)    pytorch-py3/1.2.0
   gcc/4.9.3                      pytorch-py3/1.3.1      (D)
   gcc/6.3.0                      sanitytool/1.5
   gcc/7.3.0             (D)      settarg
   gcc/9.1.0                      tacc-singularity/3.5.3
   git/2.24.1            (L)      tacc-singularity/3.7.2 (D)
   idev/1.5.7                     tacc_tips/0.5
   launcher_gpu/1.1               tensorflow-py2/1.13.1
   lmod                           tensorflow-py2/1.14.0  (D)
   pgi/19.10.0                    tensorflow-py3/1.13.1
   pgi/20.7.0            (D)      tensorflow-py3/1.14.0
   pylauncher/3.1                 tensorflow-py3/1.15.2
   python2/powerai_1.6.0          tensorflow-py3/2.1.0   (D)
   python2/powerai_1.6.1 (D)      xalt/2.10.21           (L)
   python3/powerai_1.6.0          xl/16.1.1              (L)
   python3/powerai_1.6.1

Here are the scripts I use to install and run FBPIC on Longhorn, in case you find them helpful: (Create environment)

#!/bin/bash

ml reset
ml gcc/9.1.0
ml conda/4.8.3

ENV_NAME=fbpic_env
ENV_DIR=`pwd`/${ENV_NAME}

conda create --prefix=${ENV_DIR} \
             -c conda-forge \
             python=3.9 numba scipy h5py spectrum-mpi cupy cudatoolkit=10.2 -y

(Install)

#!/bin/bash

ml reset
ml gcc/9.1.0
ml spectrum_mpi/10.3.0
ml conda/4.8.3
ml fftw3/3.3.10
ml -python3 >& /dev/null

ROOT_DIR=`pwd`
CONDA_ENV_DIR=${ROOT_DIR}/fbpic_env

conda activate ${CONDA_ENV_DIR}

# MPI4PY
git clone https://github.com/mpi4py/mpi4py mpi4py-src
cd mpi4py-src
python setup.py build --mpicc=`which mpicc`
python setup.py install --prefix=${CONDA_ENV_DIR}

# PYFFTW
export PYFFTW_INCLUDE=$TACC_FFTW3_INC
export PYFFTW_LIB_DIR=$TACC_FFTW3_LIB
pip install pyfftw

# FBPIC
pip install fbpic

(Run Script)

#!/bin/bash
module reset
ml gcc/9.1.0
ml spectrum_mpi/10.3.0
ml cuda/10.2
ml fftw3
ml conda/4.8.3
CONDA_ROOT_DIR=/home/07626/ycao20/longhorn
conda activate ${CONDA_ROOT_DIR}/fbpic_env

which python

python -c "import mpi4py"

export CUPY_CACHE_DIR=`pwd`/cupy/kernel_cache
export FBPIC_DISABLE_CACHING=1
export MY_SPECTRUM_OPTIONS="--gpu"
export FBPIC_ENABLE_GPUDIRECT=1
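
The `which python` check in the run script can be extended to show where each tool resolves from, since the environment above installs `spectrum-mpi` through conda while the install script also loads the `spectrum_mpi/10.3.0` module. The following is a minimal sketch and not part of the original scripts: the helper name `is_under_prefix` is hypothetical, and it assumes `CONDA_PREFIX` has been set by `conda activate`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if path $1 lies inside directory $2.
is_under_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Report whether each tool on PATH comes from the active conda environment
# (CONDA_PREFIX is assumed to be set by `conda activate`) or from elsewhere,
# for example a system module.
for tool in python mpicc mpirun; do
    tool_path=$(command -v "$tool" 2>/dev/null) || { echo "$tool: not found"; continue; }
    if [ -n "${CONDA_PREFIX:-}" ] && is_under_prefix "$tool_path" "$CONDA_PREFIX"; then
        echo "$tool: $tool_path (conda environment)"
    else
        echo "$tool: $tool_path (outside conda)"
    fi
done
```

If `mpicc` or `mpirun` is reported as coming from the conda environment, the MPI used at build or run time may not be the one provided by the loaded module.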

ax3l commented 1 year ago

Hi @ycaophysics,

Thanks for reaching out! We are happy to help you install on Longhorn.

Quick question: are you trying to install FBPIC or WarpX? You reported this in the WarpX repo.

ax3l commented 1 year ago

Generally, if you would like to run on an HPC system with multi-node, parallel execution (i.e., using MPI), I would recommend not using conda and instead relying on the compiler, MPI, and I/O modules that the system provides, similar to what you see in our Summit docs.

The reason is that conda brings in its own standard libraries, which are binary-incompatible with the libraries you find in an HPC system's modules.
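As a rough illustration of the module-based approach, the sketch below loads compiler, MPI, and CUDA modules taken from the module listing earlier in this thread and then configures WarpX with CMake. It is untested on Longhorn: the module combination is an assumption, the listed `cmake/3.16.1` and `cuda/10.2` may be older than what current WarpX releases require, and the `run` helper with its `DRY_RUN` switch is hypothetical, added so the commands can be reviewed before anything is executed. The directory names `warpx-src` and `warpx-build` are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# System-provided toolchain (module names from the listing in this thread;
# their mutual compatibility is an assumption).
run module reset
run module load gcc/9.1.0
run module load spectrum_mpi/10.3.0
run module load cuda/10.2
run module load cmake/3.16.1

# Fetch WarpX and configure it with MPI and CUDA enabled.
run git clone https://github.com/ECP-WarpX/WarpX.git warpx-src
run cmake -S warpx-src -B warpx-build -DWarpX_COMPUTE=CUDA -DWarpX_MPI=ON
run cmake --build warpx-build -j 4
```

By default the script only prints the commands; running it with `DRY_RUN=0` executes them. No conda environment is activated at any point, so the compiler and MPI come only from the loaded modules.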