ECP-WarpX / WarpX

WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code.
https://ecp-warpx.github.io

Installing WarpX on Compute Canada Cedar Cluster #4571

Open nfbeier opened 9 months ago

nfbeier commented 9 months ago

Hello,

I'm interested in installing WarpX on the Cedar cluster of Compute Canada for the purpose of simulating LWFA and laser-solid interactions. I have some experience installing and running software (OSIRIS and EPOCH), but would appreciate help setting up the proper dependencies for GPU-accelerated computing, since I have not done that before.

For WarpX, I would like to use:

Wiki page of the Cedar cluster: https://docs.alliancecan.ca/wiki/Cedar

Information on the GPU support provided by the Compute Canada Alliance: https://docs.alliancecan.ca/wiki/Using_GPUs_with_Slurm

I've also attached the list of modules available on the cluster once logged in: cedar_ModuleAvail.txt. Here's a web link providing the same information: https://docs.alliancecan.ca/wiki/Available_software
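In case it's useful, the module stack can also be queried directly from a login shell; this is just a minimal example using the standard Lmod commands on the Alliance systems:

module spider cuda          # list every CUDA build in the software stack
module spider cuda/12.2     # show what must be loaded before this version
module spider hdf5-mpi      # list the parallel HDF5 builds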

Thanks for the help! Nick

roelof-groenewald commented 8 months ago

Hi @nfbeier. Some colleagues of ours use the Narval system for WarpX simulations. HuntFeng has put together the following guide for installing on that system. I haven't used the Compute Canada systems myself, so I don't know how similar Narval is to Cedar, but I wanted to share it anyway in case it helps.

Narval (GPU)

Load modules

module purge
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5 cuda/12.2 hdf5-mpi/1.14.2 python/3.11.5 mpi4py/3.1.4

Create virtualenv

virtualenv --system-site-packages $HOME/.venvs/warpx_gpu
source $HOME/.venvs/warpx_gpu/bin/activate
pip install tqdm matplotlib jupyter

Compile WarpX

Download WarpX-23.11 to the home directory and untar it, then use build_warpx.sh to build WarpX. The script is configured to enable the RZ coordinate system, CUDA, the Python binding (pywarpx), and the openPMD (HDF5) output format.
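The download and extraction step might look like the following; the archive URL is my assumption and should be checked against the WarpX releases page:

cd $HOME
# fetch the 23.11 tag tarball from GitHub; extracts to $HOME/WarpX-23.11
wget https://github.com/ECP-WarpX/WarpX/archive/refs/tags/23.11.tar.gz
tar -xzf 23.11.tar.gz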

The build_warpx.sh file contains:

# compile warpx
# enable python binding, openpmd (hdf5) output format
warpx=WarpX-23.11
echo "compile $warpx"
cmake -S $HOME/$warpx -B $HOME/$warpx/build -DWarpX_DIMS=RZ \
  -DWarpX_COMPUTE=CUDA \
  -DWarpX_MPI=ON \
  -DWarpX_QED=OFF \
  -DWarpX_OPENPMD=ON \
  -DWarpX_PYTHON=ON

echo "build warpx and do pip install"
cmake --build $HOME/$warpx/build --target pip_install -j 4
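As a quick sanity check after the pip_install target finishes (my addition, not part of HuntFeng's guide), the Python module should import from the same virtualenv:

source $HOME/.venvs/warpx_gpu/bin/activate
python -c "import pywarpx; print(pywarpx.__file__)"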

nfbeier commented 8 months ago

Hi @roelof-groenewald,

Thanks for your help in getting this started! I have never used the Narval cluster either, but the two systems have significant overlap in the software and modules available to users. The major difference is the CPU architecture: Narval uses AMD CPUs while Cedar uses Intel CPUs. I'll stick with WarpX-23.11 since that version seems to be running on Narval.

After setting up the WarpX virtual environment, I modified the build_warpx.sh script to include the lines:

export AMREX_CUDA_ARCH=7.0
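# note: compute capability 7.0 matches Cedar's V100 GPUs; the P100 nodes would need 6.0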

# compiler environment hints
export CXX=$(which g++)
export CC=$(which gcc)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

I'm also trying to install 1D, 2D, 3D, and RZ coordinates all at once.
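If I'm reading the WarpX CMake options correctly, the multi-geometry build just turns the dimension flag into a semicolon-separated list, roughly like this (an untested sketch of my modified configure call):

# configure all four geometries in one build tree
cmake -S $HOME/WarpX-23.11 -B $HOME/WarpX-23.11/build \
  -DWarpX_DIMS="1;2;3;RZ" \
  -DWarpX_COMPUTE=CUDA \
  -DWarpX_MPI=ON \
  -DWarpX_QED=OFF \
  -DWarpX_OPENPMD=ON \
  -DWarpX_PYTHON=ON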

At the moment I'm running into a memory allocation error. I've attached the entire output file from the job below. Here is the relevant portion of the output:

cd /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-build/Src && /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/bin/ccache /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.2.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/g++ -DAMREX_SPACEDIM=1 -Damrex_1d_EXPORTS --options-file CMakeFiles/amrex_1d.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function -maxrregcount=255 -Xcudafe --display_error_number --Wext-lambda-captures-this --use_fast_math --generate-line-info -Xcompiler=-Werror=return-type -MD -MT _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o -MF CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o.d -x cu -c /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-src/Src/Base/AMReX_Random.cpp -o CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic/numpy-1.25.2+computecanada-cp311-cp311-linux_x86_64.whl (from -r /home/nbeier/code/WarpX-23.11/build/_deps/fetchedpyamrex-src/requirements.txt (line 1))
Installing collected packages: numpy
ERROR: Could not install packages due to an OSError: [Errno 12] Cannot allocate memory

gmake[3]: *** [_deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements.dir/build.make:70: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements] Error 1
gmake[3]: Leaving directory '/home/nbeier/code/WarpX-23.11/build'
gmake[2]: *** [CMakeFiles/Makefile2:2933: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[  1%] Building CUDA object _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o
cd /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-build/Src && /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/bin/ccache /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.2.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/g++ -DAMREX_SPACEDIM=1 -Damrex_1d_EXPORTS --options-file CMakeFiles/amrex_1d.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function -maxrregcount=255 -Xcudafe --display_error_number --Wext-lambda-captures-this --use_fast_math --generate-line-info -Xcompiler=-Werror=return-type -MD -MT _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o -MF CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o.d -x cu -c /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-src/Src/Base/AMReX_DistributionMapping.cpp -o CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o
nvcc error   : 'cudafe++' died due to signal 9 (Kill signal)

build_warpx.txt build_warpx_output.txt