nfbeier opened this issue 10 months ago

Hello,
I'm interested in installing WarpX on the Cedar cluster of Compute Canada for the purpose of simulating LWFA and laser-solid interactions. I have some experience installing and running software (OSIRIS and EPOCH), but would appreciate help with setting up the proper dependencies for GPU-accelerated computing, since I have not done that before.
For WarpX, I would like to run on Cedar's GPU nodes. Relevant resources:
Wiki page of the Cedar cluster: https://docs.alliancecan.ca/wiki/Cedar
Information on the GPU support provided by the Compute Canada Alliance: https://docs.alliancecan.ca/wiki/Using_GPUs_with_Slurm
I've also attached the available modules provided by CC when logged into the cluster (cedar_ModuleAvail.txt); the same information is available at https://docs.alliancecan.ca/wiki/Available_software
Thanks for the help! Nick

Hi @nfbeier. Some colleagues of ours use the Narval system for WarpX simulations. HuntFeng has put together the following guide for installing on that system. I haven't used the Compute Canada systems myself, so I don't know how similar Narval is to Cedar, but I wanted to share it anyway in case it helps.
module purge
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5 cuda/12.2 hdf5-mpi/1.14.2 python/3.11.5 mpi4py/3.1.4
virtualenv --system-site-packages $HOME/.venvs/warpx_gpu
source $HOME/.venvs/warpx_gpu/bin/activate
pip install tqdm matplotlib jupyter
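One note: the same modules and virtual environment have to be active at run time, not just at build time. A minimal Slurm job-script sketch for reference (the allocation name, resource requests, and run script are placeholders):

#!/bin/bash
#SBATCH --account=def-yourpi     # placeholder allocation
#SBATCH --gpus-per-node=1        # one GPU for this example
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# recreate the build-time environment
module purge
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5 cuda/12.2 hdf5-mpi/1.14.2 python/3.11.5 mpi4py/3.1.4
source $HOME/.venvs/warpx_gpu/bin/activate

srun python my_warpx_run.py      # hypothetical PICMI run script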
Download WarpX-23.11 to your home directory and untar it, then use the build_warpx.sh script (shown below) to build WarpX. It is configured to enable the RZ geometry, CUDA, the Python bindings (pywarpx), and openPMD (HDF5) output.
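For the download step, something along these lines should work (the URL assumes GitHub's standard release-archive layout for the ECP-WarpX/WarpX repository):

cd $HOME
# fetch and unpack the 23.11 release; extracts to $HOME/WarpX-23.11
wget https://github.com/ECP-WarpX/WarpX/archive/refs/tags/23.11.tar.gz
tar -xzf 23.11.tar.gz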
The build_warpx.sh file contains:
#!/bin/bash
# compile WarpX
# enable Python bindings and openPMD (HDF5) output format
warpx=WarpX-23.11
echo "compile $warpx"
cmake -S $HOME/$warpx -B $HOME/$warpx/build -DWarpX_DIMS=RZ \
    -DWarpX_COMPUTE=CUDA \
    -DWarpX_MPI=ON \
    -DWarpX_QED=OFF \
    -DWarpX_OPENPMD=ON \
    -DWarpX_PYTHON=ON
echo "build warpx and do pip install"
cmake --build $HOME/$warpx/build --target pip_install -j 4
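After the pip_install target completes, a quick sanity check that the bindings are importable from the active venv:

# should exit silently if the install succeeded
python -c "import pywarpx"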
Hi @roelof-groenewald,
Thanks for your help in getting this started! I have never used the Narval cluster either, but the two systems have significant overlap in terms of available software and modules. The major difference is CPU architecture: Narval uses AMD CPUs while Cedar uses Intel CPUs. I'll stick with WarpX-23.11 since that version is known to run on Narval.
After setting up the WarpX virtual environment, I modified the build_warpx.sh script to include the lines:
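# target compute capability 7.0 (Cedar's NVIDIA V100 GPUs)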
export AMREX_CUDA_ARCH=7.0
# compiler environment hints
export CXX=$(which g++)
export CC=$(which gcc)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
I'm also trying to build all of the 1D, 2D, 3D, and RZ geometries at once (see the sketch below).
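Presumably that amounts to changing the dimensions flag in the cmake call to a semicolon-separated list, something like:

-DWarpX_DIMS="1;2;3;RZ" \

since WarpX's CMake accepts a list for WarpX_DIMS and builds each geometry in turn.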
At the moment I'm running into a memory allocation error. I've attached the entire output file from the job below; here is the relevant portion:
cd /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-build/Src && /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/bin/ccache /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.2.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/g++ -DAMREX_SPACEDIM=1 -Damrex_1d_EXPORTS --options-file CMakeFiles/amrex_1d.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function -maxrregcount=255 -Xcudafe --display_error_number --Wext-lambda-captures-this --use_fast_math --generate-line-info -Xcompiler=-Werror=return-type -MD -MT _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o -MF CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o.d -x cu -c /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-src/Src/Base/AMReX_Random.cpp -o CMakeFiles/amrex_1d.dir/Base/AMReX_Random.cpp.o
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic/numpy-1.25.2+computecanada-cp311-cp311-linux_x86_64.whl (from -r /home/nbeier/code/WarpX-23.11/build/_deps/fetchedpyamrex-src/requirements.txt (line 1))
Installing collected packages: numpy
ERROR: Could not install packages due to an OSError: [Errno 12] Cannot allocate memory
gmake[3]: *** [_deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements.dir/build.make:70: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements] Error 1
gmake[3]: Leaving directory '/home/nbeier/code/WarpX-23.11/build'
gmake[2]: *** [CMakeFiles/Makefile2:2933: _deps/fetchedpyamrex-build/CMakeFiles/pyamrex_pip_install_requirements.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[ 1%] Building CUDA object _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o
cd /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-build/Src && /cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/bin/ccache /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/cudacore/12.2.2/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/soft.computecanada.ca/gentoo/2023/x86-64-v3/usr/x86_64-pc-linux-gnu/gcc-bin/12/g++ -DAMREX_SPACEDIM=1 -Damrex_1d_EXPORTS --options-file CMakeFiles/amrex_1d.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function -maxrregcount=255 -Xcudafe --display_error_number --Wext-lambda-captures-this --use_fast_math --generate-line-info -Xcompiler=-Werror=return-type -MD -MT _deps/fetchedamrex-build/Src/CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o -MF CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o.d -x cu -c /home/nbeier/code/WarpX-23.11/build/_deps/fetchedamrex-src/Src/Base/AMReX_DistributionMapping.cpp -o CMakeFiles/amrex_1d.dir/Base/AMReX_DistributionMapping.cpp.o
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)