ECP-WarpX / WarpX

WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code.
https://ecp-warpx.github.io

Installing WarpX on HPC ARCHER2 #4109

Open HollyHuddle opened 1 year ago

HollyHuddle commented 1 year ago

Hello

I am having a bit of trouble setting up WarpX on ARCHER2 and was hoping someone here might have experience doing this before. I am trying to build it so that I can run 2D and 3D simulations with the PSATD solver, but I assume there may be some different or extra steps required for building on ARCHER2?

ax3l commented 1 year ago

Hi @HollyHuddle :wave:

Welcome to WarpX and thank you for reaching out!

I am happy to help you get started on ARCHER2! It is this CPU machine (AMD EPYC 7742) in the UK, right? https://www.archer2.ac.uk

Let us try to add a documented workflow to our manual: https://warpx.readthedocs.io/en/latest/install/hpc.html

To get started, can you please link the official ARCHER2 documentation?

Have you built and run other software on ARCHER2 already?

I'll guide you through the process to get WarpX up & running with all the features you need. For PSATD, I can already say that we will need:

an MPI library
FFTW3 (ideally with MPI support) for the Cartesian PSATD solver
BLAS++ and LAPACK++ for the RZ PSATD solver

All of these are pretty standard for HPC systems, and I will guide you on how to compile missing components if needed.
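As a rough first check, here is a sketch of how to see which of these are already provided as modules; the exact module names assume ARCHER2's HPE Cray programming environment and should be verified against the ARCHER2 documentation:

module avail cray-fftw     # MPI-parallel FFTW 3
module avail cray-hdf5     # HDF5, serial and parallel variants
module avail cray-libsci   # Cray BLAS/LAPACK, a possible backend for BLAS++/LAPACK++
module avail cmake         # CMake for the WarpX build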

Note that WarpX has support for 1D, 2D, RZ (quasi-cylindrical mode decomposition, including support for laser pulses) and 3D modes, in case RZ is interesting for you as well.

HollyHuddle commented 1 year ago

Hello @ax3l

Thanks for taking this on; yes, it is that computer. I have attached some links here:

I have successfully used the EPOCH particle-in-cell code on ARCHER2 so far; however, it already provides a compile command tailored for use on ARCHER2. Here is some info on it in case that may be useful:

From looking through the documentation, ARCHER2 provides the Clang 15.0.2 C++ compiler, with MPI, FFTW version 3, and HDF5 available either serial or with MPI parallel support. ADIOS is also available.
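For reference, a minimal environment matching that description might look as follows (a sketch only; the module names follow standard Cray PE naming and are assumptions to be checked against the ARCHER2 documentation):

module load PrgEnv-cray          # CCE compilers (Clang 15 based) plus cray-mpich
module load cray-fftw            # FFTW 3 with MPI support
module load cray-hdf5-parallel   # parallel HDF5, e.g. for openPMD output
module load cmake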

ax3l commented 1 year ago

Hi @HollyHuddle, please excuse my delay - I was busy and stuck with other deadlines. Let's try to take this on again this week; this looks like a really good starting point for us and I see no issues in getting you up and running! :)

HollyHuddle commented 1 year ago

Hi @ax3l no problem at all! I appreciate your help with this!

HollyHuddle commented 1 year ago

Hi there,

So I gave this another go and managed to get it built this morning, just without the RZ geometry and with the PSATD solver on. The build was very simple without RZ; it just required loading the right modules and updating a few things on my end. I think the RZ failure has something to do with the blas++ package required to build the spectral solver in RZ.

Here is the list of commands I used to configure and build on ARCHER2, for reference:

git clone https://github.com/ECP-WarpX/WarpX.git
cd WarpX
module load cray-fftw
module load cray-libsci
module load cmake
cmake -S . -B build -DWarpX_DIMS="1;2;3" -DWarpX_PSATD=ON
cmake --build build -j 4
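To add the RZ spectral solver later, a possible next step is to build BLAS++ and LAPACK++ against cray-libsci and point the WarpX configuration at them. The following is a sketch rather than a tested ARCHER2 recipe: install prefixes, CMake options and the combined-dimension configure line are assumptions based on the generic WarpX HPC instructions.

git clone https://github.com/icl-utk-edu/blaspp.git $HOME/src/blaspp
git clone https://github.com/icl-utk-edu/lapackpp.git $HOME/src/lapackpp
# build and install BLAS++ (CPU only, OpenMP threading)
cmake -S $HOME/src/blaspp -B $HOME/src/blaspp-build -Duse_openmp=ON -Dgpu_backend=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/blaspp
cmake --build $HOME/src/blaspp-build --target install -j 4
# build and install LAPACK++
cmake -S $HOME/src/lapackpp -B $HOME/src/lapackpp-build -Dbuild_tests=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/lapackpp
cmake --build $HOME/src/lapackpp-build --target install -j 4
# reconfigure WarpX with RZ added and tell CMake where the new libraries live
cmake -S . -B build -DWarpX_DIMS="1;2;RZ;3" -DWarpX_PSATD=ON -DCMAKE_PREFIX_PATH="$HOME/sw/blaspp;$HOME/sw/lapackpp"
cmake --build build -j 4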

HollyHuddle commented 1 year ago

On a further attempt, I think I have built it such that the code is potentially unstable when running on ARCHER2. Just to compare, I ran the same input deck on one of my university's computers, which has version 23.03 set up. When running 23.08 on ARCHER2, the simulation appears to be very unstable. The output below is taken from timestep 1000 of the simulation.

On my own computer I ran on 48 cores on 1 node, and I used the same 48 cores on 1 node on ARCHER2.

(Attachments: ARCHER2 output at step 1000, own-computer output at step 1000, input.txt)

HollyHuddle commented 1 year ago

Hi @ax3l

Just to double-check it wasn't version-specific, I built the 23.03 version on ARCHER2 with the same commands and ran the input with the same submission file. This produced the same output as above. I've attached my job script here in case the issue lies there instead.

warpx2d.txt

HollyHuddle commented 1 year ago

Just to add on here: I went back to the 23.08 version with the Yee solver and the code is stable, so the instability referenced above only appears when the spectral solver is used.

oshapoval commented 1 year ago

Hello @HollyHuddle, can you try to rerun your simulation with the following settings added to the spectral solver setup and see if it is stable:

warpx.grid_type = collocated 
algo.current_deposition = direct
psatd.update_with_rho = 1
algo.charge_deposition = standard
algo.field_gathering = energy-conserving
algo.particle_pusher = vay
algo.maxwell_solver = psatd 
algo.particle_shape = 3

HollyHuddle commented 1 year ago

Hi @oshapoval

I've added that into my input deck on the same 48-core setup and got the expected output. Thanks a lot for your help!

ax3l commented 1 year ago

Hi @HollyHuddle, @oshapoval,

This is awesome, thank you for sharing your progress, and please excuse my silence. August and September were packed with deadlines & travel.

@HollyHuddle, I would love to get you set up with RZ and document the modules and run scripts that work well for you.

Do you mind sharing which modules you loaded so far and if you installed extra software manually? I can help you with BLAS++/LAPACK++ and other nice extra features, like Python bindings :)

HollyHuddle commented 11 months ago

Hi @ax3l

Apologies for the late reply on my end. I am currently doing a lot of travelling myself!

The full list of modules loaded is: craype-x86-rome, libfabric/1.12.1.2.2.0.0, craype-network-ofi, perftools-base/22.12.0, xpmem/2.5.2-2.4_3.30, cce/15.0.0, craype/2.7.19, cray-dsmml/0.2.2, cray-mpich/8.1.23, cray-libsci/22.12.1.1, PrgEnv-cray/8.8.3, bolt/0.8, epcc-setup-env, load-epcc-module, cray-fftw, cmake.

I am currently having trouble getting 3D simulations to run without hitting an out-of-memory error. I am in contact with the ARCHER2 support team about this, but unfortunately reducing the load on each node and using the higher-memory nodes hasn't improved things. I have attached my 3D inputs and my batch submission script here. It runs fine at low resolution; I would ideally like to run at a much higher resolution of 250 cells/micron, but it crashes at anything beyond 50.

(Attachments: SimpleTarget3D.txt, slurmoutput.txt, SlurmScript.txt)
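As an aside for anyone hitting the same limit: in 3D the memory footprint grows with the number of cells and particles per MPI rank, so a common workaround is to spread the domain over more nodes with fewer ranks per node. Below is a minimal sketch of such a submission, where the srun and partition/QOS flags follow the ARCHER2 documentation but the node count, budget code and executable path are placeholders:

#!/bin/bash
#SBATCH --job-name=warpx3d
#SBATCH --nodes=8                  # more nodes -> more total memory for the same problem
#SBATCH --ntasks-per-node=16       # fewer MPI ranks per node -> more memory per rank
#SBATCH --cpus-per-task=8          # fill the remaining cores with OpenMP threads (16 x 8 = 128)
#SBATCH --time=24:00:00
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --account=<budget-code>    # placeholder

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# adjust the executable name/path to whatever your build produced
srun --distribution=block:block --hint=nomultithread ./warpx.3d SimpleTarget3D.txt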

ax3l commented 3 weeks ago

Hi @HollyHuddle,

Would you like to start a PR documenting your working modules from #5350?

I would suggest copying, for example, the HPC3 (UCI) machine scripts:

@EZoni and I can give you a review and suggest improvements once you open the PR.
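A machine entry in the WarpX repository usually consists of a small set of files; here is a sketch of the skeleton one might create for ARCHER2 (file names are modeled on the existing machine entries and are assumptions, to be adjusted during review):

mkdir -p Tools/machines/archer2
touch Tools/machines/archer2/archer2_warpx.profile.example   # module loads and environment variables
touch Tools/machines/archer2/install_dependencies.sh         # extra software, e.g. BLAS++/LAPACK++ for RZ
touch Tools/machines/archer2/archer2.sbatch                  # example Slurm submission script
touch Docs/source/install/hpc/archer2.rst                    # documentation page linked from the HPC list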

HollyHuddle commented 2 weeks ago

Yes, happy to take this on! Thanks for the links, I'll get something prepped soon!