ycaophysics opened this issue 1 year ago
Thanks for the question. There is no easy way to reduce the memory consumption (WarpX needs a minimum amount of memory to store all the particle and field information). Instead, did you try using more GPUs (e.g. 16 instead of 8) in order to have enough memory?
Hi Remi. The problem is that even when I requested 16 nodes with 4 GPUs each, the error message still says:
```
amrex::Abort::0::Out of gpu memory. Free: 589824 Asked: 8388608 !!! SIGABRT
... (the identical "Out of gpu memory. Free: 589824 Asked: 8388608" abort is reported by all 16 MPI ranks)
```
It seems that the free memory hasn't changed even though I switched to more GPU nodes. Do you see anything wrong with my batch script or the input script?
Thanks Yuxuan Cao
Thanks for this additional information.
Could you also share your other files (output.txt, WarpX.o..., and warpx_used_inputs)?
I think that this would help us understand what is happening here.
The WarpX.o file is empty; here are the other files: output.txt inputs_3d_picmi_YC.txt
Also, just asking: how do I switch to the boosted frame in the PICMI format?
Thanks
Thanks. Do you also happen to have a BackTrace file?
Regarding boosted-frame simulations with PICMI, there is an example here: https://github.com/picmi-standard/picmi/blob/master/Examples/laser_acceleration/lpa_boostedframe_PICMI.py
Here's the BackTrace file. Thanks Backtrace.0.txt
Regarding the memory issue:
The fact that the error message does not change with 16 GPUs instead of 8 GPUs is expected, I think. (It is not very easy to explain, but it is related to the fact that the simulation domain is decomposed into small boxes of size max_grid_size, which are allocated one by one.)
With the current domain size (960 x 960 x 16704 cells), my estimate is that you would need ~2600 GB of GPU memory in total. Given that each GPU on Perlmutter has 40 GB, this slightly exceeds the total memory available on 16 GPU nodes (16 x 4 x 40 GB = 2560 GB). Could you try with 32 nodes (i.e. 32 x 4 GPUs)?
Sure, I'm trying that now with 32 GPUs.
As a side question, how much more simulation time does the 3D case take compared with RZ/FBPIC?
Also, I used FBPIC to match another LWFA simulation paper with externally injected beams. However, with the same simulation parameters, the emittance I got does not match their emittance evolution (there is a large deviation). I'm not sure what's wrong here. Is there anything tricky about computing injected-particle emittance in RZ geometry (in the boosted frame)? (This is why I'm trying the full 3D algorithm in WarpX.) I could start a new ticket in FBPIC if you want.
Thanks
It seems that even after I changed the number of cells to:
```
amr.blocking_factor = 32
amr.max_grid_size = 64
amr.max_level = 0
amr.n_cell = 960 960 10432
```
The asked memory did not change, and the run with 24 GPUs still doesn't work. (I re-installed the packages following the new instructions for Perlmutter; does that have anything to do with this issue?)
```
amrex::Abort::16::Out of gpu memory. Free: 1966080 Asked: 8388608 !!! SIGABRT
amrex::Abort::1::Out of gpu memory. Free: 6160384 Asked: 8388608 !!! SIGABRT
amrex::Abort::19::Out of gpu memory. Free: 8257536 Asked: 8388608 !!! SIGABRT
... (similar "Out of gpu memory" aborts from the remaining ranks, all with Asked: 8388608)
/usr/bin/addr2line: '/pscratch/sd/y/ycao910/WarpX_OUTPUT/Svystun_3d_WarpX_230625/warpx': No such file
... (the addr2line "No such file" message is repeated several times)
```
OK, thanks for this info.
Again, the fact that the error (Free: 8257536 Asked: 8388608) does not change is expected. This is because the total domain is allocated progressively, by allocating small boxes (of size 8388608 bytes, it seems) one by one.
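To make that concrete, here is a rough back-of-the-envelope reading of the constant "Asked" value (my own interpretation, not something confirmed above): the failing request is a single box-sized allocation, so it stays the same no matter how many GPUs are used.
```python
# Rough arithmetic for the constant "Asked: 8388608" (an interpretation, not
# confirmed by the developers): the request is one box-sized allocation, so
# it does not shrink when more GPUs are added.
asked_bytes = 8388608
print(asked_bytes / 2**20)          # 8.0 -> an 8 MiB allocation

# With amr.max_grid_size = 64 (from the inputs above), one box holds at most
cells_per_box = 64**3               # 262144 cells
print(asked_bytes / cells_per_box)  # 32.0 bytes per cell, i.e. 4 doubles per
                                    # cell, roughly a few field components per
                                    # box (guard cells and padding ignored)
```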
Regarding the number of GPUs: could you clarify what you mean by 24 GPU nodes? Feel free to share your submission script, if this clarifies the question.
Here's the submission script. I was using 6 nodes and 24 GPUs in total. I don't think it's a good idea to request a lot of GPU nodes (it costs a lot of machine hours and the wait times are long). warpx_python2.sh.txt And by the way, it would be a great help if you know something about this thread. Thanks
Hi @ycaophysics, I agree that, whenever possible, one should use as few GPUs as possible. However, there are cases where it is impossible for the simulation to fit on a small number of GPUs. In your case, I think that, even with only 10432 cells in z, the simulation is much bigger than what would fit on 24 GPUs. More specifically, with WarpX in FDTD mode, you need roughly 72 bytes per mesh cell and 64 bytes per particle. So, in your case (960 x 960 x 10432 mesh cells, with 1 particle per cell) this will require roughly 1300 GB. Since each GPU has 40 GB, 24 GPUs provide 960 GB, which is not enough to hold all the data that you need. With the new grid size, you would probably need at least 33 GPUs, and - because the above estimates are only approximate, and because there are always various overheads in memory usage - I would in fact recommend trying 48 GPUs (i.e. 12 nodes) to be safe.
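For reference, here is the same estimate as a small Python calculation (using the figures above: 72 bytes per cell, 64 bytes per particle, one particle per cell, and 40 GB per Perlmutter GPU):
```python
# Back-of-the-envelope memory estimate, using the figures quoted above.
cells = 960 * 960 * 10432          # ~9.6e9 mesh cells
bytes_per_cell = 72 + 64           # FDTD fields + 1 particle per cell
total_gb = cells * bytes_per_cell / 1e9
gpus = total_gb / 40               # 40 GB per Perlmutter A100 GPU
print(f"~{total_gb:.0f} GB in total -> at least {gpus:.0f} GPUs, before overheads")
```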
Regarding your question on FBPIC: yes, please open a ticket on the FBPIC issue tracker on GitHub. When doing so, feel free to add more details on the problem that you are encountering, including your full simulation script, as well as plots that demonstrate the discrepancy between what you were expecting and what you observed.
Okay, I got it to run on 8 nodes and 32 GPUs. Just asking, how do you specify an analytic density distribution as a function of position in PICMI? For example, this is what I want:
```python
plasma_dist = picmi.AnalyticDistribution(
    density_expression="plasma_density* np.where(z < blank+ramp_up, (z-blank)/ramp_up, np.where(z < blank, 0., np.ones_like(z)))",
    plasma_density=plasma_density,
    lower_bound=[plasma_xmin, plasma_ymin, plasma_zmin],
    upper_bound=[plasma_xmax, plasma_ymax, plasma_zmax])
```
Or, as a function:
```python
def dens_func(z):
    """Returns the relative density at position z."""
    n = np.ones_like(z)
    # Make linear ramp
    n = np.where(z < blank + ramp_up, (z - blank)/ramp_up, n)
    # Suppress density before the ramp
    n = np.where(z < blank, 0., n)
    return n
```
However, it shows this error:
```
what(): Unknown character . in Parser expression "plasma_density*np.where(z<blank+ramp_up,(z-blank)/ramp_up,np.where(z<blank,0.,np.ones_like(z)))" SIGABRT
```
It seems that the parser can't handle numpy (np) functions. Is there any way around that? Thanks
Yes, the syntax of the expression parser in WarpX does not use Python. Please have a look at this page for more details on the parser: https://amrex-codes.github.io/amrex/docs_html/Basics.html#parser
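For instance, an untested sketch of how the ramp from your snippet might be written in the parser's own syntax instead of numpy, assuming the parser's if(a, b, c) function and assuming that blank and ramp_up can be passed as extra keyword arguments in the same way plasma_density is passed in your script:
```python
# Untested sketch: the same linear up-ramp written with the WarpX/AMReX parser
# syntax instead of numpy. "if(a, b, c)" returns b where a is nonzero and c
# otherwise; blank and ramp_up are assumed to be usable as extra keyword
# arguments (like plasma_density above), and all variables come from the
# user's script.
plasma_dist = picmi.AnalyticDistribution(
    density_expression=(
        "plasma_density * if(z < blank, 0., "
        "if(z < blank + ramp_up, (z - blank) / ramp_up, 1.))"),
    plasma_density=plasma_density,
    blank=blank,
    ramp_up=ramp_up,
    lower_bound=[plasma_xmin, plasma_ymin, plasma_zmin],
    upper_bound=[plasma_xmax, plasma_ymax, plasma_zmax])
```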
Okay, thanks. Also, it turns out that the FBPIC issue I mentioned may not exist (I found the error in that paper). I'll check again to make sure.
Just checking in again. The 3D GPU run was fine; however, no checkpoints were written. This is what I have for the checkpoint command:
```python
check = picmi.Checkpoint(
    period=20000,
    write_dir="./Checkpoint/",
    warpx_file_prefix='checkpoint')
```
Could you add a comment to the WarpX documentation page about how to write checkpoints properly with PICMI? Thanks
Additionally, a single day of running the full 3D simulation takes 20 TB of my storage on NERSC (with only about 20 dump files), which is all the $SCRATCH space I have available. Is there any way to reduce the space requirement while still dumping all the diagnostics? Thanks
Hi. I've encountered a problem of running out of memory when using GPUs on NERSC Perlmutter with these scripts (PICMI). I tried to reduce the number of steps or the number of grid cells, but with no luck. How would you reduce the memory it takes?
Also, would it be possible to provide an example script for a boosted-frame simulation written in the PICMI format on the WarpX page?
PICMI_inputs_3d_Svystun copy.py.txt WarpX_3d.e10665951.txt warpx_python.sh.txt
Thank you Yuxuan Cao