LarissaReames-NOAA / MPASSIT

Fortran program to interpolate MPAS data to structured grids
GNU General Public License v3.0

Lessen resource requirements for conservative interpolation #6

Closed weather4evr closed 1 year ago

weather4evr commented 1 year ago

I’m not sure if you’ll want to merge this, but I thought I’d pass it along. Conservative interpolation is slower and more memory intensive than the other interpolation types, and second-order conservative interpolation, which I’m interested in, is particularly intensive. It turns out that interpolating the fields individually, rather than in a bundle, dramatically lessens memory use and run time. I don’t know if the same trick is needed for other interpolation types, and I don’t understand why individual field interpolation is less resource intensive than using the bundles.

LarissaReames-NOAA commented 1 year ago

Well, that's certainly bizarre. Do you have some stats or output files that document the memory usage? I'd like to ask the ESMF folks about that -- they've been pretty helpful and responsive in the past so I think we could get an answer about that behavior.

weather4evr commented 1 year ago

Using 360 cores on NCAR's Cheyenne to interpolate 16 fields with 2nd-order conservative interpolation:

Using the bundle for interpolation: 419.7 GB of memory, 0.06 hours
Not using the bundle for interpolation: 148.4 GB of memory, 0.01 hours

Output using the two methods is identical.
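To put those two runs side by side, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the variable names are made up for illustration. Note that the run times are rounded to two decimal places of an hour, so the time ratio is only a rough indication.

```python
# Figures as reported above (approximate per-job totals from qhist).
bundle = {"memory_gb": 419.7, "hours": 0.06}      # bundle regrid
per_field = {"memory_gb": 148.4, "hours": 0.01}   # field-by-field regrid

# How many times more memory and time the bundle run used.
memory_ratio = bundle["memory_gb"] / per_field["memory_gb"]
time_ratio = bundle["hours"] / per_field["hours"]

# Absolute memory difference between the two runs.
memory_saved_gb = bundle["memory_gb"] - per_field["memory_gb"]

print(f"memory: {memory_ratio:.2f}x, saving {memory_saved_gb:.1f} GB")
print(f"run time: roughly {time_ratio:.0f}x (inputs rounded to 0.01 h)")
```

So the bundle run used a bit under three times the memory of the field-by-field run, and on the order of six times the wall time.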

LarissaReames-NOAA commented 1 year ago

Wow, that's a massive difference. Could you share your method for tracking the memory and timing? I'd like to try it out on other systems and for the other interpolation types so I can give ESMF a full description of what we're seeing.

weather4evr commented 1 year ago

I'm not really well-versed in this sort of thing, so I just used the "qhist" command on Cheyenne: https://arc.ucar.edu/knowledge_base/68878389#ManagingandmonitoringPBSjobs-qhist

The memory is the "approximate total memory usage per job".

Even if the numbers aren't precise, I can tell there is a huge difference. For instance, when not using the bundle, I can interpolate at least 72 2D fields at once (i.e., have 72 fields in histlist_2d), whereas when using the bundle, the code fails (runs out of memory) when trying to interpolate more than 20 2D fields.
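On a system without qhist, one rough way to get comparable numbers is to wrap a command and record its wall time and peak resident memory. The sketch below is an assumption on my part, not what was used in this thread: `measure` is a hypothetical helper built on Python's standard `resource` and `subprocess` modules (Unix only). It reports the largest single child process on one node, not the per-job total that qhist gives for a multi-node MPI run, so treat it as a quick local check only.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return its exit code, wall time, and peak RSS in MB.

    Peak RSS comes from RUSAGE_CHILDREN, which is a running maximum over
    all child processes waited for so far, so call this once per process
    if you want a clean number for a single command.
    """
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed_s = time.perf_counter() - start

    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 if sys.platform.startswith("linux") else 1024 * 1024
    return {
        "returncode": completed.returncode,
        "elapsed_s": elapsed_s,
        "peak_rss_mb": peak / divisor,
    }


if __name__ == "__main__":
    # Stand-in workload: a child Python process that allocates ~50 MB.
    demo = [sys.executable, "-c", "x = bytearray(50_000_000)"]
    print(measure(demo))
```

Replacing the stand-in command with the actual interpolation run (once with the bundle, once without) would give a like-for-like comparison on a single node.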

LarissaReames-NOAA commented 1 year ago

Ah okay, so you measured the whole job, run with just a bunch of 2D fields flagged for conservative regridding. Makes sense. I'll try out your fix for the other regridding types on Jet using the SLURM equivalent of qhist.

LarissaReames-NOAA commented 1 year ago

I'll go ahead and merge this PR since it doesn't do anything that can't be reversed by just ignoring the namelist setting. I've found that time and memory do perhaps decrease slightly when not using the bundle regrid for conservative regridding (I tested with 10 fields), but the difference is not nearly as dramatic as the one you saw. However, I'm testing it in my updated flexi-grid branch, so perhaps that's making a difference. I'll push the updates that fix the various bugs to the flexi-grid branch so that you can test there and see if the huge differences still exist.