Closed · SorooshMani-NOAA closed this issue 1 year ago
More context:

To give more detail about the issues I am experiencing: for the part of the tutorial that refines the size function with NWM river shapes, the code ran on Frontera for about an hour and a half, and after it finished it had not loaded the river data correctly.

For the tidal run examples, when running the second example on Frontera, the portion of the code that generates the mesh gave a memory allocation error. To reduce the memory required for this step, I shrank the circle on which the mesh is generated. The tutorial uses a circle with a radius of 170 km; after changing the radius to 75 km I was able to generate the mesh. I only tested radii of 75 km and 100 km, so the maximum radius for which the tutorial runs properly is somewhere between those two values.

I have attached two pictures below: one shows the mesh generated for the smaller circle, and the other shows the output I obtain from the river refinement part of the code. In both pictures, the left figure is what I obtained from running the code, while the right figure is the output shown in the tutorial.
I'm still profiling to get the memory requirements, but I think I know what the issue with NWM is. When developing the tutorial I was using the NWM version 2.0 dataset, whose first layer contained the river reaches. The link in the instructions now serves NWM v2.1, and the order of layers in that version is different, so the statement

```python
gdf_nwm_rivers = gpd.read_file(data/'NWM_v2.0_channel_hydrofabric/nwm_v2_0_hydrofabric.gdb', layer=0)
```

in example 4 no longer works: it reads the wrong layer.

To fix the issue, please use the following lines instead; they look the layer up by name rather than by position. I'll update the Jupyter notebook a bit later.

```python
nwm_gdb = data/'NWM_<YOUR_VERSION>_channel_hydrofabric/nwm_<YOUR_VERSION>_hydrofabric.gdb'
conus_rivers_idx = fiona.listlayers(nwm_gdb).index('nwm_reaches_conus')
gdf_nwm_rivers = gpd.read_file(nwm_gdb, layer=conus_rivers_idx)
```
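To see why looking the layer up by name is safer, compare the layer orderings between releases. The real lists come from `fiona.listlayers(nwm_gdb)`; the lists below are simulated for illustration (the exact v2.1 ordering is an assumption — inspect your own file):

```python
# Simulated output of fiona.listlayers() for two NWM releases.
# Layer names/order here are illustrative assumptions, not the real files.
layers_v2_0 = ['nwm_reaches_conus', 'nwm_waterbodies_conus']
layers_v2_1 = ['nwm_waterbodies_conus', 'nwm_reaches_conus']

# Hard-coding layer=0 only works for the v2.0 ordering, while a lookup
# by name finds the reaches layer regardless of where it ends up:
print(layers_v2_0.index('nwm_reaches_conus'))  # 0
print(layers_v2_1.index('nwm_reaches_conus'))  # 1
```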
The following screen captures show the memory needed for some of the operations within the tutorial.

The last one is a little strange: the memory profiler says ~14 GB, but the system monitor showed ~36 GB of memory usage during the operation!

In any case, all of the above operations were run sequentially on a host with a total of 96 GB of RAM.
Follow-up email communication:
After implementing the changes you specified, the code is able to load the river features properly (the plot given before the mesh refinement step is now fixed), but I get the "Cannot allocate memory" error when attempting to run

```python
hfun_obj_ex4.add_feature(
    gpd.GeoSeries(rivers_ex4, crs=4326).to_crs(hfun_msh_t_7.crs).unary_union,
    target_size=50,
    expansion_rate=0.005,
)
```

Here is a picture of the error I'm experiencing.
@BrendanGramp just to make sure we're on the same page: are you using the notebook as-is (only updating paths), or have you made any changes to the segments of code before it? For example, are you passing only the river segments that intersect the domain,

```python
rivers_ex4 = geom_poly_5.intersection(gdf_nwm_rivers.unary_union)
```

or do you pass all of the NWM rivers to `add_feature`?
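For context, the intersection step clips the river network to the domain polygon, so `add_feature` only has to process segments inside the domain. A minimal self-contained sketch of the same pattern, using toy shapely geometry rather than the NWM data:

```python
from shapely.geometry import LineString, Polygon
from shapely.ops import unary_union

# Toy stand-ins for the domain polygon and the NWM river network.
domain = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
rivers = unary_union([
    LineString([(-5, 5), (15, 5)]),    # crosses the domain
    LineString([(20, 20), (30, 20)]),  # entirely outside the domain
])

# Same pattern as:
#   rivers_ex4 = geom_poly_5.intersection(gdf_nwm_rivers.unary_union)
clipped = domain.intersection(rivers)

# Only the 10-unit segment inside the domain survives the clip.
print(clipped.length)  # 10.0
```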
Solution
In any case, one solution for this issue is passing the `nprocs` argument to `add_feature` with a small value: for example 8, 4, or even 1, and see whether the issue is resolved. This reduces the number of times static data needs to be replicated for the multiprocess segment of the code. The number of processors on the compute node may be so high that the default value (i.e. the number of available processors) results in too many duplications relative to the available memory.
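The memory effect can be illustrated outside ocsmesh: with Python's `multiprocessing`, each worker process holds its own copy of whatever data it needs, so the footprint grows with the pool size. A toy sketch (the squaring worker is just a stand-in for the real per-window work):

```python
import multiprocessing as mp
import os

def worker(x):
    # Stand-in for the real work; in ocsmesh each worker would also hold
    # a copy of the static data, so memory grows with the worker count.
    return x * x

if __name__ == '__main__':
    default_nprocs = os.cpu_count()  # the default if nprocs is not passed
    nprocs = 1                       # explicit small value to cap memory
    with mp.Pool(nprocs) as pool:
        results = pool.map(worker, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```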
Please let me know if this helps with the issue. Thanks!
@SorooshMani-NOAA I have been using the same code as given in the tutorial, apart from changing the names of the NWM and GEBCO files to match the files that were downloaded. Also, for the second example, which generates a larger mesh around NY Harbor, for the command

```python
base_circ_ex2 = geometry.Point([0, 0]).buffer(170e3)
```

I reduced the buffer size until I was able to generate a mesh; 80e3 worked as the buffer value. When testing the river refinement step I generally run the entire Jupyter notebook up to that step, while for the second (NY Harbor) example I only run the first four code blocks of the notebook before jumping to that part. Also, in case it is relevant, I've noticed that other parts of the code can raise a "Cannot allocate memory" error if the error occurred earlier and I have not restarted the kernel.

I will look into the solution you mentioned.
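For what it's worth, the savings from shrinking the buffer grow quickly, since the meshed area (and, roughly, the memory the rasters and mesh consume) scales with the square of the radius. A quick check with shapely:

```python
from shapely import geometry

# Domain circles as in the tutorial: original 170 km vs. reduced 80 km radius.
circ_full = geometry.Point([0, 0]).buffer(170e3)
circ_small = geometry.Point([0, 0]).buffer(80e3)

# Both buffers use the same polygonal approximation of a circle,
# so the area ratio is exactly (170/80)**2, about 4.5x.
print(circ_full.area / circ_small.area)
```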
@SorooshMani-NOAA If I understand this correctly, this would help with the river refinement step, but not with the portion of the tutorial generating a larger mesh for the New York Bay area, right? That matches what I experienced: I set `nprocs=1` in the `add_feature` part of the code and was able to generate the refined mesh. I can also pass `nprocs=1` in the `.get_multipolygon()` step, but I still get the memory allocation error at the mesh generation step, `hfun_jig_ex2 = hfun_ex2.msh_t()`.

So in summary, I can now run the part of the tutorial refining the size function with NWM river shapes without issue, but I am still experiencing issues with the part of the tutorial where a mesh is generated for a larger region around NY Harbor.
For non-"collector" size functions, you can pass `nprocs` to the `add_feature` function, but for collector types, such as in

```python
hfun_ex2 = ocsmesh.Hfun(
    hfun_rasters_ex2,
    base_shape=base_gs_ex2.unary_union,
    base_shape_crs=base_gs_ex2.crs,
    hmin=200, hmax=8000,
    method='fast')
```

you can pass it to the constructor, i.e.

```python
hfun_ex2 = ocsmesh.Hfun(
    hfun_rasters_ex2,
    base_shape=base_gs_ex2.unary_union,
    base_shape_crs=base_gs_ex2.crs,
    hmin=200, hmax=8000,
    method='fast',
    nprocs=1
)
```

If adding `nprocs=1` there doesn't help, you can also try changing `method='fast'` to `method='exact'` instead.
@SorooshMani-NOAA Adding `nprocs=1` fixed my issue. I'm now able to run each component of the tutorial Jupyter notebook. Thank you so much!
Thanks for your feedback! I'll close this ticket then. Please feel free to reach out (and/or open new tickets) if any issues arise.