clawpack / geoclaw

Version of Clawpack for geophysical waves and flows
http://www.clawpack.org/geoclaw
BSD 3-Clause "New" or "Revised" License

How to deal with "free list full with 5000 items" #525

Open tovogt opened 3 years ago

tovogt commented 3 years ago

It would be great if you could give advice on what to do when the calculation stops with the error message "free list full with 5000 items". This typically appears after many "***adjusting timestep for level 7 at t =" messages, together with "Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL", and usually also the following:

 out of bndry space - allowed ***** bndry grids
 There are    15447 total grids    120351 bndry nbors average num/grid      7.791
 Expanding size of boundary list from       120000  to       180000
 out of nodal space - allowed    30000 grids
    level    1 has       1 grids
    level    2 has       4 grids
    level    3 has      16 grids
    level    4 has      64 grids
    level    5 has     256 grids
    level    6 has    1024 grids
    level    7 has   14107 grids
  Could need twice as many grids as on any given
  level if regridding/birecting
 Expanding maximum number of grids from        30000  to        40000

Any ideas what might be causing this or how to deal with this? Thanks in advance!

mandli commented 3 years ago

This is pretty rare to see; more often we run out of grids before we run out of boundary space. Given that you are seeing underflows as well, I am wondering if something is blowing up instead. Have you tried plotting the results up to this time to see what might be going on?

tovogt commented 3 years ago

Thanks for your response!

I now think that this is somehow related to very irregular bathymetry. The problem occurs when a TC track crosses the Bahamas archipelago region (bathymetry from SRTM15+V2.0): [image]

Even for medium-strength storms, GeoClaw tends to produce pretty large waves in this region. For example, Hurricane Jeanne (2004): [image]

I will be on vacation for a week now, and will come back to this at the beginning of October with plots of the GeoClaw wind fields, AMR regions and surface. See you!

mandli commented 3 years ago

The Bahamas are difficult to say the least. Let's pick it up after you get back then.

tovogt commented 2 years ago

I just wanted to report back that this is still coming up from time to time and not only for the Bahamas (as suggested above). Most of the time, this is not a problem. But recently, I was experimenting with higher resolution runs (up to 30m) with synthetic scenarios where the wind speeds reach 290 km/h and found that I would run into the "free list full with 5000 items". Reproducing this is nasty, since it will run for 5 days (!) before the run fails.

In those cases I was running modified versions of Cyclone Idai (2019), but only its final landfall: [image]

The bathymetry looks totally harmless in principle: [image]

As I said, I can't easily rerun this and get arbitrary plot outputs from the run because it takes more than 5 days to run it. I just wanted to paste this here for future reference.

mjberger commented 2 years ago

Maybe this data structure should have the auto-increase too.


mandli commented 2 years ago

@mjberger that would not be too hard to do I suppose.

tovogt commented 9 months ago

Unfortunately, this still occurs from time to time. In a current setup, we would like to model the storm surge of Super Typhoon Haiyan (2013) at up to 9 arc-seconds resolution in GeoClaw, but the run fails after more than 24 hours of run time with "free list full with 5000 items": [image]

Note that we are only modeling a 48-hour subset of the storm duration: [image]

How could we implement auto-increase for this data structure?

mjberger commented 9 months ago

Thomas,

I am putting your request on my to-do list.

In the meantime, I suggest you change the parameter in amr_module.f90 (which is used by geoclaw) from its initial value of 5000 to 25000 (or more if you want). It is only a 1d array, so it is not a big deal to make it larger than absolutely necessary. After you make this change, you will have to type "make new" to make sure everything is recompiled with the new module data.

— Marsha


tovogt commented 9 months ago

Thanks for the quick response, Marsha! The code is now running with the lfdim variable increased to 25000. :) I will report back whether it still fails.
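
For anyone who wants to script the workaround above, here is a minimal sketch. It assumes the declaration in amr_module.f90 contains an assignment of the form `lfdim=5000`; the thread only names the parameter and its old and new values, so the exact source line is an assumption, and `set_lfdim` is a hypothetical helper, not part of Clawpack.

```python
import re

# Assumed form of the assignment; the thread names only the parameter (lfdim)
# and its values (5000 -> 25000), not the exact line in amr_module.f90.
LFDIM_PATTERN = re.compile(r"(\blfdim\s*=\s*)(\d+)", re.IGNORECASE)


def set_lfdim(source: str, new_value: int) -> str:
    """Return `source` with the first `lfdim = <integer>` set to `new_value`."""
    patched, count = LFDIM_PATTERN.subn(
        lambda m: f"{m.group(1)}{new_value}", source, count=1
    )
    if count == 0:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("no 'lfdim = <integer>' assignment found")
    return patched


# Hypothetical sample line standing in for the real declaration.
sample = "      integer, parameter :: lfdim=5000\n"
print(set_lfdim(sample, 25000), end="")
```

To apply it, read amr_module.f90 from your own Clawpack checkout, write the patched text back, and then run "make new" as described above so that everything is recompiled against the changed module.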

tovogt commented 9 months ago

Okay, I now produced some more time-dependent plots to understand what's going on: fig1001 fig1005

Even though this is a pure surge-driven scenario, it seems like there is an earthquake at the bottom boundary somewhere around hour "+10". What causes this and how can I prevent it?

I already enforce that outside of the inner rectangle (that you see in frame 1) refinement is limited to at most level 4 (of 6). If the refinement at the boundary is causing this, would it help to restrict to refinement level 1 in a small strip around the boundary area? Here is my regions.data:

5                    =: num_regions         
1  4 -4.32000000000000e+04  1.40400000000000e+05  1.05831993103027e+02  1.40850708007812e+02  9.86385345458984e-02  2.24680137634277e+01  
3  6 -4.32000000000000e+04  1.40400000000000e+05  1.15847679138184e+02  1.30129470825195e+02  8.31493663787842e+00  1.43523235321045e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.18690833333333e+02  1.21242500000000e+02  9.57416666666733e+00  1.40425000000004e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.21257500000000e+02  1.23792499999999e+02  9.10750000000069e+00  1.38425000000004e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.23807499999999e+02  1.26342499999999e+02  8.64083333333405e+00  1.33091666666671e+01  
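
For readers unfamiliar with the file, the rows above can be read with a short sketch like the following. It assumes the usual GeoClaw column order `minlevel maxlevel t1 t2 x1 x2 y1 y2` with times in seconds; `Region` and `parse_regions` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    # Assumed column order: minlevel maxlevel t1 t2 x1 x2 y1 y2
    minlevel: int
    maxlevel: int
    t1: float
    t2: float
    x1: float
    x2: float
    y1: float
    y2: float


def parse_regions(text: str) -> List[Region]:
    """Parse the body of a regions.data file into Region tuples."""
    # Skip blank lines and '#' comment lines that a full data file may contain.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    num_regions = int(lines[0].split()[0])  # e.g. "5   =: num_regions"
    regions = []
    for ln in lines[1:1 + num_regions]:
        f = ln.split()
        regions.append(Region(int(f[0]), int(f[1]), *map(float, f[2:8])))
    return regions


# First two rows of the regions.data posted above (count adjusted to 2).
SAMPLE = """\
2                    =: num_regions
1  4 -4.32000000000000e+04  1.40400000000000e+05  1.05831993103027e+02  1.40850708007812e+02  9.86385345458984e-02  2.24680137634277e+01
3  6 -4.32000000000000e+04  1.40400000000000e+05  1.15847679138184e+02  1.30129470825195e+02  8.31493663787842e+00  1.43523235321045e+01
"""

for r in parse_regions(SAMPLE):
    hours = (r.t2 - r.t1) / 3600.0  # assuming times are in seconds
    print(f"levels {r.minlevel}-{r.maxlevel}, {hours:.0f} h, "
          f"x=[{r.x1:.3f}, {r.x2:.3f}], y=[{r.y1:.3f}, {r.y2:.3f}]")
```

Read this way, the first row allows levels 1 to 4 over the widest rectangle, whose lower latitude bound is about 0.1 degrees north.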

mjberger commented 9 months ago

It looks like a bug from this distance, because nothing should be coming in from the boundary according to what you say.

Is your example checked in so I can look at it further? Or send me a zip file and how to run it, so I can see what's happening?

— Marsha


tovogt commented 9 months ago

Thanks for your quick response!

Here is the complete set of input data that I used to run the example (with make .output): https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_2024-02-16.zip (This archive also includes a file stdout.log with the full log output of GeoClaw!) I use clawpack version 5.9.2 and gfortran version 13.2.0, and I run the example on a single HPC cluster node with 16 CPUs and 64 GB of RAM. Running this example on that setup requires more than 6 hours (wall time).

Here I uploaded the complete _plots directory: https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_plots_2024-02-16.zip

And here is the _output directory: https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_output_2024-02-16.zip (5.9 GB of data!)

Note that this is not the exact same setup that caused the "free list full with 5000 items" error message for me (this post: https://github.com/clawpack/geoclaw/issues/525#issuecomment-1938596347). That one has exceedingly long run times (more than 24 hours), so I reduced the resolution a bit, but left everything else unchanged.

tovogt commented 9 months ago

Oh, and I can now answer my question whether this is caused by refinement at the boundary. I enforced that there is no refinement at the boundary and the problem persists: fig1001

Maybe this has to do with the bottom boundary being very close to the equator, and there is some kind of division by zero or so?

mjberger commented 9 months ago

Something appears to be coming in from the boundary - I'll have to take a closer look.

Is what you previously sent the coarsest resolution that demonstrates the problem?

— Marsha


mjberger commented 9 months ago

Do you have the setrun.py and setplot.py that generated the data files?

What is your direct email, so we don't bring the whole group in for this?

— Marsha
