NOAA-OWP / inundation-mapping

Flood inundation mapping and evaluation software configured to work with U.S. National Water Model.

[13pt] Abnormal long runtimes for some Alaska HUCs #1123

Open RobHanna-NOAA opened 4 months ago

RobHanna-NOAA commented 4 months ago

During a full BED for fim_4_4_15_0, the entire run time jumped dramatically. This is the first run to include Alaska.

Here are the 20 longest runtimes for the 1,900-plus HUCs. Hmm, did we lose some HUCs? That count sounds wrong; it should be at least 2,180. I will look into that separately.

In the list below, the three columns are: HUC number, time in mm:ss, and time in decimal minutes.

10110201 60:51 60.85
10200101 60:51 60.85
19020101 60:56 60.93
19020201 63:20 63.33
10300102 64:06 64.1
19020103 66:16 66.26
19020800 68:12 68.2
19020302 70:36 70.6
19020302 71:02 71.3
16060008 73:00 73
19020502 79:35 79.58
19020202 84:07 84.11
19020501 89:30 89.5
19020505 89:47 89.78
18100100 90:32 90.53
19020503 99:35 99.58
19020102 133:07 133.11
19020601 243:10 243.16
19020402 282:06 282.1
19020504 290:04 290.6
19020602 462:52 462.86
19020104 642:57 642.95

The HUCs prior to 19020102 seem reasonable, but the others are suspicious, especially the last one, which took 10.7 hours.
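For anyone checking the arithmetic: the two time columns look like the same duration written two ways, minutes:seconds and decimal minutes. A minimal sketch of the conversion follows; the helper name `to_decimal_minutes` is made up for illustration and is not from the repo.

```python
def to_decimal_minutes(mm_ss: str) -> float:
    """Convert a 'minutes:seconds' string such as '642:57' to decimal minutes."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) + int(seconds) / 60


# The slowest HUC in the list (19020104).
longest = to_decimal_minutes("642:57")
print(f"{longest:.2f} min = {longest / 60:.1f} h")  # 642.95 min = 10.7 h
```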

These times come from runs on the AWS Step system, where the Fargate machines are set to use 6 cores (of 8).

We are going to do more tests on those HUCs on Prod, where we will use 42 of 48 cores (a 7x difference), to see what we get.

CarsonPruitt-NOAA commented 4 months ago

Rob, I would love to be able to bring this to leadership's attention. Could you maybe make a comparison of the CONUS HUC processing times to the Alaska HUCs? I'm thinking a boxplot like the one below would really convey our message. If you're too busy to play around with the plotting, could you provide me with the timing data?

image

EmilyDeardorff commented 4 months ago

Stream line and stream density of HUC 19020104 (the 10-hour HUC) and surrounding HUCs.

Image Image

RobHanna-NOAA commented 4 months ago

If you look in the EFS under outputs/fim_4_4_15_0/logs/unit, there is a summary file of runtimes for all units. That CSV has three columns: HUC, runtime as a datetime-style duration, and runtime as decimal minutes.
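A rough sketch of how that summary could be split into Alaska and everything else before plotting. The file name and header row are not given in this thread, so the column names `huc`, `runtime`, and `minutes` are assumptions, and the sample rows are copied from the list at the top of the issue. Alaska is taken to be any HUC in region 19.

```python
import csv
import io
import statistics

# Stand-in for the summary file. Column names are assumed, not confirmed.
SAMPLE = """huc,runtime,minutes
10110201,60:51,60.85
10200101,60:51,60.85
16060008,73:00,73
19020101,60:56,60.93
19020601,243:10,243.16
19020104,642:57,642.95
"""


def split_by_region(rows):
    """Group decimal-minute runtimes into Alaska (HUC2 == '19') and the rest."""
    groups = {"Alaska": [], "CONUS+": []}
    for row in rows:
        # Keep the HUC as a string so leading zeros (e.g. 09030001) survive.
        region = "Alaska" if row["huc"].startswith("19") else "CONUS+"
        groups[region].append(float(row["minutes"]))
    return groups


groups = split_by_region(csv.DictReader(io.StringIO(SAMPLE)))
for region, minutes in groups.items():
    print(region, "n =", len(minutes),
          "median =", statistics.median(minutes), "max =", max(minutes))
```

To run it on the real file, swap `io.StringIO(SAMPLE)` for an open file handle and adjust the column names to match the actual header. The two lists can then go straight into a boxplot.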

RobHanna-NOAA commented 4 months ago

I want to do some experiments on them using just our prod machine. When we run them on our fargates (big aws runs), those machines are small. About the same size as our regular EC2 with only 8 cores and 64GB ram. Maybe those just need more horsepower. I will see what I can test over the weekend and on Monday to see what we can learn.

RobHanna-NOAA commented 4 months ago

Stranger yet. The Alaska huc list I ran on Prod was even slower generally speaking (surprisingly). But the trends were the same.

image

RobHanna-NOAA commented 4 months ago

And... the BED failed in post-processing, but my test Alaska set did not fail in post-processing ??? See EFS / outputs.

CarsonPruitt-NOAA commented 4 months ago

Here's the boxplot for the Alaska HUCs vs CONUS+

Image

RobHanna-NOAA commented 4 months ago

Looked into the ratio of number of branches to time.

Pre-Alaska, the top 5 HUCs by number of branches, in order (HUC, time in minutes, number of branches):

21010005 23.00 118
09030001 40.56 116
10300102 64.10 115
03130003 35.30 111
10140201 46.66 106

Average time is 22 min (not counting Alaska).

Alaska (HUC, time in minutes, number of branches):

19020203 33.61 10
19020302 45.46 54
19020301 48.10 39
19020101 60.93 85
19020201 63.33 54
19020103 66.26 94
19020800 68.20 4
19020502 79.58 70
19020202 84.11 66
19020501 89.50 98
19020505 89.78 91
19020503 99.58 29
19020102 133.11 106
19020601 243.16 174
19020402 282.10 44
19020504 290.60 130
19020602 462.86 63
19020104 642.95 135
19020401 (timing not yet known) 29
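One way to put a number on the branch-to-time ratio is minutes per branch for each Alaska HUC, plus a plain Pearson correlation between branch count and runtime. This is only a sketch using the figures listed above (19020401 is left out because it has no timing yet); the variable and function names are illustrative.

```python
import math

# (HUC, runtime in minutes, number of branches), copied from this comment.
ALASKA = [
    ("19020203", 33.61, 10), ("19020302", 45.46, 54), ("19020301", 48.10, 39),
    ("19020101", 60.93, 85), ("19020201", 63.33, 54), ("19020103", 66.26, 94),
    ("19020800", 68.20, 4), ("19020502", 79.58, 70), ("19020202", 84.11, 66),
    ("19020501", 89.50, 98), ("19020505", 89.78, 91), ("19020503", 99.58, 29),
    ("19020102", 133.11, 106), ("19020601", 243.16, 174), ("19020402", 282.10, 44),
    ("19020504", 290.60, 130), ("19020602", 462.86, 63), ("19020104", 642.95, 135),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Minutes per branch, slowest-per-branch first.
ranked = sorted(
    ((huc, minutes / branches) for huc, minutes, branches in ALASKA),
    key=lambda item: item[1],
    reverse=True,
)
for huc, per_branch in ranked[:5]:
    print(f"{huc}: {per_branch:.2f} min per branch")

r = pearson([b for _, _, b in ALASKA], [m for _, m, _ in ALASKA])
print(f"branches vs. minutes, Pearson r = {r:.2f}")
```

If minutes per branch varies a lot from HUC to HUC, branch count alone is not a good predictor of runtime.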

CarsonPruitt-NOAA commented 4 months ago

Based on your branch numbers and this graph, I'm pretty sure that we've narrowed it down to the streamline density / number of catchments as the issue that's slowing AK down.

Image