[21pt] Improve Performance for Post-Processing tools

RobHanna-NOAA commented 6 months ago

The overall performance of the post-processing steps is pretty slow, even though some parts use multiprocessing. That section has been slowly growing, and the cumulative run time is now a bit of a problem. Consider analyzing each tool used in post-processing to review its performance. I suspect adding multi-threading in parts could yield major performance gains. Research required.

Updated July 11:

Not sure if this is an anomaly, but the recent fim_4_5_2_5 BED run took 15 hrs 45 min on post-processing, and that was on AWS EC2 Prod3 with 40 cores. No wonder AWS Step Functions cannot handle it anymore. This is complicated by the fact that we cannot re-run post-processing, as files are compromised during each post-processing run. An effort is already in place, via 1141, to fix that part.
We should consider adding a log file system to fim_post_processing.sh. It can calculate the duration of each section and keep appending to a growing post-processing performance log file as it continues. This might also help us see which section it died in, as well as possible bottlenecks. (Aug 2: done)
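For illustration only, per-section timing of this kind could look roughly like the sketch below. This is not the code that was added to fim_post_processing.sh; the function name `run_timed` and the variable `POST_PERF_LOG` are hypothetical.

```shell
# Hypothetical sketch: run_timed and POST_PERF_LOG are illustrative names,
# not taken from fim_post_processing.sh.
POST_PERF_LOG="${POST_PERF_LOG:-post_processing_performance.log}"

run_timed() {
    # $1 = section label; remaining arguments = the command to run and time
    local section
    local start
    local end
    local status
    section="$1"; shift
    start=$(date +%s)
    # Write the START line first so a crash mid-section is still visible in the log
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START ${section}" >> "$POST_PERF_LOG"
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) END ${section} duration_s=$((end - start)) exit=${status}" >> "$POST_PERF_LOG"
    # Pass the wrapped command's exit status back to the caller
    return "$status"
}
```

A section would then be wrapped as `run_timed "section name" some_command arg1 arg2`, where `some_command` is a placeholder. A START line with no matching END line points at the section where the run died, and the `duration_s` values show where the time goes.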

Update: Aug 6, 2024

Based on a UAT run, we were able to include post-processing again successfully. Aggregate by Hydrotables took 82 mins for 241 HUCs, so there is a way to go, but it is getting better. Considering we only run BEDs occasionally, we can decide if this is worth fixing. I will re-check resetting permissions, as it took 9 mins and should have been much faster considering that it radically dropped the number of files it was updating. However, this was run inside AWS Step Functions on 16 cores, whereas recent runs, including the BED run, were on 40 cores and before the throughput upgrade to Prod3. Maybe we only run BED post-processing on EC2s, but for UAT, let AWS Step Functions do it.

Related Issue cards are:

1175 - Post Processing Logs - Change most to log files and little output
1199 - Post processing tools and memory / mp use

mluck commented 6 months ago

One thing that might be good to add here is a check to see if fim_process_unit_wb.sh completes successfully or not before running fim_post_processing.sh on a HUC. I think the way it is now, fim_post_processing.sh is run even if fim_process_unit_wb.sh has failed for that HUC.
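One possible shape for such a check, as a rough sketch only: a wrapper records each HUC's exit status as a marker file, and post-processing consults the marker before touching that HUC. The marker-file convention, `STATUS_DIR`, and both function names are assumptions, not existing repo code.

```shell
# Hypothetical sketch: STATUS_DIR, the .ok/.failed marker files, and the
# function names are assumptions made for illustration.
STATUS_DIR="${STATUS_DIR:-unit_status}"

run_unit_and_record() {
    # $1 = HUC id; remaining arguments = the unit command to run for that HUC
    local huc
    local status
    huc="$1"; shift
    mkdir -p "$STATUS_DIR"
    # Clear markers left over from any earlier run of the same HUC
    rm -f "$STATUS_DIR/$huc.ok" "$STATUS_DIR/$huc.failed"
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        : > "$STATUS_DIR/$huc.ok"
    else
        echo "$status" > "$STATUS_DIR/$huc.failed"
    fi
    return "$status"
}

huc_succeeded() {
    # True only when the unit step left a success marker for HUC $1
    [ -f "$STATUS_DIR/$1.ok" ]
}
```

Post-processing could then skip a HUC with something like `huc_succeeded "$huc" || continue`. Because each marker is a separate file per HUC, this still works when the unit steps run in parallel, as long as `STATUS_DIR` is on storage that the post-processing step can read.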

RobHanna-NOAA commented 6 months ago

Yes... that is true. It does assume that at least one HUC passed. We do try to isolate each fim_process_unit_wb to pass/fail at will so they can be run in any form of a parallelized system (bash parallel or AWS). There is no easy way for a parallelized fim_process_unit_wb to tell bash that it passed or failed. While not pretty, that is why we have bash code that scans log directories looking for HUC errors and puts them in the unit_errors.log file (or similar name).
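As a rough illustration of that kind of log scan (not the actual repo code), the sketch below assumes one `<HUC>.log` file per HUC and that failures contain the word "error". Both are assumptions, and `collect_unit_errors` is a hypothetical name.

```shell
# Hypothetical sketch: assumes one <HUC>.log per HUC and that failed units
# log a line containing "error" (case-insensitive). Neither is confirmed.
collect_unit_errors() {
    # $1 = directory of per-HUC log files; $2 = summary file to (re)write
    local log_dir
    local summary
    local log_file
    local huc
    log_dir="$1"
    summary="$2"
    : > "$summary"
    for log_file in "$log_dir"/*.log; do
        [ -f "$log_file" ] || continue               # glob matched nothing
        [ "$log_file" = "$summary" ] && continue     # never scan the summary itself
        if grep -q -i "error" "$log_file"; then
            huc=$(basename "$log_file" .log)
            # Record the HUC id together with its first error line
            printf '%s: %s\n' "$huc" "$(grep -i "error" "$log_file" | head -n 1)" >> "$summary"
        fi
    done
    # Succeed only when no HUC reported an error
    [ ! -s "$summary" ]
}
```

Calling `collect_unit_errors logs unit_errors.log` after all units finish gives a single non-zero exit status that a driver script can test, even though the individual parallel jobs could not report back directly.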

NOAA-OWP / inundation-mapping

[21pt] Improve Performance for Post-Processing tools #1095