Open RobHanna-NOAA opened 6 months ago
One thing that might be good to add here is a check to see if fim_process_unit_wb.sh completes successfully or not before running fim_post_processing.sh on a HUC. I think the way it is now, fim_post_processing.sh is run even if fim_process_unit_wb.sh has failed for that HUC.
Yes, that is true. It does assume that at least one HUC passed. We try to isolate each fim_process_unit_wb run so it can pass or fail on its own, which lets the units run under any form of parallelized system (bash parallel or AWS). There is no easy way for a parallelized fim_process_unit_wb to tell bash that it passed or failed. While not pretty, that is why we have bash code that scans the log directories looking for HUC errors and puts them in the unit_errors.log file (or similar name).
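A minimal sketch of one way such a per-HUC check could work, assuming each unit run is wrapped so that its exit code is written to a small status file. The function names `run_unit` and `huc_succeeded`, the `.status` suffix, and the directory layout are all hypothetical and not part of the existing scripts.

```shell
#!/usr/bin/env bash
# Hypothetical layout: one file per HUC, e.g. <status_dir>/<huc>.status,
# holding the exit code of that HUC's unit run.

# Run one unit command and record its exit code for later inspection.
# Everything after the first two arguments is the command to run
# (a stand-in here for the real per-HUC processing call).
run_unit() {
    local huc="$1"
    local status_dir="$2"
    shift 2
    local rc=0
    mkdir -p "$status_dir"
    "$@" || rc=$?
    echo "$rc" > "$status_dir/$huc.status"
    return "$rc"
}

# Succeed only if a status file exists for the HUC and it recorded 0.
# A missing file is treated as a failure (the unit never finished).
huc_succeeded() {
    local huc="$1"
    local status_dir="$2"
    local status_file="$status_dir/$huc.status"
    [ -f "$status_file" ] && [ "$(cat "$status_file")" = "0" ]
}
```

Post processing could then call `huc_succeeded "$huc" "$status_dir"` inside its HUC loop and skip any HUC for which it returns non-zero. Because each unit only writes its own file, this should work the same whether the units run serially, under a parallel runner, or on separate machines that share the output directory.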
The overall performance of the post-processing steps is pretty slow, even though some parts already use multiprocessing. That section has been slowly growing, and the cumulative run time is now a bit of a problem. Consider analyzing each tool used in post processing to review its performance. I suspect adding multi-threading in parts could yield major performance gains. Research required.
Updated July 11:
Not sure if this is an anomaly, but the recent fim_4_5_2_5 BED run took 15 hrs 45 min on post processing, and that was on AWS EC2 Prod3 with 40 cores. No wonder AWS Step Functions cannot handle it anymore. This is complicated by the fact that we cannot re-run post processing, as files are compromised during each post-processing run. An effort is already in place, via 1141, to fix that part.
We should consider adding a log file system to fim_post_processing.sh. It can calculate the duration of each section and keep appending to a growing post-processing performance log file as it runs. This might also help us see where it died in some sections, as well as possible bottlenecks. (Aug 2: done)
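For reference, a rough sketch of what per-section timing could look like. This is not the code that was actually added. The helper name `timed_section` and the log line format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one named section of post processing and
# append START/END lines (with exit code and duration) to a log file.
timed_section() {
    local name="$1"
    local log_file="$2"
    shift 2
    local start end rc=0
    start=$(date +%s)
    # Write START before running, so a section that dies mid-run
    # still leaves a trace with no matching END line.
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START $name" >> "$log_file"
    "$@" || rc=$?
    end=$(date +%s)
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) END $name rc=$rc duration_s=$((end - start))" >> "$log_file"
    return "$rc"
}
```

A call would look like `timed_section "some section" "$log_file" some_command arg1 arg2`, where the command is whatever that section currently runs. A START line with no matching END points at the section where the run died, and sorting the END lines by `duration_s` gives a first look at the bottlenecks.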
Update: Aug 6, 2024
Related Issue cards are: