NOAA-GFDL / CEFI-regional-MOM6

A repository containing essential tools, XML files, and source codes for collaborators of the Climate, Ecosystems, and Fisheries Initiative (CEFI) to conduct simulations.

Preparation for Transition to C6 #78

Closed: yichengt900 closed this 3 months ago

yichengt900 commented 3 months ago

This PR addresses issue #79. I have conducted several tests on C6, including a 30-year physics-only nudging run for NWA12 (see figures below). Most of these runs completed successfully, although a few failed during the transfer of the model's history output, which required manually resubmitting the output.stager jobs. I have transferred some model input data for NWA12, NWA25, NEP10 (NEP_input and NEP_era5), and ARC12_pub to /gpfs/f6/ira-cefi/world-shared. Unfortunately, F6 cannot access F5 directly, so some of you may need to transfer specific forcing data (e.g., JRA forcings) from F5 to F6 yourselves.
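If you are copying forcing data (e.g., JRA forcings) from F5 to F6 yourself, one way to confirm the copy is complete is to build a checksum manifest of the source directory on F5, build another of the destination on F6, and compare them. Below is a minimal sketch, assuming Python 3 is available on both systems. `build_manifest` and `compare_manifests` are hypothetical helpers for illustration, not part of FRE, gcp, or any GFDL tool.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its size and SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large forcing files never need to fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = {
                "size": path.stat().st_size,
                "sha256": digest.hexdigest(),
            }
    return manifest


def compare_manifests(source, dest):
    """Return (files missing from dest, files present in both whose contents differ)."""
    missing = sorted(set(source) - set(dest))
    mismatched = sorted(
        name for name in set(source) & set(dest) if source[name] != dest[name]
    )
    return missing, mismatched


def save_manifest(manifest, out_file):
    """Write a manifest to JSON so it can be copied between systems."""
    with open(out_file, "w") as fh:
        json.dump(manifest, fh, indent=1, sort_keys=True)


def load_manifest(in_file):
    """Read a manifest previously written by save_manifest."""
    with open(in_file) as fh:
        return json.load(fh)
```

For example, run `save_manifest(build_manifest(<source dir>), "f5_manifest.json")` on F5, copy the small JSON file to F6 along with the data, then call `compare_manifests(load_manifest("f5_manifest.json"), build_manifest(<destination dir>))` on F6. Two empty lists mean every file arrived intact. Comparing manifests rather than the data itself sidesteps the fact that F6 cannot read F5 directly.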

Figure: Gulf Stream position (gulfstream_eval)

Figure: Cold water index (coldpool_eval)

Figure: Sea ice extent in the Gulf of St. Lawrence (gsl_extent)

While we can now run the FRE workflow on C6, a few issues remain:

08/13/2024

It looks like regression testing is now working on C6, but FRE is still not functioning properly.

08/14/2024

Although I haven't heard back from MSD yet (update: I received an email at 5:00 PM suggesting that gcp is now working), I'm making some progress. fremake and frerun now partially work with a few tweaks: the lfs command is missing on C6; ncrc6.inc was missing in hsmget/test (now fixed); and I had to maintain our own FRE environment to get things working.

Additionally, for some reason, the Intel23 FMS1 build didn't work with mask_table, so let's stick with FMS2 for now.

08/15/2024

A one-year NWA test run using FRE on C6 seems to be going well. However, the XML file isn't being copied correctly to PPAN during the data transfer step, so post-processing (PP) won't run automatically. I'll take a look and try to fix it when PPAN is back online.

The issue may be caused by the patternSedF5 environment variable being set incorrectly in fre/test. I have a temporary solution, but I need to rerun a test to confirm it. Additionally, data transfer is currently quite slow.

The temporary solution seems to work, but gcp from fre/test did not work on PPAN. As a workaround, we can use FRE/bronx-22 on PPAN for PP. After I reported the problem, MSD fixed it. I will rerun the whole test to make sure everything works.

The 1-year run, including PP, completed successfully. Next, I'll conduct a 5-year run to make sure everything is good and we are ready to go!

08/16/2024

The 5-year run is still in progress. The model simulation appears to be functioning correctly, but I encountered output.stager job failures starting in the second year, along with some unusual filesystem behavior. I believe this PR is ready, but the C6/F6 system may still need some tuning.

CC @charliestock to keep you in the loop.

charliestock commented 3 months ago

Hi Yi-Cheng - thanks for all of your hard work on this. Amazing progress, and I have to agree with Andrew. That Gulf Stream looks amazing! Is that result deterministic or just a lucky spin?

yichengt900 commented 3 months ago

@charliestock I would say it's half and half. The GLORYS data nudging likely played a role, but as I recall, we didn't achieve results this good in the past, even with nudging.