Closed yichengt900 closed 3 months ago
Hi Yi-Cheng - thanks for all of your hard work on this. Amazing progress, and I have to agree with Andrew. That Gulf Stream looks amazing! Is that result deterministic or just a lucky spin?
@charliestock I would say it's half-half. The Glorys data nudging likely played a role, but as I recall, we didn't achieve such good results in the past, even with nudging.
This PR addresses issue #79. I have conducted several tests on C6, including a 30-year long-term physics-only nudging run for NWA12 (see figures below). Most of those runs completed successfully, although I did experience a few failures in the model's historical data transfer, which required manual resubmission of the output.stage jobs. I have transferred some model input data for NWA12, NWA25, NEP10 (NEP_input and NEP_era5), and ARC12_pub to /gpfs/f6/ira-cefi/world-shared. Unfortunately, F6 cannot access F5 directly, so some of you may need to transfer specific forcing data (e.g., JRA forcings) from F5 to F6.
Gulf Stream position:
Cold water index:
Sea ice extent in the Gulf of St. Lawrence:
While we can now run the FRE workflow on C6, a few issues remain:
The FRE group has fixed the bugs, so when you log in to C6, please load fre/test. Previously, the default FRE/test could not properly handle XML copying from Gaea to GFDL, and the C6 system lacks the lfs command, which is used to find and list file names, so I had created a custom version of FRE/test.
You will need to update your platforms.xml to include ncrc6.intel23, as the default Intel compiler on C6 is now version 2023.2.0.
FRE/test previously did not work properly on GFDL PPAN during the PP step, so I had recommended continuing to use FRE/bronx-21 or FRE/bronx-22 on PPAN for the PP process (see the platforms.xml). The FRE group has since fixed FRE/test for PPAN, so we can now use FRE/test for PP on PPAN.
For some reason, I encountered a crash when running NWA12 using the FMS1-intel23 build with mask_table on C6. The FMS2-intel23 build did not have this issue, so I recommend using FMS2 for now. We've resolved the nudging performance issue for FMS2, and it's being used for other domains, so this shouldn't be a significant problem.
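As a quick sanity check for the platforms.xml update mentioned above, a minimal sketch along these lines could confirm that an entry named ncrc6.intel23 is present. It assumes platforms are declared as platform elements carrying a name attribute; the has_platform helper and the trimmed sample XML are hypothetical and only for illustration. A real platforms.xml that relies on XInclude or custom entities may need extra handling before it parses.

```python
import xml.etree.ElementTree as ET


def has_platform(xml_text: str, platform_name: str) -> bool:
    """Return True if any <platform> element has the given name attribute."""
    root = ET.fromstring(xml_text)
    return any(p.get("name") == platform_name for p in root.iter("platform"))


# Hypothetical, heavily trimmed stand-in for a platforms.xml file.
SAMPLE = """
<setup>
  <platform name="ncrc5.intel23"/>
  <platform name="ncrc6.intel23"/>
</setup>
"""

if __name__ == "__main__":
    print(has_platform(SAMPLE, "ncrc6.intel23"))
```

To check an actual file, read its contents and pass the text to has_platform in place of SAMPLE.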
Lastly, I've noticed some unusual behavior on F6, as well as with data transfers between PPAN and F6 (I have opened several helpdesk tickets). Be aware that you might encounter some odd issues when working on F6.
08/13/2024
It looks like regression testing is now working on C6, but FRE is still not functioning properly.
08/14/2024
Although I haven't heard from MSD (received an email at 5:00 PM, which suggests that gcp is now working), I'm making some progress. The fremake and frerun processes are partially working with a few tweaks (the lfs command is missing on C6; ncrc6.inc was missing in hsmget/test but is now fixed; I had to maintain our own FRE environment to make it work).
Additionally, for some reason, the Intel23 FMS1 build didn't work with mask_table, so let's stick with FMS2 for now.
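Since the workflow depends on command-line tools such as lfs and gcp being available, a small pre-flight check can report which ones are missing from PATH before a run is submitted. This is only a sketch: the missing_tools helper is hypothetical, and the tool list simply reuses the names mentioned above.

```python
import shutil


def missing_tools(names):
    """Return the subset of command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tool names taken from the notes above; extend the list as needed.
    missing = missing_tools(["lfs", "gcp"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```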
08/15/2024
A one-year NWA test run using FRE on C6 seems to be going well. However, the XML file isn't copying correctly to PPAN during the data transfer step, so PP won't work automatically. I’ll take a look and try to fix it when PPAN is back online.
It appears that the issue may be due to the patternSedF5 environment variable being set incorrectly in fre/test. I have a temporary solution, but I will need to rerun a test to confirm. Additionally, data transfer is currently quite slow.
The temporary solution seems to work, but the gcp from fre/test did not work on PPAN. The temporary solution would be to use FRE/bronx-22 on PPAN for PP.
MSD has fixed this problem after it was reported. I will re-run the whole test to make sure it works.
The 1-year run completed successfully with PP. I will conduct another 5-year run to make sure everything is good and we are ready to go!
08/16/2024
The 5-year run is still in progress. The model simulation appears to be functioning correctly, but I encountered output.stager job failures starting in the second year, along with some unusual filesystem behavior. I believe this PR is ready, but the C6/F6 system may still need some tuning.
CC @charliestock to keep you in the loop.