Open zhangshixuan1987 opened 1 year ago
@zhangshixuan1987, my first guess would be that this was a glitch of some sort in the Perlmutter file system. I haven't seen a problem like this before that I recall. Could you try rerunning just year 0034 from a restart file and see if the output gets corrected?
Following suggestions from @wlin7 and @xylar, I conducted a "continue run" from the restart files saved at 0034-01-01. The simulation ran for 2 years, from 0034-01-01 to 0036-01-01, and the newly generated model output for that period was used to replace the old output files. I then reran the MPAS diagnostics. The kinks around years 0034-0035 in the ocean heat content figure have now disappeared.
I also checked the history file regenerated by E3SM, "mpaso.hist.am.timeSeriesStatsMonthly.0034-10-01.nc", and all quantities in it now have reasonable values rather than zeros. I therefore think @xylar is correct that the issue was likely due to "a glitch of some sort in the Perlmutter file system". However, why such a glitch showed up in my simulation is still not clear to me.
@zhangshixuan1987, I agree, this is mysterious and frustrating. Certainly if it happens again, we need to figure out a way to reproduce it so we can prevent it from happening again. For now, let's hope it's a one-time event!
Adding @ndkeen and @jayeshkrishna to note glitch.
Following a suggestion from Wuyin (@wlin7), I also ran "/global/cfs/cdirs/e3sm/tools/cprnc/cprnc" on the file
Overall, the two files are likely bit-for-bit identical, suggesting that the simulation of the other components was not affected.
Just noting that we had a similar-sounding issue a few years ago, but surely it's not the same thing. https://github.com/E3SM-Project/E3SM/issues/4174
With master (Hash: 84e50561a854e1888b0eaa52fc3a44287f3a5924), I've been trying to run a fully coupled simulation with atmospheric nudging to test the impact of wind forcing over the subpolar North Atlantic on the AMOC. The simulation was run on pm-cpu with the Intel compiler and is documented on the following Confluence page,
In brief,
An error appeared when I checked the results from the MPAS diagnostics. A kink appears around years 0034-0035 in the ocean heat content, as shown in the figure below. Similar issues are also seen in the AMOC time series.
Further diagnostics pointed to the MPAS-Ocean output at 0034-10-01: almost all quantities in the history file (mpaso.hist.am.timeSeriesStatsMonthly.0034-10-01.nc) are zero. Only this file has the issue; the other history files look correct.
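For anyone who wants to screen other history files for the same symptom, here is a minimal sketch of an all-zero check. It is not the diagnostic used above; the function names (`is_all_zero`, `all_zero_variables`, `iter_numeric_variables`) are hypothetical, and the file reader assumes the third-party `netCDF4` and `numpy` Python packages are available in your environment.

```python
from typing import Iterable, Iterator, List, Tuple


def is_all_zero(values: Iterable[float]) -> bool:
    """True if there is at least one value and every value equals zero."""
    seen = False
    for v in values:
        if v != 0:  # NaN compares unequal to 0, so NaN-filled data is not flagged
            return False
        seen = True
    return seen


def all_zero_variables(named: Iterable[Tuple[str, Iterable[float]]]) -> List[str]:
    """Return the sorted names of variables whose values are all zero."""
    return sorted(name for name, values in named if is_all_zero(values))


def iter_numeric_variables(path: str) -> Iterator[Tuple[str, Iterable[float]]]:
    """Yield (name, flat unmasked values) for each numeric variable in a file.

    Assumption: netCDF4 and numpy are installed; they are imported lazily
    so the pure-Python helpers above work without them.
    """
    import numpy as np
    from netCDF4 import Dataset

    with Dataset(path) as ds:
        for name, var in ds.variables.items():
            data = np.ma.asarray(var[...])
            if data.dtype.kind not in "iuf":  # skip character/string variables
                continue
            yield name, data.compressed()  # 1-D array of non-masked values


# Hypothetical usage, one file at a time:
#   suspects = all_zero_variables(iter_numeric_variables(
#       "mpaso.hist.am.timeSeriesStatsMonthly.0034-10-01.nc"))
#   print(suspects)
```

Some fields can legitimately be all zero, so a flagged variable is only a hint. A file in which nearly every variable is flagged is the pattern described here.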
We note that the 0034-10-01 file was written in the middle of the simulation, and the model neither crashed nor reported an error during the whole simulation period of 0034-01-01 -- 0043-09-11. It therefore seems that this could be a hiccup or a bug in the I/O infrastructure (in the model, the file system, or the I/O nodes, if pm-cpu uses them).
Reported here in case it recurs. For this case, we are going to re-run year 0034 to see if simulation data beyond the problematic month are affected.