Closed rtodling closed 1 month ago
Hi Ricardo. Our GCHM (GEOS-CHEM) colleagues have asked for something similar. I thought we had enabled this, but I have forgotten the details. I know there is a switch at the low level, but I do not know how (or if) it propagates from History.rc.
In the worst case a kludge to the interface should not be too hard.
I'm hoping that by "crash" you mean that the model trapped the exception and gave an informative message that it would not overwrite the file? For me that is level-0 and very high priority.
@rtodling See #2391. We have a global setting in History that allows noclobber. It is an open issue to do it per-collection.
Currently, by setting Allow_Overwrite: .true. you can allow every collection to clobber.
Please let us know if you need this per-collection, in which case we can raise the priority of the other ticket. Closing this one.
Indeed, as @tclune says, you'd add to the top of history where the other global variables are:
VERSION: 1
EXPID: f5295_fp
EXPDSC: f5295_fp__GEOSadas-5_29_5__agrid_C720__ogrid_C
EXPSRC: GEOSadas-5_29_5
Allow_Overwrite: .true.
That should allow overwriting of history output. Note, however, that the setting is global, so every collection will be allowed to overwrite.
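For anyone who wants to script this across several experiments, here is a minimal Python sketch that adds (or updates) the global entry in the text of a history resource file. The function name `set_allow_overwrite` and the choice to place the entry right after `EXPSRC` are my own assumptions for illustration; MAPL does not ship such a helper, and the sketch only edits text, it does not validate the file.

```python
import re


def set_allow_overwrite(rc_text: str, value: bool = True) -> str:
    """Return rc_text with a single global Allow_Overwrite entry set to value.

    Hypothetical helper: it treats the file as plain "KEY: value" lines.
    """
    setting = f"Allow_Overwrite: {'.true.' if value else '.false.'}"
    lines = rc_text.splitlines()

    # If the entry already exists, update it in place.
    for i, line in enumerate(lines):
        if re.match(r"\s*Allow_Overwrite\s*:", line):
            lines[i] = setting
            return "\n".join(lines) + "\n"

    # Otherwise insert it after EXPSRC (assumed to close the global header),
    # falling back to the top of the file if EXPSRC is not present.
    for i, line in enumerate(lines):
        if re.match(r"\s*EXPSRC\s*:", line):
            lines.insert(i + 1, setting)
            break
    else:
        lines.insert(0, setting)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = "VERSION: 1\nEXPID: f5295_fp\nEXPSRC: GEOSadas-5_29_5\n"
    print(set_allow_overwrite(header), end="")
```

Keep in mind that the setting is global, so this turns on overwriting for every collection at once.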
Hi guys, thanks for the reply on this. I will add the option to the history. Many thanks.
Ricardo
@rtodling Warning. This might not be working. I'm reopening this issue.
@rtodling Can you tell us what version of MAPL you are using? We might need to go back in time and patch this once we know the fix.
Looks like the global option is currently broken. NOAA noticed a problem ...
@lizziel We found that this capability is broken. Those with better memory than me assert that it did work at one time. Raising the priority - might be a 1st where we have 3 different "customers" complaining about the same thing.
@tclune, @weiyuan-jiang, here are some details. The error message we got from the UFS weather model:
0: pe=00000 FAIL at line=00187 NetCDF4_FileFormatter.F90 <status=13>
0: pe=00000 FAIL at line=00062 HistoryCollection.F90 <status=13>
0: pe=00000 FAIL at line=00811 ServerThread.F90 <status=13>
0: pe=00000 FAIL at line=00138 BaseServer.F90 <status=13>
0: pe=00000 FAIL at line=01002 ServerThread.F90 <status=13>
0: pe=00000 FAIL at line=00097 MessageVisitor.F90 <status=13>
0: pe=00000 FAIL at line=00115 AbstractMessage.F90 <status=13>
0: pe=00000 FAIL at line=00107 SimpleSocket.F90 <status=13>
0: pe=00000 FAIL at line=00449 ClientThread.F90 <status=13>
0: pe=00000 FAIL at line=00399 ClientManager.F90 <status=13>
0: pe=00000 FAIL at line=03560 MAPL_HistoryGridComp.F90 <status=13>
0: pe=00000 FAIL at line=01901 MAPL_Generic.F90 <status=13>
0: pe=00000 FAIL at line=01291 MAPL_CapGridComp.F90 <status=13>
0: pe=00000 FAIL at line=01220 MAPL_CapGridComp.F90 <status=13>
0: pe=00000 FAIL at line=01166 MAPL_CapGridComp.F90 <status=13>
0: pe=00000 FAIL at line=00834 MAPL_CapGridComp.F90 <status=13>
0: pe=00000 FAIL at line=00974 MAPL_CapGridComp.F90 <status=13>
Please let me know if you want to reproduce the case. So far, the atmosphere can write out files through a symbolic link. Before the run:
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l sfcf000.nc
lrwxrwxrwx 1 Jun.Wang stmp 17 Mar 15 16:45 sfcf000.nc -> output/sfcf000.nc
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l output/sfcf000.nc
ls: cannot access output/sfcf000.nc: No such file or directory
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l gocart.inst_aod.20210322_1200z.nc4
lrwxrwxrwx 1 Jun.Wang stmp 41 Mar 15 16:44 gocart.inst_aod.20210322_1200z.nc4 -> output/gocart.inst_aod.20210322_1200z.nc4
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l output/gocart.inst_aod.20210322_1200z.nc4
ls: cannot access output/gocart.inst_aod.20210322_1200z.nc4: No such file or directory
Then I saw the error message when running the test, and the following in the run directory:
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l sfcf000.nc
lrwxrwxrwx 1 Jun.Wang stmp 17 Mar 15 16:45 sfcf000.nc -> output/sfcf000.nc
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l output/sfcf000.nc
-rw-r--r-- 1 Jun.Wang stmp 85452865 Mar 15 17:49 output/sfcf000.nc
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l gocart.inst_aod.20210322_1200z.nc4
lrwxrwxrwx 1 Jun.Wang stmp 41 Mar 15 16:44 gocart.inst_aod.20210322_1200z.nc4 -> output/gocart.inst_aod.20210322_1200z.nc4
[Jun.Wang@hfe03 atmaero_control_p8_intel_t1]$ ls -l output/gocart.inst_aod.20210322_1200z.nc4
ls: cannot access output/gocart.inst_aod.20210322_1200z.nc4: No such file or directory
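The listings above show links whose targets do not exist yet. A quick way to check a run directory for that state is sketched below in Python, using only the standard library. The function name `find_dangling_symlinks` is hypothetical, and the sketch only reports the links; it makes no claim about how MAPL treats them.

```python
import os
from typing import List


def find_dangling_symlinks(run_dir: str) -> List[str]:
    """Return names in run_dir that are symlinks whose target does not exist."""
    dangling = []
    for name in sorted(os.listdir(run_dir)):
        path = os.path.join(run_dir, name)
        # islink() inspects the link itself; exists() follows it to the target,
        # so a link that fails exists() points at a missing file.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return dangling


if __name__ == "__main__":
    for name in find_dangling_symlinks("."):
        print(f"{name} -> {os.readlink(name)} (target missing)")
```

Run from the run directory before the model starts, this would list both sfcf000.nc and the gocart file in the "before" state shown above.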
Actually, now that I think about it, I think @rtodling is safe. The oddity is occurring because of the broken-symlink style. We are pondering this...
Related issue: https://github.com/GEOS-ESM/MAPL/issues/1620
@junwang-noaa We can only replicate that particular error when the symlink itself is broken. But ... there is a different problem that you will hit once you fix that one.
The history option Allow_Overwrite does not currently propagate to the server side, and fixing that is more subtle than you might have thought. We have scenarios in which a previous segment of a simulation has already written a time slice to a history output file, and the file then needs to be appended to rather than overwritten.
This late on Friday this is making my head hurt. On Monday I will work with @bena-nasa to diagram the various cases, what should happen and how to even detect when it should clobber vs append. Sigh.
All, I made a new issue which summarizes what is going on in much more detail. https://github.com/GEOS-ESM/MAPL/issues/2653
Closing in favor of #2653 (which might be fixed? → @bena-nasa )
At some point in the past a change was made so that MAPL (or CFIO) would crash if the model tried to overwrite an output file.
I remember that when I first stumbled on this, I wasn't so convinced of the need for it. In response to that, a knob was put in (or the default was changed) to allow overwriting.
I believe the latest version of MAPL now has it such that an overwrite causes the model to crash. Can we revisit this again, please? This is a very inconvenient feature, especially for debugging purposes.
Is there perhaps a flag I can add to AGCM.rc or HISTORY to tell MAPL/CFIO not to bother?