NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Create links for GOCART output instead of copying after forecast #1934

Open JessicaMeixner-NOAA opened 10 months ago

JessicaMeixner-NOAA commented 10 months ago

What new functionality do you need?

I would like to link the GOCART diagnostic files to the COM directory instead of copying them at the end of the forecast, as https://github.com/NOAA-EMC/global-workflow/pull/1933 does.

What are the requirements for the new functionality?

The forecast for S2SWA runs even if you link the gocart*.nc files instead of copying at the end.

Acceptance Criteria

The aerosol team has the diagnostic output they need.

Suggest a solution (optional)

I tried to implement suggestions from @rmontuoro and @bbakernoaa here: https://github.com/jessicameixner-noaa/global-workflow/tree/trygocartfix but the forecast still crashed. The setting updates are in this commit: https://github.com/JessicaMeixner-NOAA/global-workflow/commit/f766fd972c5cecf2159047ffd398fe27026af821

WalterKolczynski-NOAA commented 10 months ago

Was a UFS and/or GOCART issue created that can be linked here? Barring information to the contrary from the developers, it seems like the issue needs to be solved there.

JessicaMeixner-NOAA commented 10 months ago

There was an older GOCART issue that hopefully @lipan-NOAA or @rmontuoro can point you to. Apologies, Raffaele, I lost the issue that you previously sent.

JessicaMeixner-NOAA commented 10 months ago

UFS issue https://github.com/ufs-community/ufs-weather-model/issues/1955 was created

HenryRWinterbottom commented 6 months ago

@JessicaMeixner-NOAA I just want to make sure I am understanding what you have said above.

Are you asking for the aerosol history files from the forecast run (i.e., those in DATA) to be linked to the appropriate COMROOT path? If so, that isn't going to work since the corresponding STMP/RUNDIR/fcst.**** is removed following successful completion of the forecast and the links will become hanging links.
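The dangling-link concern can be reproduced outside the workflow. The sketch below is illustrative only: the temporary directories are stand-ins for COMROOT and the forecast run directory, and the file name is shortened. None of it comes from the workflow scripts.

```shell
# Illustrative only: temporary directories stand in for COMROOT and the run directory.
work=$(mktemp -d)
com="${work}/com"
data="${work}/fcst.run"   # hypothetical stand-in for the forecast run directory
mkdir -p "${com}" "${data}"

# The model writes its history file in DATA; COM only holds a link to it.
echo "aerosol history" > "${data}/gocart.inst_aod.nc4"
ln -s "${data}/gocart.inst_aod.nc4" "${com}/gocart.inst_aod.nc4"

# Cleanup after a successful forecast removes the run directory.
rm -rf "${data}"

# The link itself still exists (-L) but its target is gone (! -e).
if [ -L "${com}/gocart.inst_aod.nc4" ] && [ ! -e "${com}/gocart.inst_aod.nc4" ]; then
  link_state="dangling"
else
  link_state="ok"
fi
echo "COM link is ${link_state}"
```

Linking in the other direction, with the real file in COM and the link in DATA, avoids this because removing DATA only removes the link.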

If I am misunderstanding and you mean that the previous forecast cycle's aerosol history files should be linked to the current forecast run, then that is a straightforward and minor modification.

Can you please provide me a bit more context and clarity? Thank you in advance.

JessicaMeixner-NOAA commented 6 months ago

@HenryWinterbottom-NOAA here's some extra details which hopefully will be helpful:

Currently what happens:

  1. Run forecast model
  2. Copy gocart*.nc from DATA to COMROOT

The desired path forward is instead:

  1. Link gocart*.nc output files (even though they don't yet exist) from DATA to COM, so files are available as soon as they're created, similar to other component output files
  2. Run forecast model
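The link-first ordering can be sketched with plain shell tools. This is a minimal illustration, not the workflow's code: the directories are temporary stand-ins for DATA and COM, the date in the file name is made up, and a shell redirect takes the place of the model's writer.

```shell
# Illustrative only: temporary directories stand in for DATA and COM.
work=$(mktemp -d)
com="${work}/com"
data="${work}/data"
mkdir -p "${com}" "${data}"
fname="gocart.inst_aod.20210323_0600z.nc4"   # made-up example date

# Step 1: link before the file exists. The real file will live in COM;
# DATA only gets a link that points at it.
ln -sf "${com}/${fname}" "${data}/${fname}"

# Step 2: "run the model". A plain redirect stands in for the writer.
echo "aod" > "${data}/${fname}"

# The write followed the link, so the file now exists in COM.
ls -l "${com}/${fname}"
```

An ordinary create-or-truncate open follows the link and creates the target, which is why this works for a shell redirect. An exclusive create (`O_CREAT|O_EXCL`) is refused on a path that is a symlink. The sketch does not model how GOCART opens its files.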

For whatever reason, the desired path forward did not work before. One reason was that some model updates and model settings were needed. @lipan-NOAA, @bbakernoaa, and @rmontuoro are the aerosol experts who can help with details. Tagging @WalterKolczynski-NOAA as well, since I know we both looked at this before. He might be able to help fill in some gaps that I didn't.

bbakernoaa commented 6 months ago

This can be fixed by adding in the flag Allow_Overwrite: .true. in the AERO_HISTORY.rc file in the workflow. This is referenced in https://github.com/GEOS-ESM/GOCART/issues/256
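A minimal sketch of applying that flag from a script follows. The function name `ensure_allow_overwrite` is hypothetical, and placing the flag right after the `EXPID:` line is a choice made here for illustration, not something the thread specifies.

```shell
# Hypothetical helper: make sure "Allow_Overwrite: .true." is set in a History rc file.
ensure_allow_overwrite() {
  local rc_file tmp
  rc_file="$1"

  # Already enabled? Nothing to do.
  if grep -Eq '^[[:space:]]*Allow_Overwrite:[[:space:]]*\.true\.' "${rc_file}"; then
    return 0
  fi

  tmp=$(mktemp)
  # Drop any other Allow_Overwrite setting, then add the flag after EXPID
  # (or at the end of the file if there is no EXPID line).
  awk '
    /^[[:space:]]*Allow_Overwrite:/ { next }
    { print }
    /^[[:space:]]*EXPID:/ && !added { print "Allow_Overwrite: .true."; added = 1 }
    END { if (!added) print "Allow_Overwrite: .true." }
  ' "${rc_file}" > "${tmp}"
  cat "${tmp}" > "${rc_file}"   # keep the original file permissions
  rm -f "${tmp}"
}

# Usage (path is an example): ensure_allow_overwrite "${DATA}/AERO_HISTORY.rc"
```

The function is idempotent, so it is safe to call every time the rc file is staged.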

HenryRWinterbottom commented 6 months ago

@JessicaMeixner-NOAA I am not certain that the request raised in this issue can be performed with the current capabilities of the UFS weather model.

As is currently done, the gocart.inst_aod.*.nc4 files are written without symlinks to COMROOT. However, when ush/forecast_postdet.sh creates the symlink and the UFS weather model is then executed, the following exception is raised while writing the GOCART history files.

pe=00000 FAIL at line=00295    NetCDF4_FileFormatter.F90                <status=-51>
pe=00000 FAIL at line=00066    HistoryCollection.F90                    <status=-51>
pe=00000 FAIL at line=00811    ServerThread.F90                         <status=-51>
pe=00000 FAIL at line=00138    BaseServer.F90                           <status=-51>
pe=00000 FAIL at line=01002    ServerThread.F90                         <status=-51>
pe=00000 FAIL at line=00097    MessageVisitor.F90                       <status=-51>
pe=00000 FAIL at line=00115    AbstractMessage.F90                      <status=-51>
pe=00000 FAIL at line=00107    SimpleSocket.F90                         <status=-51>
pe=00000 FAIL at line=00449    ClientThread.F90                         <status=-51>
pe=00000 FAIL at line=00399    ClientManager.F90                        <status=-51>
pe=00000 FAIL at line=03560    MAPL_HistoryGridComp.F90                 <status=-51>
pe=00000 FAIL at line=01901    MAPL_Generic.F90                         <status=-51>
pe=00000 FAIL at line=01291    MAPL_CapGridComp.F90                     <status=-51>
pe=00000 FAIL at line=01220    MAPL_CapGridComp.F90                     <status=-51>
pe=00000 FAIL at line=01166    MAPL_CapGridComp.F90                     <status=-51>
pe=00000 FAIL at line=00834    MAPL_CapGridComp.F90                     <status=-51>
pe=00000 FAIL at line=00974    MAPL_CapGridComp.F90                     <status=-51>

In addition to the hanging symlink test above, I created empty gocart.inst_aod.*.nc4 files beneath COMROOT via a touch. This raised the same exception. The UFS aerosol model seems to expect the link to point to an existing, overwritable netCDF-formatted file at the time the history file is written.
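If the `status=-51` in the trace is a raw netCDF-C return code (an assumption; the trace does not say), it would correspond to `NC_ENOTNC`, "not a netCDF file". That would be consistent with the touch experiment, since an empty file carries no file signature. The hypothetical helper below checks only the first four bytes of a file and is a rough diagnostic, not a substitute for a real netCDF reader.

```shell
# Hypothetical helper: classify a file by the signature at offset 0.
nc_signature() {
  local hex
  # First four bytes as lowercase hex with no separators; empty for an empty file.
  hex=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "${hex}" in
    89484446) echo "hdf5" ;;      # \x89 H D F: HDF5 container used by netCDF-4
    434446*)  echo "classic" ;;   # C D F plus a version byte: classic netCDF
    *)        echo "none" ;;      # empty file or some other format
  esac
}

# Usage (path is an example): nc_signature "${COM_CHEM_HISTORY}/some_file.nc4"
```

A file created with `touch` reports `none`, so a library that opens an existing target instead of creating a new one would reject it.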

The relevant contents of my AERO_HISTORY.rc configuration file are as follows.

#######################################################################
#                 Create History List for Output
#######################################################################

VERSION: 1
EXPID:  gocart
EXPDSC: GOCART2g_diagnostics_at_c360
EXPSRC: GEOSgcm-v10.16.0
Allow_Overwrite: .true.

@bbakernoaa Are my conclusions what you'd expect or am I missing something else?

aerorahul commented 6 months ago

@HenryWinterbottom-NOAA I was able to recreate the error when trying to link from DATA to COM. The associated diff is:

 /s/N/g/R/g/global-workflow [develop ✚.1 …6] ❯❯❯ git diff ush/
diff --git i/ush/forecast_postdet.sh w/ush/forecast_postdet.sh
index 644a2180..c9eb9b2d 100755
--- i/ush/forecast_postdet.sh
+++ w/ush/forecast_postdet.sh
@@ -929,33 +929,19 @@ GOCART_rc() {
 GOCART_postdet() {
   echo "SUB ${FUNCNAME[0]}: Linking output data for GOCART"

+  local fhr vdate
   for fhr in ${FV3_OUTPUT_FH}; do
-    local vdate=$(date --utc -d "${current_cycle:0:8} ${current_cycle:8:2} + ${fhr} hours" +%Y%m%d%H)
-
-    # Temporarily delete existing files due to noclobber in GOCART
-    if [[ -e "${COM_CHEM_HISTORY}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4" ]]; then
-      rm -f "${COM_CHEM_HISTORY}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4"
-    fi
+    if (( fhr == 0 )); then continue; fi
+    vdate=$(date --utc -d "${current_cycle:0:8} ${current_cycle:8:2} + ${fhr} hours" +%Y%m%d%H)

-    #To Do: Temporarily removing this as this will crash gocart, adding copy statement at the end
-    #${NLN} "${COM_CHEM_HISTORY}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4" \
-    #       "${DATA}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4"
+    ${NLN} "${COM_CHEM_HISTORY}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4" \
+           "${DATA}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4"
   done
 }

 GOCART_out() {
   echo "SUB ${FUNCNAME[0]}: Copying output data for GOCART"

-  # Copy gocart.inst_aod after the forecast is run (and successfull)
-  # TO DO: this should be linked but there were issues where gocart was crashing if it was linked
-  local fhr
-  local vdate
-  for fhr in ${FV3_OUTPUT_FH}; do
-    if (( fhr == 0 )); then continue; fi
-    vdate=$(date --utc -d "${current_cycle:0:8} ${current_cycle:8:2} + ${fhr} hours" +%Y%m%d%H)
-    ${NCP} "${DATA}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4" \
-      "${COM_CHEM_HISTORY}/gocart.inst_aod.${vdate:0:8}_${vdate:8:2}00z.nc4"
-  done
 }

 CMEPS_postdet() {

AERO_HISTORY.rc does have Allow_Overwrite: .true.

This indicates to me that the GOCART component is unable to write via links. The component will have to resolve this issue for the workflow to be able to do what is required in this issue.

HenryRWinterbottom commented 6 months ago

@aerorahul Should we icebox or close this for now?

aerorahul commented 6 months ago

Icebox until the GOCART group can confirm this is resolved. The fix is very simple, as indicated in the diff in my comment above.

bbakernoaa commented 5 months ago

@junwang-noaa I'm running into this issue as well. Do you remember where to put this in the configuration?

junwang-noaa commented 5 months ago

@bbakernoaa We used to add "Allow_Overwrite: .true." to AERO_HISTORY.rc to resolve the file clobbering issue. But this one looks different to me as it is the symbolic link that does not work.

bbakernoaa commented 5 months ago

@lipan-NOAA could you help look into this?

aerorahul commented 4 months ago

@JessicaMeixner-NOAA @bbakernoaa Is there an issue being tracked for this in the ufs-weather-model, GOCART, or elsewhere?

JessicaMeixner-NOAA commented 4 months ago

@aerorahul It was my understanding that this was already fixed in ufs-weather-model/GOCART, but @bbakernoaa and/or @lipan-NOAA would know more.

aerorahul commented 4 months ago

Is that demonstrated? I have been unable to replicate the successes of @bbakernoaa or @lipan-NOAA

lipan-NOAA commented 4 months ago

@junwang-noaa found that the problem is solved by modifying AERO_HISTORY.rc. In

#######################################################################
#                 Create History List for Output
#######################################################################

VERSION: 1
EXPID:  gocart

change the EXPID line to

EXPID: ../$(gocart_output_directory)/gocart
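A hedged sketch of scripting that EXPID change follows. The function name `set_gocart_expid` and its arguments are hypothetical; `out_dir` plays the role of the `gocart_output_directory` placeholder above and is taken relative to the parent of the run directory, matching the `../` prefix in the suggested value.

```shell
# Hypothetical helper: point EXPID in a History rc file at ../<out_dir>/gocart.
set_gocart_expid() {
  local rc_file out_dir tmp
  rc_file="$1"
  out_dir="$2"
  tmp=$(mktemp)
  # Rewrite only the EXPID line; every other line is passed through unchanged.
  awk -v expid="../${out_dir}/gocart" '
    /^[[:space:]]*EXPID:/ { print "EXPID: " expid; next }
    { print }
  ' "${rc_file}" > "${tmp}"
  cat "${tmp}" > "${rc_file}"   # keep the original file permissions
  rm -f "${tmp}"
}

# Usage (both arguments are examples): set_gocart_expid AERO_HISTORY.rc chem_out
```

With this setting the history files are written under the named directory directly, so no link in the run directory is involved.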

aerorahul commented 4 months ago

@lipan-NOAA This needs to be tested, but I doubt it solves the problem of linking to COM/. The issue is that the data is written out to COM via a symlink from a location in the runtime directory, so I am highly skeptical that this solves the problem. @junwang-noaa's solution provides a way to write output to a sub-directory of DATA/. That output eventually needs to end up in COM, which is presently done via symlinks.

junwang-noaa commented 4 months ago

I want to confirm that the symbolic link does not work and that it will not be fixed, because the fix would break the GOES run. The proposed method is therefore to write to the COM directory directly.

yangfanglin commented 4 months ago

For AQMv7 implementation, NCO asked to have all symbolic links removed. Forecast data must be written under the ./DATA (running) directory. At the end of forecast, a separate job was added to move the data to ./COM.
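The no-symlink pattern described here (write under DATA, then move to COM in a later step) might look like the sketch below. The function name `move_gocart_output` and the `gocart.*.nc4` file pattern are assumptions for illustration; the actual AQMv7 job is not shown in this thread.

```shell
# Hypothetical post-forecast step: move GOCART history files from DATA to COM.
move_gocart_output() {
  local data_dir com_dir f
  data_dir="$1"
  com_dir="$2"
  mkdir -p "${com_dir}"
  for f in "${data_dir}"/gocart.*.nc4; do
    # Skip the literal pattern when nothing matched.
    [ -e "${f}" ] || continue
    mv "${f}" "${com_dir}/"
  done
}

# Usage (variables as named in the workflow diff): move_gocart_output "${DATA}" "${COM_CHEM_HISTORY}"
```

The trade-off is that files reach COM only after the forecast finishes, whereas links make each file visible as soon as it is written.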

NCO is open to more discussion about this "new" requirement. @aerorahul EMC needs to discuss this with NCO to reach an agreement. It might be impossible to follow this new requirement for large systems like RRFS and GFS/GEFS.

lipan-NOAA commented 4 months ago

tested it and Jun's modification works