NOAA-EMC / AQM

GNU General Public License v3.0

To understand the difference between ecflow and rocoto workflow generated input and outputs #85

Closed JianpingHuang-NOAA closed 1 year ago

JianpingHuang-NOAA commented 1 year ago

Lin Gan (EIB) and I are both testing the same package (exe, j-jobs, ex-scripts, etc.) for the 00z cycle on 20230502, but the two workflows produce different NEXUS emission inputs and different dyn, phy, and aqm.prod files.

The differences include: 1) AACD in the NEXUS_Expt.nc files; 2) CO in the dyn files; 3) AOD in the phy files; 4) ozone in the *aqm.prod.nc files.

Lin's ecflow-generated input/output files: /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/com/aqm/v7.0/aqm.20230502/00
Jianping's rocoto-generated input/output files: /lfs/h2/emc/ptmp/jianping.huang/emc.para/com/aqm/v7.0/aqm.20230502/00

@bbakernoaa Can your group check why AACD differs between the NEXUS emission files generated by the two workflows?

ecflow package is at /lfs/h2/emc/physics/noscrub/jianping.huang/nwdev/packages/aqm.v7.0.71L/ush

@lgannoaa Can you provide your package location here?

bbakernoaa commented 1 year ago

@ytangnoaa

BrianCurtis-NOAA commented 1 year ago

A good starting point might be the environment variables you have set up. UFS had an issue where an old environment variable left lurking around accidentally got into the code and yielded failures or bad results/outputs.
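One way to act on this suggestion is to dump the environment of each job (for example with `env > run.env` in both workflows) and compare the two dumps. The following is only a minimal sketch under that assumption; the function names `parse_env_dump` and `diff_env` are hypothetical and are not part of the AQM package.

```python
from typing import Dict, Optional, Tuple


def parse_env_dump(text: str) -> Dict[str, str]:
    """Parse `env`-style output (one NAME=value per line) into a dict.

    Multi-line values are not handled in this sketch.
    """
    env: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            env[name.strip()] = value
    return env


def diff_env(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every variable that differs.

    A variable missing from one side is reported with None on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Variables that appear on only one side are often the most interesting ones, since a stale variable inherited from a login shell would show up there.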

lgannoaa commented 1 year ago

My package HOMEaqm is: cactus:/lfs/h2/emc/global/noscrub/lin.gan/git/aqm.v7.0.71
log: /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today
COM: /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/com/aqm/v7.0

lgannoaa commented 1 year ago

@BrianCurtis-NOAA Would you help us examine the configuration in these two forecast DATA locations?
/lfs/h2/emc/ptmp/jianping.huang/emc.para/tmp/run_fcst.2023050200 - @JianpingHuang-NOAA's run
/lfs/h2/emc/stmp/lin.gan/aqm/ecflow_aqm/aqm_forecast_00.2023050200 - my run
Thanks

ytangnoaa commented 1 year ago

I saw this issue, which looks related to the NEXUS process. The emissions for some species, like NO, NO2, and PCE, are identical. The differences appear for CO, ALD2, AACD, etc. after merging. CO etc. in the time-split emission files are the same.

bbakernoaa commented 1 year ago

@ytangnoaa is this tied to biogenics?

lgannoaa commented 1 year ago

A comparison between the two DATA/RESTART directories shows that the following file is different:
compare_ncfile.py /lfs/h2/emc/stmp/lin.gan/aqm/ecflow_aqm/aqm_forecast_00.2023050200/RESTART/fv_tracer.res.tile1.nc /lfs/h2/emc/ptmp/jianping.huang/emc.para/tmp/run_fcst.2023050200/RESTART/fv_tracer.res.tile1.nc
no2 is different
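The source of compare_ncfile.py is not shown in this thread, so the following is only a rough sketch of the kind of per-variable check such a tool performs. It assumes the variables of each file have already been read into flat sequences of numbers (for example with a NetCDF reader of your choice); the function name `differing_variables` is hypothetical.

```python
import math
from typing import Dict, Mapping, Sequence


def differing_variables(
    run_a: Mapping[str, Sequence[float]],
    run_b: Mapping[str, Sequence[float]],
) -> Dict[str, float]:
    """Return {variable: max absolute difference} for variables that are not identical.

    Variables present in only one run, or with different lengths, are
    reported as math.inf. NaN and fill values are not treated specially here.
    """
    report: Dict[str, float] = {}
    for name in sorted(set(run_a) | set(run_b)):
        if name not in run_a or name not in run_b:
            report[name] = math.inf
            continue
        a, b = run_a[name], run_b[name]
        if len(a) != len(b):
            report[name] = math.inf
            continue
        max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if max_diff > 0.0:
            report[name] = max_diff
    return report
```

Reporting the maximum absolute difference, rather than a simple yes/no, helps separate round-off noise from a genuinely different input.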

ytangnoaa commented 1 year ago

@ytangnoaa is this tied to biogenics?

CO has no biogenic source. "CO_ant" in the split emission files of the two runs is the same.

ytangnoaa commented 1 year ago

@BrianCurtis-NOAA Would you help us examine the configuration in these two forecast DATA locations?
/lfs/h2/emc/ptmp/jianping.huang/emc.para/tmp/run_fcst.2023050200 - @JianpingHuang-NOAA's run
/lfs/h2/emc/stmp/lin.gan/aqm/ecflow_aqm/aqm_forecast_00.2023050200 - my run
Thanks

@lgannoaa Where are the corresponding log files?

JianpingHuang-NOAA commented 1 year ago

@lgannoaa

I reran the 00z cycle for 20230502; the output files can be found at /lfs/h2/emc/ptmp/jianping.huang/emc.para/com/aqm/v7.0/aqm.20230502/00

And the previous run output files are saved at /lfs/h2/emc/ptmp/jianping.huang/emc.para/com/aqm/v7.0/aqm.20230502_Lin_package/00

They are identical, which means that our results are reproducible.

lgannoaa commented 1 year ago

According to /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/aqm_nexus_post_split_00.o57118921 (line 2327), the ${HOMEaqm}/sorc/arl_nexus/utils/python/combine_ant_bio.py utility is used to create /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/com/aqm/v7.0/aqm.20230502/00/aqm.t00z.NEXUS_Expt.nc. This file is used by the forecast job, and its AACD differs between Jianping's run and mine. It is a possible source of the difference in the forecast job output.

On Cactus:
My nexus_post_split job log: /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/aqm_nexus_post_split_00.o57118921
My fcst job log: /lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/aqm_forecast_00.o57120862
@JianpingHuang-NOAA Would you please provide your run information here:
Jianping nexus_post_split job log: ?
Jianping fcst job log: ?

lgannoaa commented 1 year ago

I wrote a driver to test this issue, using the same forecast DATA directory and replacing only aqm.t00z.NEXUS_Expt.nc with the one from Jianping's COM location. As expected, cmp shows the forecast output is bit-identical between Jianping's run and my driver test. Therefore, the root cause of the fcst output difference is aqm.t00z.NEXUS_Expt.nc.
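The driver itself is not shown in this thread. As an illustration only, the bit-identical check at the end of such a test could be sketched as follows, assuming the two sets of forecast output sit in two directories; the function name `non_identical_files` and the `*.nc` default pattern are hypothetical.

```python
import filecmp
from pathlib import Path
from typing import List


def non_identical_files(dir_a: str, dir_b: str, pattern: str = "*.nc") -> List[str]:
    """List files matching `pattern` in dir_a that differ from, or are missing in, dir_b."""
    mismatches: List[str] = []
    for path_a in sorted(Path(dir_a).glob(pattern)):
        if not path_a.is_file():
            continue
        path_b = Path(dir_b) / path_a.name
        # shallow=False forces a byte-by-byte comparison, similar to `cmp`.
        if not path_b.is_file() or not filecmp.cmp(path_a, path_b, shallow=False):
            mismatches.append(path_a.name)
    return mismatches
```

An empty list means every matching file in the first directory has a byte-identical counterpart in the second, which is the outcome reported for the driver test.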

JianpingHuang-NOAA commented 1 year ago

@lgannoaa /lfs/h2/emc/ptmp/jianping.huang/emc.para/output/20230502/
nexus_emission_2023050200_s00.id_1683222855.log
....
nexus_emission_2023050205_s02.id_1683222855.log
nexus_post_split_2023050200.id_1683222855.log
run_fcst_2023050200.id_1683222855.log

ytangnoaa commented 1 year ago

I checked the other intermediate files between these two runs:

NEXUS_Expt_combined.nc NEXUS_Expt_pretty.nc

CO etc. are identical in both. The issue appears to be caused by the last step, "combine_ant_bio.py".

ytangnoaa commented 1 year ago

@ytangnoaa is this tied to biogenics?

Barry, it is indeed caused by the biogenic difference. It is strange that CO emissions are affected.

lgannoaa commented 1 year ago

@JianpingHuang-NOAA I checked your nexus_emission job log, for example:
/lfs/h2/emc/ptmp/jianping.huang/emc.para/output/20230502/nexus_emission_2023050200_s00.id_1683165202.log
and compared it to my ecflow run log:
/lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/aqm_nexus_emission_00_00.o57118629
It looks like your job did not find the GFS sfc files (line 1119). My job log, line 2348, shows they were found in /lfs/h2/emc/stmp/lin.gan/aqm/ecflow_aqm/aqm_nexus_gfs_sfc_00.2023050200

exregional_nexus_emission.sh line 92 requires GFS_SFC_INPUT to point to the location that your first job, nexus_gfs_sfc, linked the files to.

A check of your nexus_gfs_sfc_2023050200.id_1683165202.log shows those files were found and linked to GFS_SFC_STAGING_DIR=/lfs/h2/emc/ptmp/jianping.huang/emc.para/tmp/nexus_gfs_sfc.2023050200

Please modify your configuration to ensure GFS_SFC_INPUT is set to /lfs/h2/emc/ptmp/jianping.huang/emc.para/tmp/nexus_gfs_sfc.2023050200 in your nexus_emission job. Rerun your job and let us know whether this fixes the issue. Thanks
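A silent fallback when GFS_SFC_INPUT points to the wrong place is what made this hard to spot, so a fail-fast guard before the emission step may be useful. The sketch below is an assumption about how such a guard could look, not the logic in exregional_nexus_emission.sh; the function name `check_input_dir` is hypothetical.

```python
import os
from typing import Mapping, Optional


def check_input_dir(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the directory named by env[var_name]; raise if unset, missing, or empty."""
    env = os.environ if env is None else env
    value = env.get(var_name, "")
    if not value:
        raise RuntimeError(f"{var_name} is not set")
    if not os.path.isdir(value):
        raise RuntimeError(f"{var_name}={value} is not a directory")
    if not os.listdir(value):
        raise RuntimeError(f"{var_name}={value} contains no files")
    return value
```

Calling `check_input_dir("GFS_SFC_INPUT")` at the start of the job would stop the run with a clear message instead of continuing without the GFS sfc files.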

JianpingHuang-NOAA commented 1 year ago

I think this has been resolved. @lgannoaa Do you have any more comments on this?

lgannoaa commented 1 year ago

We may close this ticket. The root cause has been found. Issue resolved.