E3SM-Project / zppy

E3SM post-processing toolchain
BSD 3-Clause "New" or "Revised" License
6 stars 15 forks source link

Remove cdscan for e3sm_diags #598

Open forsyth2 opened 4 months ago

forsyth2 commented 4 months ago

Resolves #346. Resolves #80. Follow-up to #519.

forsyth2 commented 4 months ago

@tomvothecoder Re: https://github.com/E3SM-Project/zppy/pull/519#issuecomment-2127712530 -- this PR includes the changes that couldn't be included in #519 since the e3sm_diags refactor isn't finished yet. My initial testing suggests this PR's implementation will be sufficient to fully remove CDAT dependencies from zppy. But we'll have to wait to merge it until after the e3sm_diags refactor is done.

forsyth2 commented 1 month ago

@tomvothecoder I created a new "min case" cfg for testing sets that have already been migrated. They appear to generate alright using E3SM Diags built off the latest CDAT branch: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/issue-346-diags-20240802/v3.LR.historical_0051/e3sm_diags/

forsyth2 commented 2 days ago

I ran zppy -c tests/integration/generated/test_min_case_e3sm_diags_cdat_migrated_sets_chrysalis.cfg.

The cfg ``` [default] case = "v3.LR.historical_0051" constraint = "" dry_run = "False" environment_commands = "" input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051 input_subdir = archive/atm/hist mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc" output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051" partition = "debug" qos = "regular" www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917" years = "1985:1989:2", [climo] active = True walltime = "00:30:00" [[ atm_monthly_180x360_aave ]] frequency = "monthly" input_files = "eam.h0" input_subdir = "archive/atm/hist" vars = "" [[ atm_monthly_diurnal_8xdaily_180x360_aave ]] frequency = "diurnal_8xdaily" input_files = "eam.h3" input_subdir = "archive/atm/hist" vars = "PRECT" [[ land_monthly_climo ]] frequency = "monthly" input_files = "elm.h0" input_subdir = "archive/lnd/hist" mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc" vars = "" [ts] active = True e3sm_to_cmip_environment_commands = "" walltime = "00:30:00" [[ atm_monthly_180x360_aave ]] frequency = "monthly" input_files = "eam.h0" input_subdir = "archive/atm/hist" ts_fmt = "cmip" [[ rof_monthly ]] extra_vars = 'areatotal2' frequency = "monthly" input_files = "mosart.h0" input_subdir = "archive/rof/hist" mapping_file = "" vars = "RIVER_DISCHARGE_OVER_LAND_LIQ" [[ land_monthly ]] extra_vars = "landfrac" frequency = "monthly" input_files = "elm.h0" input_subdir = "archive/lnd/hist" mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc" ts_fmt = "cmip" vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILICE,SOILLIQ,SOILWATER_10CM,TSA,TSOI,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR" [tc_analysis] active = True scratch = "/lcrc/globalscratch/ac.forsyth2/zppy_weekly_comprehensive_v3_scratch/test-diags-no-cdat-20240917/v3.LR.historical_0051" walltime = "00:30:00" [e3sm_diags] active = True environment_commands = "source /home/ac.forsyth2/miniconda3/etc/profile.d/conda.sh; conda activate e3sm_diags_cdat_branch_20240916" grid = '180x360_aave' multiprocessing = True num_workers = 8 partition = "compute" qos = "regular" ref_final_yr = 1986 ref_start_yr = 1985 ref_years = "1985-1986", # Also lat_lon_land for mvm #sets="lat_lon","lat_lon_river","cosp_histogram","zonal_mean_xy","zonal_mean_2d","zonal_mean_2d_stratosphere","aerosol_aeronet","polar","meridional_mean_2d","area_mean_time_series","aerosol_budget","annual_cycle_zonal_mean","diurnal_cycle","tc_analysis","enso_diags", "streamflow" sets="lat_lon","tc_analysis","enso_diags","streamflow", short_name = "v3.LR.historical_0051" ts_num_years = 2 walltime = "5:00:00" years = "1987:1989:2" [[ atm_monthly_180x360_aave ]] climo_diurnal_frequency = "diurnal_8xdaily" climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave" climo_subsection = "atm_monthly_180x360_aave" dc_obs_climo = '/lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/obs/climatology' [[ atm_monthly_180x360_aave_mvm ]] # Test model-vs-model using the same files as the reference climo_diurnal_frequency = "diurnal_8xdaily" climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave" climo_subsection = "atm_monthly_180x360_aave" diff_title = "Difference" partition = "compute" qos = "regular" ref_name = "v3.LR.historical_0051" reference_data_path = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/clim" run_type = "model_vs_model" short_ref_name = "same simulation" swap_test_ref = False tag = "model_vs_model" ts_num_years_ref = 2 ts_subsection = "atm_monthly_180x360_aave" # [[ lnd_monthly_mvm_lnd ]] # # Test model-vs-model using the same files as the reference # climo_subsection = "land_monthly_climo" # diff_title = "Difference" # partition = "compute" # qos = "regular" # ref_name = "v3.LR.historical_0051" # reference_data_path = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/lnd/180x360_aave/clim" # run_type = "model_vs_model" # sets = "lat_lon_land", # short_ref_name = "same simulation" # swap_test_ref = False # tag = "model_vs_model" # ts_num_years_ref = 2 ```

I specifically set:

sets="lat_lon","tc_analysis","enso_diags","streamflow",

to test the latest sets added on https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24 (apparently missing qbo as a recent set). [Note I still included lat_lon as a set since I recall E3SM Diags sometimes running into problems if that set isn't run first.]

These three sets unfortunately ran into some errors. https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/viewer/ only shows lat_lon. Now, by #625, apparently tc_analysis wasn't even showing up correctly on main. But that still means enso_diags and streamflow ran into some issues on the migrated branch.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/scripts

$ grep -v "OK" *status
# No outright errors

$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in streamflow e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928

tc_analysis:

IndexError: list index out of range

This error matches the one on main, as noted in #625

enso_diags:

OSError: No files found for target variable TAUX or derived variables ([('tauu',), ('surf_mom_flux_\
U',)]) in ts.
OSError: No files found for target variable LHFLX or derived variables ([('hfls',), ('QFLX',), ('su\
rface_upward_latent_heat_flux',)]) in ts.
OSError: No files found for target variable SHFLX or derived variables ([('hfss',), ('surf_sens_flu\
x',)]) in ts
OSError: No files found for target variable LHFLX or derived variables ([('hfls',), ('QFLX',), ('su\
rface_upward_latent_heat_flux',)]) in ts.
OSError: No files found for target variable SHFLX or derived variables ([('hfss',), ('surf_sens_flu\
x',)]) in ts.
OSError: No files found for target variable NET_FLUX_SRF or derived variables ([('FSNS', 'FLNS', 'Q\
FLX', 'PRECC', 'PRECL', 'PRECSC', 'PRECSL', 'SHFLX'), ('FSNS', 'FLNS', 'LHFLX', 'SHFLX'), ('FSNS', \
'FLNS', 'QFLX', 'SHFLX'), ('rsds', 'rsus', 'rlds', 'rlus', 'hfls', 'hfss')]) in ts.
OSError: No files found for target variable PRECT or derived variables ([('PRECT',), ('pr',), ('PRE\
CC', 'PRECL'), ('sat_gauge_precip',), ('PrecipLiqSurfMassFlux', 'PrecipIceSurfMassFlux')]) in ts.
OSError: No files found for target variable TAUX or derived variables ([('tauu',), ('surf_mom_flux_\
U',)]) in ts.
OSError: No files found for target variable TAUY or derived variables ([('tauv',), ('surf_mom_flux_\
V',)]) in ts.
OSError: No files found for target variable FSNS or derived variables ([('sfc_net_sw_all_mon',), ('\
rsds', 'rsus')]) in ts.
OSError: No files found for target variable NET_FLUX_SRF or derived variables ([('FSNS', 'FLNS', 'Q\
FLX', 'PRECC', 'PRECL', 'PRECSC', 'PRECSL', 'SHFLX'), ('FSNS', 'FLNS', 'LHFLX', 'SHFLX'), ('FSNS', \
'FLNS', 'QFLX', 'SHFLX'), ('rsds', 'rsus', 'rlds', 'rlus', 'hfls', 'hfss')]) in ts.

streamflow:

OSError: No time series `.nc` file was found for 'RIVER_DISCHARGE_OVER_LAND_LIQ' in 'rof'

The enso_diags and streamflow errors seem to suggest missing data, but this confuses me because the comprehensive_v3 test uses the same simulation output as its data source...

forsyth2 commented 1 day ago

@chengzhuzhang @tomvothecoder

https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/viewer/ shows only lat_lon despite tests/integration/template_min_case_e3sm_diags_cdat_migrated_sets.cfg having sets="lat_lon","tc_analysis","enso_diags","streamflow",.

The relevant output directory can be seen here, and is also mentioned in a code block above.

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/scripts

$ grep -v "OK" *status
# No obvious failures

# Possibly useful commands:
$ grep -in error e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928
$ grep -in streamflow e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o585928

Note https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_www/test-diags-no-cdat-20240917/v3.LR.historical_0051/e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1987-1988/viewer/ also shows only lat_lon in the final viewer.

chengzhuzhang commented 1 day ago

@forsyth2 I'm looking at the log files (as below), it appears that time series files are not correctly feed into e3sm_diags, which means the code change that removes cdscan step is not working with e3sm_diags s new code base correctly.

>cp: cannot create regular file 'ts/FSNTOA_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLUT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FSNT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLNT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FSNS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/FLNS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/SHFLX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/QFLX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TAUX_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TAUY_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECC_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECL_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECSC_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/PRECSL_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TS_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/TREFHT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDTOT_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDHGH_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDMED_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/CLDLOW_1987*.nc': No such file or directory
cp: cannot create regular file 'ts/U_1987*.nc': No such file or directory
cp: target 'RIVER_DISCHARGE_OVER_LAND_LIQ_*.nc' is not a directory
forsyth2 commented 1 day ago

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

chengzhuzhang commented 1 day ago

@tomvothecoder we will need to coordinate on this issue with e3sm_diags refactoring.

The original cdscan codes basically subset and extract the metadata from input file for the specified time period and generate an xml file with file name as T_startTime_endTime.xml which cdms2 is able to read and process.

In the new e3sm_diags code, for time-series, we use xarray.open_dataset, and the input needs to be one single .nc file. I think we may need to update to xarray.open_mfdataset (https://github.com/E3SM-Project/e3sm_diags/blob/7b53932d72ef7782818a5bd1e4e9d527420fec49/e3sm_diags/driver/utils/dataset_xr.py#L1035)which allows multiple files, and also to support subset time based on start and end years (i think the later is already supported by https://github.com/E3SM-Project/e3sm_diags/blob/7b53932d72ef7782818a5bd1e4e9d527420fec49/e3sm_diags/driver/utils/dataset_xr.py#L1185 ?).

chengzhuzhang commented 1 day ago

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

Could you clarify "the previous added sets", i think the issue only relevant for time-series based diagnostics sets. For sets depends on climo datasets, they should be fine.

chengzhuzhang commented 1 day ago

Thanks @chengzhuzhang. I haven't had problems with previous added sets though. I'll try to debug this further to see if it's something with how this PR handles these sets in particular...

Could you clarify "the previous added sets", i think the issue only relevant for time-series based diagnostics sets. For sets depends on climo datasets, they should be fine.

@forsyth2 I saw the test results from Aug 6, yes by then those sets only rely on climo files.

forsyth2 commented 1 day ago

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

chengzhuzhang commented 1 day ago

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

It is not a bug in diags. We can also add the time selecting and slicing code in zppy to be able to remove cdscan. We should have a discussion with @tomvothecoder on how best to approach this.

tomvothecoder commented 1 day ago

@chengzhuzhang Does this mean the bug is for certain in the Diags code or should I dive deeper into the zppy changes in this PR?

It is not a bug in diags.

To summarize this issue, based on what I'm reading:

I will open a PR to update e3sm_diags to use xc.open_mfdataset() for time series files (here).


also to support subset time based on start and end years (i think the later is already supported by E3SM-Project/e3sm_diags@7b53932/e3sm_diags/driver/utils/dataset_xr.py#L1185 ?). We can also add the time selecting and slicing code in zppy to be able to remove cdscan. We should have a discussion with @tomvothecoder on how best to approach this.

The time slicing and subsetting code you listed parses the filepath to get the start and end years. I'm not sure if this currently works with a directory specified as an input path.

tomvothecoder commented 1 day ago

@forsyth2 Can you provide the input paths for the test data? I am trying to reproduce your errors by running prov/e3sm.py but the test_ts is a local reference.

tomvothecoder commented 1 day ago

Note, opening multiple time series files is actually supported already if deriving from other variables.

If we are trying to open up multiple datasets for the same variable (e.g., split up by months or something) without deriving, then we need to add it in #861.

forsyth2 commented 1 day ago

@tomvothecoder

Can you provide the input paths for the test data?

Everything is specified in the cfg I have copied in https://github.com/E3SM-Project/zppy/pull/598#issuecomment-2384025994. Specifically:

input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051"

For the test_ts directory specifically, I believe that would be copied from:

/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_e3sm_diags_cdat_migrated_output/test-diags-no-cdat-20240917/v3.LR.historical_0051/post/atm/180x360_aave/ts/monthly/2yr