E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License

EAMxx variables #880

Open chengzhuzhang opened 3 weeks ago

chengzhuzhang commented 3 weeks ago

Description

Reference: The mapping of new variables is based on https://acme-climate.atlassian.net/wiki/spaces/EAMXX/pages/4535976058/Output+Standard+Names, put together by @AaronDonahue. The decadal output is outlined by @brhillman: https://github.com/E3SM-Project/eamxx-scripts/pull/180/files#diff-1646ba1e37781387625d2ce585aad9ef7f5b6407616300838c7aecd44c67df7e

chengzhuzhang commented 2 weeks ago

@tomvothecoder I completed the derivation of 2D and 3D variables, except for COSP-related output. The only thing left is that we need to support the lowercase landfrac/ocnfrac variables that come from EAMxx. The relevant code block is:

https://github.com/E3SM-Project/e3sm_diags/blob/43d2df772bfe08d9e17e218e2c9574b7e276b782/e3sm_diags/driver/__init__.py#L7-L12

I'm not sure how to provide a clean way to accommodate lowercase variable names.

tomvothecoder commented 2 weeks ago

The only thing left is that we need to support the lowercase landfrac/ocnfrac variables that come from EAMxx.

I just pushed https://github.com/E3SM-Project/e3sm_diags/pull/880/commits/46a5dad5552255024f83cc86b3623a133a4ef801 to add support for more land/ocean var keys. Let me know your thoughts.
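
For readers following along, here is a minimal sketch of the general idea of accepting several spellings of the land and ocean fraction variables. The key tuples and the helper name `find_var_key` are hypothetical, and a plain dict stands in for a dataset; the actual change is the one in the commit linked above and may look different.

```python
from typing import Mapping, Sequence

# Hypothetical candidate keys: uppercase and lowercase spellings of the
# land and ocean fraction variables.
LAND_FRAC_KEYS = ("LANDFRAC", "landfrac")
OCEAN_FRAC_KEYS = ("OCNFRAC", "ocnfrac")


def find_var_key(dataset: Mapping[str, object], candidates: Sequence[str]) -> str:
    """Return the first candidate key that is present in the dataset."""
    for key in candidates:
        if key in dataset:
            return key
    raise KeyError(f"None of {list(candidates)} found in dataset.")


# A dict standing in for a dataset that uses the lowercase names.
eamxx_like = {"landfrac": [0.0, 1.0], "ocnfrac": [1.0, 0.0]}
land_key = find_var_key(eamxx_like, LAND_FRAC_KEYS)
ocean_key = find_var_key(eamxx_like, OCEAN_FRAC_KEYS)
```

Keeping the candidates in ordered tuples means a new spelling can be added in one place without touching the lookup logic.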

tomvothecoder commented 2 weeks ago

Pushed ef261b8 (#880) to make land sea mask methods a bit cleaner. Should be good to go.

tomvothecoder commented 1 week ago

I'm getting a separate error about circular imports. Do you see this too?

Not sure how this was introduced, but it should be addressed:

2024-11-08 12:27:54,820 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.lat_lon_driver
Traceback (most recent call last):
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 340, in _run_diag
    single_result = module.run_diag(self)
AttributeError: partially initialized module 'e3sm_diags.driver.lat_lon_driver' has no attribute 'run_diag' (most likely due to a circular import)
2024-11-08 12:27:54,821 [ERROR]: run.py(run_diags:91) >> Error traceback:
Traceback (most recent call last):
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/run.py", line 89, in run_diags
    params_results = main(params)
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 373, in main
    parameters_results = _run_serially(parameters)
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 271, in _run_serially
    nested_results.append(parameter._run_diag())
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 333, in _run_diag
    module = importlib.import_module(mod_str)
  File "/global/u2/v/vo13/mambaforge/envs/e3sm_diags_dev_892/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/driver/zonal_mean_xy_driver.py", line 17, in <module>
    from e3sm_diags.metrics.metrics import spatial_avg
ImportError: cannot import name 'spatial_avg' from partially initialized module 'e3sm_diags.metrics.metrics' (most likely due to a circular import) (/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/metrics/metrics.py)
2024-11-08 12:27:54,824 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/prov/e3sm_diags_run.log
2024-11-08 12:27:55,985 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
Value(False)
2024-11-08 12:32:27,811 [INFO]: lat_lon_driver.py(_run_diags_3d:396) >> Selected pressure level(s): [850.0]
2024-11-08 12:32:29,678 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-11-08 12:32:39,801 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-11-08 12:32:54,463 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-11-08 12:32:54,463 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.png

No longer appearing after running make install again. Good to go here.
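
For context, the "partially initialized module" message in the traceback is what Python raises when two modules import names from each other at import time. Here the error went away after reinstalling, so the sketch below is not a diagnosis of this case; it only reproduces the general mechanism with two throwaway modules. The module names `demo_driver` and `demo_metrics` are hypothetical and are written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_circular_import() -> str:
    """Create two modules that import from each other and return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        # demo_metrics needs a name from demo_driver at import time...
        Path(tmp, "demo_metrics.py").write_text(
            "from demo_driver import run_diag\n"
            "def spatial_avg(values):\n"
            "    return sum(values) / len(values)\n"
        )
        # ...and demo_driver needs a name from demo_metrics at import time.
        Path(tmp, "demo_driver.py").write_text(
            "from demo_metrics import spatial_avg\n"
            "def run_diag():\n"
            "    return spatial_avg([1.0, 2.0])\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("demo_driver")
        except ImportError as err:
            return str(err)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for name in ("demo_driver", "demo_metrics"):
                sys.modules.pop(name, None)
    return "no error"


message = demo_circular_import()
print(message)
```

When `demo_driver` starts loading, it triggers `demo_metrics`, which asks for `run_diag` before `demo_driver` has defined it, and the import fails.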

chengzhuzhang commented 1 week ago

Commit https://github.com/E3SM-Project/e3sm_diags/issues/892 worked well! It took about 1 minute to finish the 3D variable U run, which is comparable to what cdat does. Thank you for the quick fix, @tomvothecoder!

chengzhuzhang commented 1 week ago

I found another issue: derived variables are not working for time-series files. Example .cfg:

[#]
sets = ["lat_lon"]
case_id = "ERA5"
variables = ["QREFHT"]
ref_name = "ERA5_ext"
reference_name = "ERA5 Reanalysis"
seasons = ["ANN", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "DJF", "MAM", "JJA", "SON"]
contour_levels = [0.2, 0.5, 1, 2.5, 5, 7.5, 10, 12.5, 15, 17.5]
diff_levels = [-5, -4, -3, -2, -1, -0.25, 0.25, 1, 2, 3, 4, 5]

The input d2m and sp files (the ERA5 variables used to derive QREFHT) are available in the time-series/ERA5_ext directory, but the program is looking for: /global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series/ERA5_ext/QREFHT_.{13}.nc

tomvothecoder commented 1 week ago

You may need to step through the loop that attempts to derive QREFHT until it reaches d2m and sp. I walked through the code, and it tries to match on the filepath patterns for these two variables, which look correct.
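
The lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the e3sm_diags implementation: the mapping `DERIVED_SOURCES`, the helper names, and the date range in the example filenames are hypothetical. Only the `<var>_.{13}.nc` pattern comes from the message above; the 13 characters would fit a `YYYYMM_YYYYMM` suffix.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical mapping: target variable -> candidate tuples of source variables,
# tried in order. The target itself is tried first.
DERIVED_SOURCES: Dict[str, List[Tuple[str, ...]]] = {
    "QREFHT": [("QREFHT",), ("d2m", "sp")],
}


def match_ts_file(var: str, filenames: Sequence[str]) -> Optional[str]:
    """Return the first filename of the form `<var>_<13 chars>.nc`, or None."""
    pattern = re.compile(rf"{re.escape(var)}_.{{13}}\.nc")
    for name in filenames:
        if pattern.fullmatch(name):
            return name
    return None


def resolve_sources(target: str, filenames: Sequence[str]) -> Dict[str, str]:
    """Return {source_var: filename} for the first tuple with all files present."""
    for sources in DERIVED_SOURCES.get(target, [(target,)]):
        found: Dict[str, str] = {}
        for src in sources:
            match = match_ts_file(src, filenames)
            if match is None:
                break  # This tuple is incomplete; try the next one.
            found[src] = match
        else:
            return found
    raise FileNotFoundError(f"No time-series files found to derive {target}.")


# Hypothetical directory listing with no QREFHT file, only its source variables.
listing = ["d2m_197901_201912.nc", "sp_197901_201912.nc", "t2m_197901_201912.nc"]
resolved = resolve_sources("QREFHT", listing)
```

In this sketch, an empty or missing directory falls through every candidate tuple and ends in an error, which is the kind of failure a data-placement problem would produce.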

Possible issue

The ERA5_ext sub-directory doesn't look like it exists under the root directory (/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series).

(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ pwd
/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series
(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ ls ERA5_ext
ls: cannot access 'ERA5_ext': No such file or directory
(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ 
chengzhuzhang commented 1 week ago

@tomvothecoder thank you for looking into this! I'm actually stepping through the code, and the logic does look correct. Yes, the files were misplaced in ERA5, but they should be in a separate directory, ERA5_ext. Let me fix the data and try again.

chengzhuzhang commented 1 week ago

I can confirm that this is a data problem. The problem described in https://github.com/E3SM-Project/e3sm_diags/pull/880#issuecomment-2465876942 is resolved with the time-series files placed in the correct directory, ERA5_ext. The data fix is in place on LCRC and Perlmutter.

tomvothecoder commented 1 week ago

I think this branch needs to be rebased on the latest cdat-migration-fy24. Let me do this and then we can merge whenever you're ready.

chengzhuzhang commented 1 week ago

I think this branch needs to be rebased on the latest cdat-migration-fy24. Let me do this and then we can merge whenever you're ready.

Thank you for the review and rebasing @tomvothecoder. I will tag EAMxx developers for a review before merging.

tomvothecoder commented 1 week ago

Just rebased, should be good to go for further review.

chengzhuzhang commented 1 week ago

Hi @PeterCaldwell @brhillman @crterai @AaronDonahue:

This PR adds support for all EAMxx output variables (except those related to COSP). It took a little longer because the update is based on the brand-new e3sm_diags, which was just migrated from cdat to xarray/xcdat (kudos to @tomvothecoder).

Here is an example e3sm_diags run based on the 1996ish EAMxx decadal run that Ben provided.

The workflow for this run is to first use NCO (ncclimo, then ncremap) to generate the regridded climatology files, and then run the e3sm_diags run script.

An example NCO script is below. Thanks to @czender, the two NCO steps listed below can be reduced to a single ncclimo command with the latest NCO release. This improvement will be available in the next e3sm-unified release, scheduled for February.

#!/bin/bash

source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh

drc_in=/global/cfs/cdirs/e3sm/chengzhu/eamxx/run
drc_out=/global/cfs/cdirs/e3sm/chengzhu/eamxx/post/data
caseid=output.scream.decadal.monthlyAVG_ne30pg2.AVERAGE.nmonths_x1

# spoofed climatology files with data from 1995-09 to 1996-08

# create climatology files
cd ${drc_in};ls ${caseid}*1996-0[1-8]*.nc ${caseid}*1995-09*.nc ${caseid}*1995-1[0-2]*.nc | ncclimo -P eamxx --fml_nm=eamxx_decadal --yr_srt=1996 --yr_end=1996 --drc_out=$drc_out

map=/global/cfs/projectdirs/e3sm/zender/maps/map_ne30pg2_to_cmip6_180x360_traave.20231201.nc
# remap the climatology files to a regular lat-lon grid
cd $drc_out;ls *.nc | ncremap -P eamxx --prm_opt=time,lwband,swband,ilev,lev,plev,cosp_tau,cosp_cth,cosp_prs,dim2,ncol --map=${map} --drc_out=${drc_out}/rgr

exit

It would be great if you could review the code changes to verify that the variable derivations are correct. Next, we will work on better support for arbitrary-length runs, COSP histograms, and other variability sets (given longer simulations). Any feedback on capabilities and priorities is welcome!

Thanks, Jill