E3SM-Project / e3sm_to_cmip

Tools to CMORize E3SM output
https://e3sm-to-cmip.readthedocs.io/en/latest/
MIT License

[Bug]: E2C info mode returns ambiguous results (mode 2). #251

Closed TonyB9000 closed 7 months ago

TonyB9000 commented 8 months ago

What happened?

I wrote a test script that takes every CMIP6 dataset_id we have defined in the dataset_spec, isolates the “table.var” portion (129 unique pairs exist), derives the “E2C-required” freq and realm from the “table” name, and issues a (mode 2) info request for each unique pair.
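The procedure described above can be sketched roughly as follows. This is not the actual test script. The dataset_id layout (nine dot-separated fields, with table and variable in positions 7 and 8), the `TABLE_TO_FREQ_REALM` mapping, the function names, and the example ids are all assumptions made for illustration; only the command-line flags are taken from the commands shown in this issue.

```python
# Hypothetical sketch: derive (table, var) pairs and build info-mode commands.
# Assumes dataset_ids look like
#   CMIP6.<activity>.<institution>.<source>.<experiment>.<variant>.<table>.<var>.<grid>

# Assumed mapping from CMIP6 table name to the (freq, realm) that E2C expects.
# Only the three tables that appear in this issue are listed.
TABLE_TO_FREQ_REALM = {
    "Amon": ("mon", "atm"),
    "day": ("day", "atm"),
    "3hr": ("3hr", "atm"),
}


def unique_table_var_pairs(dataset_ids):
    """Return the sorted list of unique (table, var) pairs in the dataset_ids."""
    pairs = set()
    for dsid in dataset_ids:
        fields = dsid.split(".")
        pairs.add((fields[6], fields[7]))
    return sorted(pairs)


def info_command(table, var, table_path="$TABLE_PATH"):
    """Build the mode-2 info command for one (table, var) pair."""
    freq, realm = TABLE_TO_FREQ_REALM[table]
    return (
        f"e3sm_to_cmip --info -v {var} --freq {freq} --realm {realm} "
        f"-t {table_path} --map no_map --info-out yaml/{table}_{var}.yaml"
    )


if __name__ == "__main__":
    example_ids = [  # illustrative ids, not taken from the real dataset_spec
        "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.pr.gr",
        "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r2i1p1f1.day.pr.gr",
        "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.Amon.rlut.gr",
    ]
    for table, var in unique_table_var_pairs(example_ids):
        print(info_command(table, var))
```

Collapsing to unique pairs first is what reduces the full list of dataset_ids to the 129 info requests mentioned above.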

In 93% of cases (120/129), a single unambiguous specification is returned. The 9 ambiguous cases, listed below, must be corrected in E2C, which will be my next task:

(NOTE: in each case, TABLE_PATH = /p/user_pub/e3sm/staging/resource/cmor/cmip6-cmor-tables/Tables)

TASK = 3hr_pr.yaml : COMMAND = e3sm_to_cmip --info -v pr --freq 3hr --realm atm -t $TABLE_PATH --map no_map --info-out yaml/3hr_pr.yaml
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL

TASK = Amon_pr.yaml : COMMAND = e3sm_to_cmip --info -v pr --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_pr.yaml
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL

TASK = Amon_rlut.yaml : COMMAND = e3sm_to_cmip --info -v rlut --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rlut.yaml
- CMIP6 Name: rlut
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: W m-2
  E3SM Variables: FLUT
- CMIP6 Name: rlut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSNTOA, FSNT, FLNT

TASK = Amon_rsutcs.yaml : COMMAND = e3sm_to_cmip --info -v rsutcs --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsutcs.yaml
- CMIP6 Name: rsutcs
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: SOLIN, FSNTOAC    [What’s going on here?]
- CMIP6 Name: rsutcs
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSUTOAC       [What’s going on here?]

TASK = Amon_rsut.yaml : COMMAND = e3sm_to_cmip --info -v rsut --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsut.yaml
- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: SOLIN, FSNTOA [What’s going on here?]
- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSUTOA        [What’s going on here?]

TASK = day_pr.yaml : COMMAND = e3sm_to_cmip --info -v pr --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_pr.yaml
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL

TASK = day_rlut.yaml : COMMAND = e3sm_to_cmip --info -v rlut --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_rlut.yaml
- CMIP6 Name: rlut
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: W m-2
  E3SM Variables: FLUT
- CMIP6 Name: rlut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSNTOA, FSNT, FLNT

TASK = day_tasmax.yaml : COMMAND = e3sm_to_cmip --info -v tasmax --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_tasmax.yaml
- CMIP6 Name: tasmax
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: K
  E3SM Variables: TREFHTMX
- CMIP6 Name: tasmax
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: K
  E3SM Variables: TREFMXAV

TASK = day_tasmin.yaml : COMMAND = e3sm_to_cmip --info -v tasmin --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_tasmin.yaml
- CMIP6 Name: tasmin
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: K
  E3SM Variables: TREFHTMN
- CMIP6 Name: tasmin
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: K
  E3SM Variables: TREFMNAV

The first order of business will be to fix e2c so that the ONLY structure returned corresponds to the frequency requested (ONE, not TWO).

Also, according to https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/925304036/Amon+variable+conversion+table,

Forcing the correct outputs in these 9 cases should address issue #

What did you expect to happen? Are there any possible answers you came across?

A single YAML item should be output for the requested frequency. For instance, with COMMAND = e3sm_to_cmip --info -v pr --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_pr.yaml

we expect:

- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT

not

- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL
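One way to flag ambiguous results automatically is to count the top-level items in each --info-out file. The sketch below does this with plain string handling rather than a YAML parser. The function name, and the assumption that every entry begins with a line starting `- CMIP6 Name:`, are mine and are based only on the listings in this issue.

```python
def count_info_entries(yaml_text):
    """Count handler entries in info-mode output.

    Assumes each entry starts with a line of the form '- CMIP6 Name: <var>',
    as in the listings shown in this issue.
    """
    return sum(
        1 for line in yaml_text.splitlines() if line.startswith("- CMIP6 Name:")
    )


# The expected (single-entry) output for "day pr", copied from this issue.
expected = """\
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
"""

# The observed output additionally contains the Amon entry.
observed = expected + """\
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL
"""

for label, text in (("expected", expected), ("observed", observed)):
    n = count_info_entries(text)
    status = "unambiguous" if n == 1 else "ambiguous"
    print(f"{label}: {n} entry(ies) -> {status}")
```

A count other than 1 marks the request as ambiguous (more than one handler) or unsupported (none).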

Minimal Complete Verifiable Example (MCVE)

These 9 commands exhibit the ambiguity:

 e3sm_to_cmip --info -v pr --freq 3hr --realm atm -t $TABLE_PATH --map no_map --info-out yaml/3hr_pr.yaml
 e3sm_to_cmip --info -v pr --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_pr.yaml
 e3sm_to_cmip --info -v rlut --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rlut.yaml
 e3sm_to_cmip --info -v rsutcs --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsutcs.yaml
 e3sm_to_cmip --info -v rsut --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsut.yaml
 e3sm_to_cmip --info -v pr --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_pr.yaml
 e3sm_to_cmip --info -v rlut --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_rlut.yaml
 e3sm_to_cmip --info -v tasmax --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_tasmax.yaml
 e3sm_to_cmip --info -v tasmin --freq day --realm atm -t $TABLE_PATH --map no_map --info-out yaml/day_tasmin.yaml

Relevant log output

(immaterial)

Anything else we need to know?

No response

Environment

    active environment : e2c_as_cloned_inst_dsm
   active env location : /home/bartoletti1/mambaforge/envs/e2c_as_cloned_inst_dsm
           shell level : 2
      user config file : /home/bartoletti1/.condarc
populated config files : /home/bartoletti1/mambaforge/.condarc
         conda version : 24.1.2
   conda-build version : not installed
        python version : 3.10.6.final.0
                solver : libmamba (default)
      virtual packages : __archspec=1=broadwell
                         __conda=24.1.2=0
                         __glibc=2.17=0
                         __linux=3.10.0=0
                         __unix=0=0
      base environment : /home/bartoletti1/mambaforge (writable)
     conda av data dir : /home/bartoletti1/mambaforge/etc/conda
 conda av metadata url : None
          channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                         https://conda.anaconda.org/conda-forge/noarch
         package cache : /home/bartoletti1/mambaforge/pkgs
                         /home/bartoletti1/.conda/pkgs
      envs directories : /home/bartoletti1/mambaforge/envs
                         /home/bartoletti1/.conda/envs
              platform : linux-64
            user-agent : conda/24.1.2 requests/2.31.0 CPython/3.10.6 Linux/3.10.0-1160.108.1.el7.x86_64 rhel/7.9 glibc/2.17 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.7
               UID:GID : 61843:4061
            netrc file : None
          offline mode : False

TonyB9000 commented 8 months ago

I was able to fix most of the "mon versus day" issues by adding these lines to _run_info_mode() in main.py:

                    # Skip handlers whose CMIP6 table does not match the requested frequency.
                    if self.freq == "mon" and handler["table"] == "CMIP6_day.json":
                        continue
                    if self.freq in ("day", "3hr") and handler["table"] == "CMIP6_Amon.json":
                        continue

However, this fails to fix the issue with "rsut" and "rsutcs", as both of their handlers are "Amon" table entries:

TASK = Amon_rsutcs.yaml : COMMAND = e3sm_to_cmip --info -v rsutcs --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsutcs.yaml
- CMIP6 Name: rsutcs
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: SOLIN, FSNTOAC    [What’s going on here?]
- CMIP6 Name: rsutcs
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSUTOAC       [What’s going on here?]

TASK = Amon_rsut.yaml : COMMAND = e3sm_to_cmip --info -v rsut --freq mon --realm atm -t $TABLE_PATH --map no_map --info-out yaml/Amon_rsut.yaml
- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: SOLIN, FSNTOA [What’s going on here?]
- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSUTOA        [What’s going on here?]

In addition, two new issues were created. The ONLY handler for (day) huss is/was the "Amon" handler, which is now eliminated because "Amon" does not match "day". The same issue applies to (day) tas. It appears that this "overload" was tolerated because both the "day" and "mon" versions of huss and tas use the same formula. But without formal indirection, it is bad form to rely upon "Hey, just one handler seems to match - it must be the right one." Instead, we must create TWO handlers with distinct names, even if their content is identical.
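To make the two side effects concrete, here is a small stand-alone sketch of the table filter applied to a handful of handler entries. It is not the e3sm_to_cmip code: the `select_handlers` function and the `HANDLERS` list are hypothetical, and the handler dictionaries carry only the fields needed for the illustration, with values copied from the listings in this issue.

```python
def select_handlers(handlers, var, freq):
    """Return the handlers for `var` that survive the mon-versus-day table filter."""
    selected = []
    for handler in handlers:
        if handler["name"] != var:
            continue
        # Same rule as the lines added to _run_info_mode().
        if freq == "mon" and handler["table"] == "CMIP6_day.json":
            continue
        if freq in ("day", "3hr") and handler["table"] == "CMIP6_Amon.json":
            continue
        selected.append(handler)
    return selected


# Reduced, illustrative handler entries (not the full handlers.yaml content).
HANDLERS = [
    {"name": "pr", "table": "CMIP6_day.json", "raw_variables": ["PRECT"]},
    {"name": "pr", "table": "CMIP6_Amon.json", "raw_variables": ["PRECC", "PRECL"]},
    {"name": "rsut", "table": "CMIP6_Amon.json", "raw_variables": ["SOLIN", "FSNTOA"]},
    {"name": "rsut", "table": "CMIP6_Amon.json", "raw_variables": ["FSUTOA"]},
    {"name": "huss", "table": "CMIP6_Amon.json", "raw_variables": ["QREFHT"]},
]

for var, freq in [("pr", "day"), ("rsut", "mon"), ("huss", "day")]:
    matches = select_handlers(HANDLERS, var, freq)
    print(f"{var} ({freq}): {len(matches)} handler(s)")
```

With these entries, pr (day) resolves to a single handler, rsut (mon) still matches two because both are Amon entries, and huss (day) matches none because its only handler is tied to the Amon table.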

TonyB9000 commented 8 months ago

@tomvothecoder @chengzhuzhang

I have edited the (e2c)/cmor_handlers/handlers.yaml file to fix the remaining issues with handler ambiguity.

I eliminated the entries:

- name: rsut
  units: W m-2
  raw_variables: [SOLIN, FSNTOA]
  table: CMIP6_Amon.json
  unit_conversion: null
  formula: SOLIN - FSNTOA
  positive: up
  levels: null

and

- name: rsutcs
  units: W m-2
  raw_variables: [SOLIN, FSNTOAC]
  table: CMIP6_Amon.json
  unit_conversion: null
  formula: SOLIN - FSNTOAC
  positive: up
  levels: null

I removed them because they have no support in https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/925304036/Amon+variable+conversion+table

(I may need to re-run the CMIP6 generation for these two variables, unless I can ascertain which formulas were applied in their production.)

Also, I have added two handler definitions:

- name: huss
  units: "1"
  raw_variables: [QREFHT]
  table: CMIP6_day.json
  unit_conversion: null
  formula: null
  positive: null
  levels: null

and

- name: tas
  units: K
  raw_variables: [TREFHT]
  table: CMIP6_day.json
  unit_conversion: null
  formula: null
  positive: null
  levels: null

These are identical to the "Amon" versions, except they reference table CMIP6_day.json, in support of the day-variable generation.

Testing of info mode (2) is complete. There is now one, and only one, handler returned for each of the 129 (table.varname) combinations.

I expect to proceed with generation of the v2_1 "day" variables.