E3SM-Project / e3sm_to_cmip

Tools to CMORize E3SM output
https://e3sm-to-cmip.readthedocs.io/en/latest/
MIT License
7 stars 7 forks source link

`--info` option fails on some inputs with `TypeError: expected str, bytes or os.PathLike object, not NoneType` #181

Closed TonyB9000 closed 8 months ago

TonyB9000 commented 1 year ago

Case 1:

When I attempt e3sm_to_cmip –info -v rlut and e3sm_to_cmip –info -v rlut -f mon

I obtain only the information for Amon.rlut:

2023-02-01 17:38:45,854_854:INFO:main:input_path = None
2023-02-01 17:38:45,854_854:INFO:main:output_path = None
2023-02-01 17:38:45,854_854:INFO:main:precheck_path = None
[*] Loaded `rlut` handler for variable `rlut`.
[{'CMIP6 Name': 'rlut',
  'CMIP6 Table': 'CMIP6_Amon.json',
  'CMIP6 Units': 'W m-2',
  'E3SM Variables': 'FSNTOA, FSNT, FLNT'}]

Case 2: Try to obtain day: e3sm_to_cmip –info -v rlut -f day

2023-02-01 17:39:35,680_680:INFO:main:input_path = None
2023-02-01 17:39:35,681_681:INFO:main:output_path = None
2023-02-01 17:39:35,681_681:INFO:main:precheck_path = None
Traceback (most recent call last):
 File "/home/bartoletti1/mambaforge/envs/dsm_dev/bin/e3sm_to_cmip", line 10, in <module>
    sys.exit(main())
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/site-packages/e3sm_to_cmip/__main__.py", line 103, in main
    handlers = _load_handlers(
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/site-packages/e3sm_to_cmip/util.py", line 441, in _load_handlers
    handler["table"] = _get_table_for_non_monthly_freq(
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/site-packages/e3sm_to_cmip/util.py", line 648, in _get_table_for_non_monthly_freq
    table_path = Path(tables_path, table_for_freq)
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/pathlib.py", line 1082, in __new__
    self = cls._from_parts(args, init=False)
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/pathlib.py", line 707, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/bartoletti1/mambaforge/envs/dsm_dev/lib/python3.9/pathlib.py", line 691, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
TonyB9000 commented 1 year ago

I assume my failure to include the "--table-path" above is what causes this error. Still it should be caught.

TonyB9000 commented 1 year ago

At @tomvothecoder 's behest, I include the following background information:

        e3sm_to_cmip –help (regarding –info) returns the following (reformatted for clarity):

Print information about the variables passed in the --var-list argument and exit without doing any processing.

There are three modes for getting the info.

  1. If you just pass the --info flag with the --var-list then it will print out the information for the requested variable.
  2. If the --freq is passed along with the --tables-path, then the CMIP6 tables will get checked to see if the requested variables are present in the CMIP6 table matching the freq. - It was “bullet 2” that was failing – as I wanted to see about generating the “CMIP6 to Native” variable relationships, independent of the given datafiles.
  3. If the --freq is passed with the --tables-path, and the --input-path, and the input-path points to raw unprocessed E3SM files, then an additional check will be made for if the required raw variables are present in the E3SM output.

Also, FWIW, the output e3sm_to_cmip –info -v rlut

2023-02-01 17:38:45,854_854:INFO:main:input_path = None
2023-02-01 17:38:45,854_854:INFO:main:output_path = None
2023-02-01 17:38:45,854_854:INFO:main:precheck_path = None
[*] Loaded `rlut` handler for variable `rlut`.
[{'CMIP6 Name': 'rlut',
  'CMIP6 Table': 'CMIP6_Amon.json',
  'CMIP6 Units': 'W m-2',
  'E3SM Variables': 'FSNTOA, FSNT, FLNT'}]

is a bit noisy. What would be very nice is something like:

Amon.rlut: FSNTOA, FSNT, FLNT
day.rlut: FLUT

Suggestion: Perhaps a flag could be added that indicates the form the informational output should take.


ISSUE: Although the command works when supplied all 4 elements (option 3: -v var, -f freq, -t table-path, -i input-path), the result is inconsistent with respect to what is produced to stdout, and what is written to the "info-out" output file.

Cases

tomvothecoder commented 1 year ago

Todo from meeting 3/6/23: @TonyB9000 includes notes about MPAS handler validation with variables in datasets, unnecessary resource consumption because e2c is not failing early, etc.

tomvothecoder commented 1 year ago

Hey @TonyB9000, is this issue the same one you have been mentioning in our meetings?

If so, does this only affect the "master" branch (and v1.10.rc1 or is this issue stemming from v1.9.1?

TonyB9000 commented 1 year ago

Hi @tomvothecoder , I'm probably sloppy on this. I recently re-cloned the e3sm_to_cmip repo, hoping thereby to ensure I am working from the latest material. I will have to create an e2c_test environment, pip install, and run some tests with various levels of "--info" on various datasets (atmos/land/river)x(v1, v2), etc. I'm going to start today, and provide better-documented results.

TonyB9000 commented 1 year ago

@tomvothecoder @xylar I am moving commentary on "--info" mode issues to this thread, as the conda environment thread is not appropriate (and appears largely resolved - or resolvable).

TonyB9000 commented 1 year ago

@xylar @tomvothecoder I've fixed the issue with --info mode 3 not writing to the user-supplied "--info-out" directory. The revamped "_run_info_mode" (replaced print_var_info) used:

        if **self.output_path** is not None:
            with open(self.output_path, "w") as outstream:
                yaml.dump(messages, outstream)
        else:
            pprint(messages)

it will now use

        if **self.info_out_path** is not None:
            with open(self.info_out_path, "w") as outstream:
                yaml.dump(messages, outstream)
        elif **self.output_path** is not None:
            with open(self.output_path, "w") as outstream:
                yaml.dump(messages, outstream)
        else:
            pprint(messages)
TonyB9000 commented 1 year ago

@xylar @tomvothecoder (ADVISE): The variable handler loading needs revamp. They appear to be loaded differently (sometimes by variable, sometimes by tables) and when a handler is not located for a variable, a "KeyError" is raised and a stack-trace dumped. I think this is overkill. This is not a code flaw. A simpler "error + exit" should suffice. In particular, the same "fault" is generated whether one is "cmorizing", or merely enquiring about variables and handlers (--info mode).

With "--info", the enumeration of handers is "by variable" (via "load_all_handlers"), and (for MPAS variables) does not appear to call the "_get_mpas_handlers" function (which calls handlers = _get_handlers_from_modules(MPAS_HANDLER_DIR_PATH) successfully).

TonyB9000 commented 1 year ago

With --info (mode 3) I have replaced the former behavior:

If the data SUPPORTS the variable, then
        info-out obtains a (list of) "json" structures with variable info
        stdout obtains a "yaml" output similar to the json content
        stderr is silent
If the data DOES NOT SUPPORT the variable, then
        stdout obtains an empty pair of braces ( [] )
        stderr explains "variable not found in input data" (should read "supporting variables not found in input data"

with

The output_dir (info_out, or default output_dir) ALWAYS obtains the variable handler info (yaml output), and
If the data SUPPORTS the variable, then
        stdout obtains "Variable=<varname>:DataSupport=TRUE"
If the data DOES NOT SUPPORT the variable, then
        stdout obtains "Variable=<varname>:DataSupport=FALSE"

However, despite having supplied "--freq" (just ONE of mon, day, or 3hr) for a given variable (e.g. "pr"), stdout receives:

Variable=pr:DataSupport=TRUE
Variable=pr:DataSupport=TRUE
Variable=pr:DataSupport=FALSE

and the yaml output (only for the TRUE cases) looks like:

- CMIP6 Name: pr
  CMIP6 Table: CMIP6_day.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECT
- CMIP6 Name: pr
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: kg m-2 s-1
  E3SM Variables: PRECC, PRECL

(This yaml is poorly structured. The top-level (CMIP6 Name) is repeated, but one must dig down to "Table" to know which instane one is examininhg and the supporting variables. Better to have "TabVar" be the key (Amon.pr, day.pr, 3hr.pr) Hassle to process otherwise.)

TonyB9000 commented 1 year ago

@tomvothecoder @xylar Confirmation of Behavior. With --info mode 2, any supplied MPAS variable (Omon.zhalfo. SImon.simass) will generate a fault (stderr) of the form:

2023-07-20 22:16:25,753_753:INFO:__init__:--------------------------------------
2023-07-20 22:16:25,753_753:INFO:__init__:| E3SM to CMIP Configuration
2023-07-20 22:16:25,753_753:INFO:__init__:--------------------------------------
2023-07-20 22:16:25,753_753:INFO:__init__:    * var_list='['zhalfo']'
2023-07-20 22:16:25,753_753:INFO:__init__:    * input_path='None'
2023-07-20 22:16:25,754_754:INFO:__init__:    * output_path='None'
2023-07-20 22:16:25,754_754:INFO:__init__:    * precheck_path='None'
2023-07-20 22:16:25,754_754:INFO:__init__:    * freq='mon'
2023-07-20 22:16:25,754_754:INFO:__init__:    * realm='atm'
2023-07-20 22:16:25,754_754:INFO:__init__:    * Writing log output file to: logs/20230720_221625_749556
Traceback (most recent call last):
  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/bin/e3sm_to_cmip", line 8, in <module>
    sys.exit(main())
  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 793, in main
    app = E3SMtoCMIP(args)
  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 152, in __init__
    self.handlers = self._get_handlers()
  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 214, in _get_handlers
    handlers = load_all_handlers(self.realm, self.var_list)
  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/cmor_handlers/utils.py", line 77, in load_all_handlers
    raise KeyError(
KeyError: "No handlers are defined for the variables: ['zhalfo']. Make sure at least one variable handler is defined for each of these variables in `/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/cmor_handlers/handlers.yaml`."

Note that the Configuration section reports * realm='atm' and there is nothing in the help for --info to indicate that realm-atm is assumed.

I thought I could get around it by supplying "--realm mpaso" on the command line. But this triggers another fault as (once again) processing occurs as if "--info" is not present: The fault:

  File "/home/bartoletti1/mambaforge/envs/e2c_loose_e2c1100rc1/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 557, in _validate_parsed_args
    raise ValueError("MPAS ocean handling requires a map file")

But - Lo and Behold - if you supply a mapfile (I used a valid one for the case - will see if a dummy works) I get the proper (mode 2) output for var=zhalfo.

- CMIP6 Name: zhalfo
  CMIP6 Table: CMIP6_Omon.json
  CMIP6 Units: m
  E3SM Variables: MPASO, MPAS_mesh, MPAS_map

At minimum, we must update the --help text to make --realm and (for MPAS) --map required.

(Edit: works if you supply --map no_map) (sigh)

I still must test this behavior with mode 3.

tomvothecoder commented 1 year ago

Thank you for the work @TonyB9000. Can you open up a pull request related to this GitHub issue? Also I suggest summarizing issues and changes in the pull request first, then tagging reviewers (e.g., me) once your pull request is ready for review. This really helps streamline collaboration because everyone's bandwidth is tight. Thanks!

Also you don't need to tag Xylar on development activities here. He's mainly just interested in whether we are releasing v1.10.0rc2 or not. @xylar we won't be releasing e3sm_to_cmip v1.10.0rc2. This issue will probably be addressed in the next release (e.g,. v1.11.0).

xylar commented 1 year ago

Thanks @tomvothecoder. That's helpful to know. I won't be expecting anymore updates before the v1.10.0 release, then.

@TonyB9000, I'd prefer not to be tagged on development activities. If you have specific questions I can answer, feel free to @ me but I don't really have the expertise in e2c or the time to provide more general advice or to become more engaged with its development.

TonyB9000 commented 1 year ago

@xylar Understood. Some of the changes I am making affect the "--info" functionality, and I don't know to what degree you rely upon that, but we can always back out detrimental changes.

xylar commented 1 year ago

@TonyB9000, I don't use e2c at all myself. I just helped develop the handlers for MPAS components.

TonyB9000 commented 1 year ago

@xylar OK, then you are the one responsible! :)

The atmos/land handlers list the actual E3SM variables supporting a given CMIP var (e.g.):

::::::::::::::
Info_Out-e2c-1.10.0rc1-Mode-1/result-Amon.hfls
::::::::::::::
- CMIP6 Name: hfls
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: LHFLX
::::::::::::::
Info_Out-e2c-1.10.0rc1-Mode-1/result-Amon.rtmt
::::::::::::::
- CMIP6 Name: rtmt
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSNT, FLNT

Whereas the MPAS handlers are not so informative:

Info_Out-e2c-1.10.0rc1-Mode-1/result-Omon.masso
::::::::::::::
- CMIP6 Name: masso
  CMIP6 Table: CMIP6_Omon.json
  CMIP6 Units: kg
  E3SM Variables: MPASO, MPASO_namelist, MPAS_mesh
::::::::::::::
Info_Out-e2c-1.10.0rc1-Mode-1/result-Omon.tauvo
::::::::::::::
- CMIP6 Name: tauvo
  CMIP6 Table: CMIP6_Omon.json
  CMIP6 Units: N m-2
  E3SM Variables: MPASO, MPAS_map

(and why would masso include "MPASO_namelist", and tauvo not?)

In any case, the reason that --info mode 3 (does the native data support this CMIP variable) fails for MPAS data is that the xarray dataset:

with xr.open_dataset(file_path) as ds:

fails to find variables named "MPASO" and "MPAS_mesh" in ds.data_vars.

Is there a way to remedy this? (Is there a need? Do some MPAS runs include variables that other might not?)

I think you are the expert here.

chengzhuzhang commented 1 year ago

Hey @TonyB9000, I'm reading this issue discussion and I would suggest to first address the issue as described for non-MPAS handlers. As you already observed, MPAS handlers are different animals. There is not a unified .yaml file for MPAS variables. We can plan to refactor to follow what has been done with other components as resources available. But at this stage the --info for MPAS variables will behave differently.

TonyB9000 commented 1 year ago

@chengzhuzhang Yes, I understand. As it stands, there is very little value in "--info" for MPAS variables. I'll continue to output the handers, but mode 3 reports "variable (support) not found in native data" for every MPAS variable, which is simply incorrect. These ideosyncracies must be itemized in the help text (and will be).

In particular, --info (mode 3) does not work for MPAS, and to get modes 1,2 to work, it seems you must include "--realm mpaso" (or mpassi), and include "--map anything_at_all". I'll try to word it more professionally. In truth, the check for "--map" should be entirely elided for "--info" mode, not sure I want to deal with that on this pass.

TonyB9000 commented 1 year ago

@tomvothecoder You wrote "a pull request related to this GitHub issue". When I do a cli "git push", git tells me:

remote: Create a pull request for 'fix_e3c_info_mode' on GitHub by visiting:
remote:      https://github.com/E3SM-Project/e3sm_to_cmip/pull/new/fix_e3c_info_mode

I'm unclear how to navigate "#issue" stuff, or associate a PR with one (or more) issues.

tomvothecoder commented 1 year ago

Hey @TonyB9000, I'm reading this issue discussion and I would suggest to first address the issue as described for non-MPAS handlers. As you already observed, MPAS handlers are different animals. There is not a unified .yaml file for MPAS variables. We can plan to refactor to follow what has been done with other components as resources available. But at this stage the --info for MPAS variables will behave differently.

There is a separate GitHub issue for making info mode work with MPAS handlers here: https://github.com/E3SM-Project/e3sm_to_cmip/issues/191. I just opened a placeholder issue to refactor how MPAS handlers are implemented in general here: https://github.com/E3SM-Project/e3sm_to_cmip/issues/200

@tomvothecoder You wrote "a pull request related to this GitHub issue". When I do a cli "git push", git tells me:

remote: Create a pull request for 'fix_e3c_info_mode' on GitHub by visiting:
remote:      https://github.com/E3SM-Project/e3sm_to_cmip/pull/new/fix_e3c_info_mode

I'm unclear how to navigate "#issue" stuff, or associate a PR with one (or more) issues.

  1. Open a dev branch intended to address this GitHub issue
  2. Open a pull request on GitHub with that dev branch
  3. Link GitHub issue to pull request either by:
    • Using "Development" UI option on the right-side bar options in the pull request
    • Typing "- Closes #<GITHUB ISSUE #>" in the description of the pull request
tomvothecoder commented 1 year ago

Also Google: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue

xylar commented 1 year ago

@TonyB9000 and @chengzhuzhang, I agree with @chengzhuzhang's comment (https://github.com/E3SM-Project/e3sm_to_cmip/issues/181#issuecomment-1646180378) that the MPAS handlers are a special case. They were probably either written before the --info option was available or were not written with it in mind. We can fix them later but if this isn't interfering with basic functionality of e2c, I would suggest putting that on the back burner. I might be able to help for the next E3SM-Unified release but it's unlikely I can help anytime in the next few months.

TonyB9000 commented 1 year ago

@xylar Thanks Xylar. What I've done for the present in this regard is (1) prevent the user from doing a mode-3 --info on MPAS files (rather then press ahead and crash), and (2) update the help text to explain this up front.

TonyB9000 commented 1 year ago

@tomvothecoder On linking PRs to Issues: Since I created a last-minute branch (fix_e2c_info_mode) for the PR, is it too late to use the "- Closes #<GITHUB ISSUE #>" method to link this to existing issues? (Is it too late to edit the "description of the pull request"?) I'll try this with #191 (although it only partially extends info-mode (1,2) to work with MPAS variables).