Workflow for ZPPY- PCMDI for E3SM diagnostics (with fixes on issues found in #624)

zhangshixuan1987 commented 3 weeks ago

Issue resolution

Closes #

Select one: This pull request is...

[ ] a bug fix: increment the patch version
[ ] a small improvement: increment the minor version
[ ] an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

Objective 1
Objective 2
...
Objective n

Required:

[ ] Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
[ ] Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
[ ] Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

[ ] Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
[ ] Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

[ ] Logic: I have visually inspected the entire pull request myself.
[ ] Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

[ ] Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy/conda, not just an import statement.

3. Is this well documented?

Required:

[ ] Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

[ ] Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
[ ] Pre-commit checks: All the pre-commits checks have passed.

If applicable:

[ ] Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

zhangshixuan1987 commented 3 weeks ago

@chengzhuzhang @forsyth2 : Hi Jill and Ryan, I closed the https://github.com/E3SM-Project/zppy/pull/624 as some of the fixes are made on the issues pointed our by you. I open this new pull request for the continuing discussion.

chengzhuzhang commented 3 weeks ago

@zhangshixuan1987

From the original PR#624, I ran into an env problem that suggests I can't access your build of pcmdi_metrics_master conda environment. All other tasks like ts ran.

EnvironmentNameNotFound: Could not find conda environment: pcmdi_metrics_master

However In the new PR I ran into an error much ealier, right after issuing the zppy command line, as below:

(zppy_dev_24) [ac.zhang40@chrlogin2 test_zppy]$ zppy -c post.v3.LR.historical_pmp.cfg 
Traceback (most recent call last):
  File "/home/ac.zhang40/y/envs/zppy_dev_24/bin/zppy", line 5, in <module>
    from zppy.__main__ import main
  File "/home/ac.zhang40/y/envs/zppy_dev_24/lib/python3.9/site-packages/zppy/__main__.py", line 18, in <module>
    from zppy.pcmdi_diags import pcmdi_diags
  File "/home/ac.zhang40/y/envs/zppy_dev_24/lib/python3.9/site-packages/zppy/pcmdi_diags.py", line 8, in <module>
    from zppy.utils import (
ImportError: cannot import name 'checkStatus' from 'zppy.utils' (/home/ac.zhang40/y/envs/zppy_dev_24/lib/python3.9/site-packages/zppy/utils.py)

I think the error is caused by the conflicting zppy/__main__.py

In any case, I think i now have a better understand of your implementation, after testing and reviewing the results. And perhaps we can have a chat sometime next week about design details.

zhangshixuan1987 commented 3 weeks ago

@chengzhuzhang: Hi Jill, when I submitted this new PR, I did a code rebase to make sure that my code is consistent with the zppy master branch. However, I just found that a certain amount of the changes have been made in zppy-master branch. Due to these changes, I need to make modifications to my pcmdi workflow. Overall, please hold, and I may need to resubmit a PR again. I apologize for the inconvenience.

chengzhuzhang commented 3 weeks ago

@zhangshixuan1987 Oh, no worries! I didn't realize it is still a draft. Thanks for the heads-up. I'm happy to test again when it is ready. No hurry.

forsyth2 commented 3 weeks ago

we can have a chat sometime next week about design details.

Due to these changes, I need to make modifications to my pcmdi workflow. Overall, please hold, and I may need to resubmit a PR again. I apologize for the inconvenience.

I've unfortunately been busy with other priorities, but I've been trying to follow this work at a high level. I would like to again raise my concern about repeating design decision mistakes that have caused issues in the global_time_series task. It may be a good idea to discuss design further before proceeding with more work.

However, I just found that a certain amount of the changes have been made in zppy-master branch.

Yes, I mentioned in https://github.com/E3SM-Project/zppy/pull/624#discussion_r1800306399 that #628 made some changes to clean up the zppy code base. That's probably the biggest change.

forsyth2 commented 3 weeks ago

I'm hesitant to make too many comments without fully reviewing and running the latest code myself, but my specific design decision concerns at the moment are the following. Perhaps I just need more context though, which we can discuss in a meeting.

I agree strongly that it would be much easier to navigate the results with a viewer page!

It appears we are once again adding a de facto analysis package inside zppy. #616 is really turning the global_time_series task into that -- a piece of software with scope and environment distinct from zppy that is however fundamentally integrated into zppy. This has caused two major issues:

It is hard to test global_time_series. #616 is adding unit tests for the global_time_series. I have to run those tests from a different environment than the zppy dev environment because that code runs in the environment that the global_time_series task is running in, not the environment zppy is running in. This convolutes an already complicated testing scheme -- e.g., see #634 for the process required to do weekly integration testing.
The environment issue segues into the second major issue: the global_time_series code has different dependencies than the zppy code. This complicated @xylar's question of what code exactly had numpy dependencies (global_time_series needs numpy, but zppy does not). Furthermore, this means there isn't really a dev environment for global_time_series. And I suspect a similar issue here is responsible for the EnvironmentNameNotFound: Could not find conda environment: pcmdi_metrics_master error mentioned in https://github.com/E3SM-Project/zppy/pull/640#issuecomment-2452618239.

global_time_series developed these issues over time as it grew from a small task into a complex pseudo-package. We need to discuss the design decisions for implementing the PMP task, before repeating known issues. zppy is already a complex piece of code requiring python > bash interaction. These built-in analysis packages add a third layer of complexity: python > bash > python.

chengzhuzhang commented 3 weeks ago

I've unfortunately been busy with other priorities, but I've been trying to follow this work at a high level. I would like to again raise my concern about repeating design decision mistakes that have caused issues in the global_time_series task. It may be a good idea to discuss design further before proceeding with more work.

I'm still not convinced that we should consider a stand-alone package for the global_time_series, mostly because the potential maintenance overhead. Also the global_time_series capability is very specific about its input: the global average files are generated with a relatively recent new NCO capability. It doesn't feel like we need a new package for just visualization global time-series and to aggregate the results.

My view of zppy is that it not only provide interface to connect multiple diagnostics tools, it also closes any gaps that these packages won't cover but E3SM requires. In the case of zppy-PMP workflow, there is missing pieces from PMP, including decoupled plotting capability and no viewer is generated. It feels that it is inevitable that we need to use zppy to close such gaps.

Another thing i would note is that there are several scripts use cdms or other CDAT utilities in this PR, which we should migrate away .

xylar commented 3 weeks ago

My view of zppy is that it not only provide interface to connect multiple diagnostics tools, it also closes any gaps that these packages won't cover but E3SM requires.

If that is going to continue to be the purview of zppy. I think there needs to be a yaml file or something similar that defines the dependencies and their constraints for each tool like global_time_series that has unique dependencies. I need somewhere I can go to know what packages E3SM-Unified needs to have to support not just zppy, the workflow management tool, but its pseudo-sub-packages.

chengzhuzhang commented 3 weeks ago

@xylar yes, dependencies such as numpy, and the upcoming outputviewer, should be explicitly listed in a yaml file.

rljacob commented 3 weeks ago

I think Ryan's point is that global_time_series is currently poorly maintained because its buried inside zppy. The cost of making it a separate package is offset by the benefit of better testing.

xylar commented 3 weeks ago

I agree with both @rljacob and @forsyth2, I don't think it's great if zppy becomes a kitchen-sink package. The extra maintenance is already there and breaking our distinct packages should provide transparency for users and developers. I realized we're short on staff. But bundling packages together doesn't really solve that problem, it just obscures it.

In general I think zppy should return to being a workflow manager and not try to also be the tool that adapts all other tools for E3SM's needs. That's really messy to try to bundle into a single package.

chengzhuzhang commented 3 weeks ago

hmmm, is it possible to make the global_time_series capability more modular and testable in zppy before considering packaging it as a stand-alone package? I don't know if such a package is useful beyond zppy.

zhangshixuan1987 commented 3 weeks ago

@chengzhuzhang @xylar @forsyth2: Hi all, I may not fully understand all your discussions here. The EnvironmentNameNotFound: Could not find conda environment: pcmdi_metrics_master is because the pcmdi_metrics_master is a customized environment used for the development of zppy-pcmdi workflow for the following reasons:

the PCMDI within the e3sm-unified environment is significantly outdated, and significant changes including the data structure etc. have been made in the latest version of the PCMDI package (https://github.com/PCMDI/pcmdi_metrics). Therefore, it is not valid to me if I develop the zppy-pcmdi workflow with respect to the outdated PCMDI library in the current e3sm-unified environment.
at the beginning of this zppy-pcmdi workflow, we believe that the update of PCMI within the e3sm-unified environment is not urgent and needed. Therefore, I built a PCMDI library in my local machine for the work.

To me, The above issue will not be a problem anymore once the PCMDI is updated in the e3sm-unified environment.

One step further, I currently used the ZPPY preferred method to load the conda environment pcmdi_metrics_master by adding following lines in the ".cfg" file:

environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh; conda activate pcmdi_metrics_master"

I think that @chengzhuzhang has a problem calling "conda activate pcmdi_metrics_master" as there is no such coda environment installed in her local machine. However, this is not because there is a module called "pcmdi_metrics_master" within any Python scripts of the zppy-pcmid workflow. My point here is that the above method to activate the customized PCMDI library will not cause any issues within zppy-pcmdi workflow.

xylar commented 3 weeks ago

@zhangshixuan1987, you are right. We are hijacking your PR to have what has meandered into a very important discussion but one that is not related.

xylar commented 3 weeks ago

Let's move the broader discussion to https://github.com/E3SM-Project/zppy/discussions/641 and keep this PR focused on its original purpose.

There is still, of course, room to discuss whether this work should be done outside of zppy for similar reasons to what moves over to that discussion.

chengzhuzhang commented 3 weeks ago

@zhangshixuan1987 Sorry for this distraction. I'm grateful that Xylar created a separate discussion item.

For the pcmdi_diags_master the environment. It is possible to source someone's locally built conda env through commands like: source /home/ac.forsyth2/miniconda3/etc/profile.d/conda.sh; conda activate e3sm_diags_20241015 This is how I source @forsyth2 's locally built e3sm_diags environment. I think we can do something similar for using your built pmp environment: first to source your conda path, and then activate the env.

zhangshixuan1987 commented 3 weeks ago

@chengzhuzhang @forsyth2: I am not intended to interrupt the discussions. I think all discussions here are very useful to me as well. I am also very interested in the further discussions at the https://github.com/E3SM-Project/zppy/discussions/641.

zhangshixuan1987 commented 3 weeks ago

@chengzhuzhang : Hi Jill, I have checked and made modifications to the PCMDI workflow. The test on my side indicates that the workflow can work as expected.

Below are the example results for the E3SM AMIP simulation run with revised workflow:

v3.LR.amip_0201

An updated ".cfg" file is attached here for your test:

post.v3.LR.amip.0201.cfg

Also, I think that my built "pcmdi_metrics_master" environment can not be sourced as like @xylar's likely because I did not install the miniconda environment by myself, rather, I relied on the "e3sm_unified" environment on Chrysalis. Therefore, to make sure that you can test the package successfully, I converted my package to a ".yml" file:

pcmdi_metrics_master.yml I think you can use this file to build an equivalent environment for test purposes (I apologize for the extra work on your side).

Finally, I am trying to work on a viewer function following the one in E3SM diagnostics, I will let you know if I can quickly figure this out. Please let me know if you could be available for a meeting, and I would like to discuss with you for the strategy and the requirements that I should follow.

E3SM-Project / zppy