NCAR / CUPiD

CUPiD is a “one stop shop” that enables and integrates timeseries file generation, data standardization, diagnostics, and metrics from all CESM components.
https://ncar.github.io/CUPiD/
Apache License 2.0
21 stars 19 forks source link

API updates to specify components #88

Closed rmshkv closed 2 months ago

rmshkv commented 3 months ago

Addresses https://github.com/NCAR/CUPiD/issues/63.

This adds a layer of keys in config.yml that groups notebooks (and scripts) by ESM component, e.g. atmosphere, ocean, etc. These keys are referenced by flags that can be passed in to cupid-run, e.g. -atm, -ocn, etc., as specified in the README. If no component flags are passed in, all components are run. This can also be done explicitly by using --all or -a.

This update also prompted changes to how we're managing the environment (kernel) checking step that was being done by util.get_control_dict() called by cupid-run. We still check the existence of all environments specified in config.yml (regardless of whether that component is turned on by flags), but just raise a warning if the environment does not exist. If that notebook is specified to be run, another warning is raised and that notebook is not run, but the others still are. Also, if neither a default_kernel_name nor a notebook-specific kernel_name is provided, cupid-analysis is assumed and another warning is raised.

rmshkv commented 3 months ago

Also, I don't love how the warnings show up currently - you end up with a large block of text mixed in with all the Ploomber stuff and the unhelpful warnings.warn()... code used to generate the warnings gets printed out too. Please feel free to propose a better way if you know one!

TeaganKing commented 3 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

rmshkv commented 3 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

Ah thanks for the heads up, I didn't realize how far along the timeseries generation work was! I think there are several different ways we could integrate these features...potentially even something like including the timeseries generation code for each component under compute_scripts, which already get turned on or off by the component flags but could then also be affected by the -ts flag? Maybe we can chat about it at the meeting tomorrow.

mnlevy1981 commented 2 months ago

Most of my comments on this pass-through are related to reducing the amount of duplicated code. There is still a lot of similarities between compute_notebooks and compute_scripts sections, but cleaning that up is probably best left for a separate issue

rmshkv commented 2 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

rmshkv commented 2 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

mnlevy1981 commented 2 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

rmshkv commented 2 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself.

rmshkv commented 2 months ago

All right, I think that addresses everything requested! Feel free to run some tests again and let me know what you think.

TeaganKing commented 2 months ago

Hi @rmshkv , I'll look at this PR in more detail, but wanted to post a quick note as I am realizing that there will probably be some conflicts between this PR and #78. I think that's fine, and I can update #78 if this comes in first. We may also want to discuss whether the component flags also result in timeseries being run for only the specified components (eg, if you run cupid-run config.yml -ts -atm, that would generate timeseries and run the notebooks for atm only). I think this may be ideal and still seems clear to me, but I'm open to other thoughts there.

I just implemented this (-ts and component flags will only run those components). There's a tiny bit of extra code because the timeseries block references components as "atm", "ocn", etc and the notebooks/scripts blocks reference them as "atmosphere", "ocean", etc but it works fine and we can address it later, in the interest of getting this PR in.

On second thought, maybe we do want them to match...I'll make everything use the short names, and we can change that later if we decide to.

If you haven't made the change yet, I'd prefer changing the time series to use the longer names... but if you are already using the short names we can clean that up in a future PR :)

I made the change just a minute ago, but it's not too hard to change back later. I also prefer the long names for clarity, but I don't know much about what the timeseries code is doing and it looks like some of the directories it's creating might depend on the component shortname, so I didn't want to touch that myself.

Just as an FYI, the time series code isn't super dependent on how the components are named; There's a comment in line 285 that would need to be updated, as well as just the names in lines 329, 52, and 232. I'm fine with keeping the short names for now, though.

TeaganKing commented 2 months ago

All the tests I've run are looking good! I do also agree that updated the logic regarding 'all' is improved when the assumption is that 'all' is the default unless one component is specified, and a flag is not needed to specify 'all'. I think that the lack of a flag clarifies that 'all' is default instead of potential confusion with a default that can also be specified with a flag.

mnlevy1981 commented 2 months ago

The code in run.py looks great and cupid-run config.yml generates output as expected, but cupid-build isn't getting the sidebar right anymore:

cupid book

I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath.

I really like how this turned out:

(cupid-dev) examples/coupled_model$ cupid-run -atm -glc config.yml
cupid/run.py:152: UserWarning: No notebooks for glc component specified in config file.

(And adf_quick_run.ipynb was still executed).

rmshkv commented 2 months ago

The code in run.py looks great and cupid-run config.yml generates output as expected, but cupid-build isn't getting the sidebar right anymore:

cupid book

I was hoping the sidebar would have the name of all the components, and then links to each of the component's notebooks underneath.

I really like how this turned out:

(cupid-dev) examples/coupled_model$ cupid-run -atm -glc config.yml
cupid/run.py:152: UserWarning: No notebooks for glc component specified in config file.

(And adf_quick_run.ipynb was still executed).

My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now!

mnlevy1981 commented 2 months ago

My bad, I just forgot to change the explicit paths in the jupyter book config with the new short folder names. Should be fixed now!

After updating to f114b83 I needed to rerun cupid-run and cupid-build, but the page looks great now!

cupid book 2