Closed nmizukami closed 1 week ago
Hey @nmizukami ! Thanks so much for adding these notebooks! I made those changes we discussed to cupid-run
and the config file to include rof
in running cupid and in the jupyter book table of contents.
One note: I think we'll need to provide math
in cupid-analysis
if you prefer to use that for sqrt
rather than numpy
in metrics.py
? I don't think this should be a problem, but just let me know if you'd like me to do that?
Would you be able to pull these changes in locally, test running your notebook with cupid-run
and make sure that things look as you expect?
Hi Teagan (@TeaganKing), The notebook almost ran with cupid-run -rof
. One error was reading geopackage file (gis vector data) via geopanda.
You can see /glade/work/mizukami/CUPiD/examples/coupled_model/month_annual_flow.ipynb.
cupid prints on screen this:
RROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.
I have seen this before. I don't fully understand the error, but this is coming from pyogrio
package that came with geopandas
. pyogrio is used internally in geopandas.
When I ran the notebook outside cupid, it runs fine but I activated Python [conda-env:cupid-analysis]
environment in the jupyterhub. I see another one called cupid-analysis
, which I believe cupid actually uses. I saw similar error when I use cupid-analysis
. wondering what is the difference between Python [conda-env:cupid-analysis]
and cupid-analysis
?
hopefully I am simply setting something e.g., environment incorrectly...
Hi @nmizukami , Sorry I let this slip! In the environment in which this was working, did you have a particular version of geopandas
or pyogrio
pinned? I could also add that to the environment yaml specification. Or when you previously ran into this error, did you have another solution?
This error may be because PROJ is already installed-- I'm not sure where at this point, but can look into that.
Hi @TeaganKing, some hint is that I can ran outside cupid-run, meaning I can run the notebook manually on jupyterhub with [conda-env:cupid-analysis]
env on, but NOT with cupid-analysis
on (get similar error on PROJ). You see two similar envs in Jupyter in image below. I believe the package versions should be ok. I can think about this more... I don't know what is the difference between [conda-env:cupid-analysis]
and cupid-analysis
It sounds like there may be some issue related to the ipykernel installation. I think one of these might be the installation from ipykernel (a soft linked conda environment) and the other may be a conda environment found elsewhere (possibly an outdated cupid-analysis
that doesn't include geopandas
?). Mike mentioned that the ipykernel installation basically creates a softlink to an environment, which made me think that could be an inconsistency.
I had updated a test environment but not my actual cupid-analysis environment; I'm doing that now and will test your notebook out. This is probably not the most efficient workflow, but I wonder if it might also be worth removing your cupid-analysis environment, see if it's still listed as an option in JupyterHub, make sure that both versions are removed, and then re-install a clean version?
I did the following steps to remove cupid-analysis env and reinstall it on terminal.
mamba remove --name cupid-analysis --all
mamba env create -f environments/cupid-analysis.yml
It did not fix it. After removing cupid-analysis, jupyterhub still showed cupid-analysis, though [conda-env:cupid-analysis] was gone.
Hi @TeaganKing , trying to run conda list
to see what packages are there in cupid-analysis env when running cupid-run. So unfortunately including conda list
in the notebook cause error in cupid run:
SyntaxError: An error happened when checking the source code.
:25:7: invalid syntax
conda list
casper-login1:/glade/work/mizukami/CUPiD/examples/coupled_model (main_adding_rof)> cupid-run -rof
/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py:455: UserWarning:
========================================================================================= DAG render with warnings =========================================================================================
----------------------------------------------------------------- NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb') ------------------------------------------------------------------
----------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/infrastructure/index.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
These parameters are not used in the task's source code: 'CESM_output_dir', 'lc_kwargs', 'serial', and 'subset_kwargs'
============================================================================================ Summary (2 tasks) =============================================================================================
NotebookRunner: index -> File('computed_notebook...ucture/index.ipynb')
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
========================================================================================= DAG render with warnings =========================================================================================
warnings.warn(str(warnings_))
Executing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.03cell/s]
Building task 'month_annual_flow': 50%|███████████████████████████████████████████████████████████████████ | 1/2 [00:02<00:02, 2.92s/itERROR 1: PROJ: proj_create_from_database: /glade/u/apps/casper/23.10/spack/opt/spack/proj/8.2.1/gcc/12.2.0/7gif/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 2 whereas a number >= 3 is expected. It comes from another PROJ installation.
/glade/u/apps/opt/conda/condabin/conda | 5/69 [00:20<03:39, 3.44s/cell]
Executing: 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 62/69 [03:53<00:26, 3.76s/cell]
Building task 'month_annual_flow': 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:56<00:00, 118.06s/it]
Traceback (most recent call last):
File "/glade/work/mizukami/conda-envs/cupid-dev/bin/cupid-run", line 8, in <module>
sys.exit(run())
^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/mizukami/CUPiD/cupid/run.py", line 290, in run
dag.build()
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 557, in build
report = callable_()
^^^^^^^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 662, in _build
raise build_exception
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/dag/dag.py", line 591, in _build
task_reports = self._executor(dag=self, show_progress=show_progress)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/mizukami/conda-envs/cupid-dev/lib/python3.11/site-packages/ploomber/executors/serial.py", line 203, in __call__
raise DAGBuildError(str(exceptions_all))
ploomber.exceptions.DAGBuildError:
============================================================================================= DAG build failed =============================================================================================
----------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ------------------------------------------------------------
---------------------------------------------------------------- /glade/work/mizukami/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb -----------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [24]":
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[24], line 2
1 column_stat = []
----> 2 gauge_shp_all_case = gauge_shp.copy(deep=True)
3 for case, grid_name in cases.items():
4 gauge_shp_all_case = gauge_shp_all_case.merge(
5 gauge_shp1[case][["id", f"{error_metric}_{grid_name}"]],
6 left_on="id",
7 right_on="id",
8 )
NameError: name 'gauge_shp' is not defined
ploomber.exceptions.TaskBuildError: Error when executing task 'month_annual_flow'. Partially executed notebook available at /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/rof/month_annual_flow.ipynb
ploomber.exceptions.TaskBuildError: Error building task "month_annual_flow"
============================================================================================= Summary (1 task) =============================================================================================
NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb')
============================================================================================= DAG build failed =============================================================================================
Hi @TeaganKing,
Small good new is that I got it run without the geopanda error. The trick is to add this
os.environ['PROJ_LIB']='/glade/work/mizukami/conda-envs/cupid-analysis/share/proj'
before loading geopandas.
However, I don't think this is permanent solution. I still try to consult with CISL.
I was able to create /glade/work/mizukami/CUPiD/examples/coupled_model/computed_notebooks/quick-run/_build/html/index.html How do you usually open under HPC. I was trying to open firefox in derecho/casper, but it is very slow. Wonder if there is any other ways to look.
Hi @nmizukami , I'm glad that is temporarily working (but of course we need this to work for any user's environment). Yes, I think this would be a good conversation to have with CISL.
Regarding looking at output, see the second section on this page for recommendations on NCAR machines.
Hey @nmizukami , I added a PR to bring rof
into run.py
. And then I realized these changes are already in this PR... so apologies-- feel free to ignore that!
To-do:
-rof, --river-runoff Run river runoff component diagnostics
Per conversation with @nmizukami , this is now working when he cloned the cupid package, installed both of the python environments, and ran create_conda_kernel
under cupid-analysis, instead of python -m ipykernel install --user --name=cupid-analysis
.
I also updated the readme to include rof
and fixed a number of issues from pre-commit checks.
Could you please fix these two errors from pre-commit checks in examples/nblibrary/rof/scripts/metrics.py
? If metrics.py is not currently being used, it would also be great to either 1) remove this script and add it in when the notebooks that are going to use it are included in CUPiD, OR 2) fix the scripts to make sure they work and include them in this PR.
examples/nblibrary/rof/scripts/metrics.py:363:26: F821 undefined name 'myCount'
examples/nblibrary/rof/scripts/metrics.py:424:26: F821 undefined name 'myCount'
After updating my conda environment, I ran into the following rof-related errors. I'll plan to look into these issues in more detail (and do a more thorough review of the notebooks + scripts) once jupyterhub is back up. It seems like they could be a permission-related error where the dictionaries are not being properly populated and thus the keys are incorrect.
--------------------------------------------------------------------------- NotebookRunner: month_annual_flow -> File('computed_notebook..._annual_flow.ipynb') ----------------------------------------------------------------------------
-------------------------------------------------------------- /glade/work/tking/cupid_project/other_cupids/naoki_cupid/CUPiD/examples/nblibrary/rof/month_annual_flow.ipynb ---------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [12]":
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[12], line 6
3 gauge_plot[gname] = {}
4 for case, _ in cases.items():
5 gauge_ix, seg_index, seg_id = find_seg_id(
----> 6 site_id, reachID[case], gauge_reach_lnk[case]
7 )
8 gauge_plot[gname][case] = [gauge_ix, seg_index, seg_id]
KeyError: 'f09_f09_mg17_mosart'
ploomber.exceptions.TaskBuildError: Error when executing task 'month_annual_flow'. Partially executed notebook available at /glade/work/tking/cupid_project/other_cupids/naoki_cupid/CUPiD/examples/coupled_model/computed_notebooks/quick-run/rof/month_annual_flow.ipynb
ploomber.exceptions.TaskBuildError: Error building task "month_annual_flow"
---------------------------------------------------------------------------- NotebookRunner: ocean_discharge -> File('computed_notebook...an_discharge.ipynb') -----------------------------------------------------------------------------
--------------------------------------------------------------- /glade/work/tking/cupid_project/other_cupids/naoki_cupid/CUPiD/examples/nblibrary/rof/ocean_discharge.ipynb ----------------------------------------------------------------
---------------------------------------------------------------------------
Exception encountered at "In [16]":
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[16], line 34
32 id_list = gauge_shp1[case]["route_id"].values
33 else:
---> 34 id_list = gauge_shp1[case][gauge_shp1[case]["ocean"] == ocean_name][
35 "route_id"
36 ].values
38 reach_index = get_index_array(reachID[case], id_list)
39 seas_data_vector = seas_data[case][q_name].stack(seg=("lat", "lon"))
KeyError: 'f09_f09_mg17_mosart'
did I mess this up? I was rebasing NCAR/main and also nmizukami/main_adding_rof to get my local to up-to-date.
Thanks for fixing that! These are now running smoothly on my end with those updated permissions!
In month_annual_flow.ipynb, I also have a few additional comments (but the file was too large to render/comment on particular lines): All sections: Can you please remove empty cells? I think the ‘go back to top’ sections could be removed if this is being run all at once?
Removed empty cells and removed ‘go back to top’ link.
Section 1. Are the lists of cases intended to be regularly used options? Is the yaml file that’s loaded going to be consistent, or is it intended to be updated by user?
Yes, case is CESM case, which CESM modeler put and this has to be provided by the user. I tried adding some comments what need to be provided here. Multiple cases are allowed to compare the results from different cases
2.1 – there’s a typo ‘Mmonthly’
Fixed.
2.6 - can you add a docstring to define what’s happening in this function?
This function is removed.
3.1 – I think this cell may have an empty slice error.
I think I know how this happens. I think this is due to the inconsistent observation data length across the sites (some sites provide the data for a short period). No observed flow data is available depending analysis (or simulation) periods. Can I fix this in another PR?
Thanks for addressing some of these comments!
I made issue ticket #140 for the empty slice error-- that's fine by me to address in another PR as long as the notebook otherwise runs smoothly (which it does) and users are informed (by the issue ticket) of the error that needs to be fixed.
Ok, I wanted to make sure that we are using the cases
values in the config.yml
file and that users don't actually need to update anything in the notebook once the cell is tagged as a parameters cell.
I started to look at this, and I have a lot of questions and suggestions -- @nmizukami and @TeaganKing could we try to find a time early next week to meet? I'd like to discuss a few things that might be tough to squeeze into an in-line review on this PR. Some initial comments, though (and maybe these changes will make it a little easier to go back through line-by-line asynchronously):
month_annual_flow.ipynb
you have a logical flag parallel
that enables using PBSCluster
when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb
you'll see how CUPiD already passes a serial
flag and uses a LocalCluster
when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of codeexamples/nblibrary/rof
should not have any output in themHi Mike (@mnlevy1981) and Teagan (@TeaganKing), yes, I can meet on Monday or Tuesday (my calendar is up-to date). I was wondering about PBSCluster (2nd point)
To-do
PBSCluster
with LocalCluster
, then follow the example in ocn notebook (use serial
logical flag)Updated key_metrics/config.yml and coupled_model/config.yml for rof modify two notebooks based on config changes so now they run.
Review is needed and some science questions came up (e.g., what to do if you plot for time period when no observation is available. Are the other notebooks comparing the model outputs with observations??)
Hey @nmizukami , thanks for these updates.
Not all notebooks are comparing with observations, but you can see an example of an observational comparison in the glacier notebook & corresponding config.yml details.
I think that if you are plotting for a time period where observations are not available, perhaps a warning statement that the obs are unavailable would be useful?
And I'll review after our discussion on Thursday.
Right now I am pointing to case /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing/b.e23_alpha16b.BLT1850.ne30_t232.054
The time period for this case is year 0001-0102, when for sure there is no observation for any components. So I thought this config is meant to compare the simulation with some base simulation, to see the model comparison or something like that, not meant to validate the model component with observations.
Just wanted to understand the context of this setup. just with current config, the rof notebooks look less interesting, but technically the notebook works now (I believe).
If config point to any CESM cases that use 20th-21st century, rof notebook automatically adds the observed streamflow to the plots, and compare the simulations with observations.
The key setup here that's different from the coupled-model example is in the 'global params' section of the config file, where we have both a case name for the case you're looking at, as well as the base_case_name for a comparison case. The observations are defined separately in each individual notebook config section at this point.
That sounds good that plots are generated without obs if obs do not exist.
@nmizukami is planning to do the following:
Once these items are done, @TeaganKing can review.
@nmizukami is planning to do the following:
- [x] implement comparison with base_case in addition to obs
- [x] remove years that overwrite config file start/end years
- [x] include analysis-period configuration parameter to specify e.g. 10 years so that users don't need to run 100 years unless they really want to do so.
- [x] test cupid-run from key_metrics directory
Once these items are done, @TeaganKing can review.
Hi @TeaganKing, all are done!
Hey @nmizukami , thanks for all your work on this! It's definitely getting much closer!
I made a few in-line comments above and also wanted to remind you of @mnlevy1981 's comment from above: "I noticed in month_annual_flow.ipynb you have a logical flag parallel that enables using PBSCluster when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb you'll see how CUPiD already passes a serial flag and uses a LocalCluster when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of code". This same structure is also currently used in ocean_discharge.ipynb
. You can see details on the current serial vs parallel implementation in our documentation.
Also, this is a minor comment, but would you be able to rename your notebooks according to our new suggested naming convention: <region>_<variable>_<metric>_<comparisons>.ipynb
(eg, something similar to Global_PSL_NMSE_compare_obs_lens.ipynb
or Greenland_SMB_visual_compare_obs.ipynb
)
It looks like the notebooks ran in key_metrics
in about 15 minutes. Both notebooks seem to be running smoothly. One additional comment regarding the figures-- in month_annual_flow.ipynb
, there are a number of figures that are saved but not directly plotted in the notebook. For the purposes of generating the juptyerbook, I think it'd be best to show the plots directly in the notebook.
In the coupled_model
example, in Cell 5, there was an error with no files to open. It may be best to leave the key_metrics
example, and take out rof
from the coupled_model
example since we'll need to update that configuration file anyways and create an 'additional metrics' example.
Hey @nmizukami , thanks for all your work on this! It's definitely getting much closer!
I made a few in-line comments above and also wanted to remind you of @mnlevy1981 's comment from above: "I noticed in month_annual_flow.ipynb you have a logical flag parallel that enables using PBSCluster when set to true. If you look at examples/nblibrary/ocn/ocean_surface.ipynb you'll see how CUPiD already passes a serial flag and uses a LocalCluster when that is set to false... I haven't looked at the other runoff notebooks, but we need to avoid casper- or derecho-specific blocks of code". This same structure is also currently used in
ocean_discharge.ipynb
. You can see details on the current serial vs parallel implementation in our documentation.
I believe I have upadted both notebooks so they use LocaCluster now like this. Are you looking at the old ones??
client = None
if serial:
cluster = LocalCluster(**lc_kwargs)
client = Client(cluster)
client
Also, this is a minor comment, but would you be able to rename your notebooks according to our new suggested naming convention:
<region>_<variable>_<metric>_<comparisons>.ipynb
(eg, something similar toGlobal_PSL_NMSE_compare_obs_lens.ipynb
orGreenland_SMB_visual_compare_obs.ipynb
)
How about these?
month_annual_flow.ipynb -> global_discharge_gauge_comparison.ipynb
ocean_discharge.ipynb -> global_discharge_ocean_comparison.ipynb
For comparison part, it plots observations, but it is not available (because of analysis year), it is not plotted.
what does
It looks like the notebooks ran in
key_metrics
in about 15 minutes. Both notebooks seem to be running smoothly. One additional comment regarding the figures-- inmonth_annual_flow.ipynb
, there are a number of figures that are saved but not directly plotted in the notebook. For the purposes of generating the juptyerbook, I think it'd be best to show the plots directly in the notebook.
15min-> did you run all the components? It seems to take too long. I put a 10 year period (rof_start_data and rof_end_date) in config.yml now, so should be read faster and reading netcdfs is a bottleneck now.
name Ran? Elapsed (s) Percentage
----------------- ------ ------------- ------------
index True 9.39778 10.1186
month_annual_flow True 55.4196 59.6702
ocean_discharge True 28.0591 30.2112
Regarding no inline figures from cells - actually those cells are not running without observations available. I was not sure what to do with this situation. If there are observations, they should run and produce the figures.
In the
coupled_model
example, in Cell 5, there was an error with no files to open. It may be best to leave thekey_metrics
example, and take outrof
from thecoupled_model
example since we'll need to update that configuration file anyways and create an 'additional metrics' example.
Thanks for clarifying-- the way you are using LocalCluster works!
For the notebook names, would these work? Or if you prefer, maybe you could just take out _compare_obs
if observations are only available sometimes?
month_annual_flow.ipynb -> global_discharge_gauge_compare_obs.ipynb
ocean_discharge.ipynb -> global_discharge_ocean_compare_obs.ipynb
RE plotting in line-- I think that's fine if the plots show up when observations are available. Since the default is save_figs=False, I think this is reasonable.
Did you want to take out the coupled_model config file changes, per our discussion on updating the config file to a similar format as the key_metrics config file?
Thanks for making these changes, @nmizukami ! These notebook names are clearer now. I also removed the jupyter book build bit for the rof notebooks that are no longer being computed in the coupled model config file. Both notebooks run on the order of 1min now!
I think the only outstanding comment I'm concerned about is the inclusion of personal directories-- which may need to be pushed to a future issue ticket when we have a more clear location for hosting observational datasets. @mnlevy1981 did you want to take a quick look at this, as well?
I spent some time trying to reduce the number of lines in
metrics.py
before realizing you never import it.
Thanks for looking at the scripts. Actually it is not imported. I was going to use Cell [19]
where bias, RMSE, and corr are computed in global_discharge_gauge_compare_obs.ipynb
.
I put this metrics.py even though they are not used. This computes error metrics, also streamflow metrics (high flow, low flow extreme events etc.), which might be useful (at least use for hydrology, river often).
I can leave this out if confusing now, and add it when (if) need later.
Also,
global_discharge_gauge_compare_obs.ipynb
ran successfully but
- Cell
[14]
(in3.2. Annual cycle (at monthly step)
) is the last cell that produces output; cells 15 - 21 have the%%time
output but no plots or tables
Yes, Cell 15 through 21 are skipped if there are no observations available. For this case, the simulation period year 0001 - 0101, so it is outside the time period when observation is available. these cells compute error against observations.
in those cells, the second line if obs_available:
-> obs_available
is False if there is no observation available.
Did some metadata cells get moved around? I see the following sections in order:
3. Analysis for 24 large rivers 3.1 Annual flow series 3.2. Annual cycle (at monthly step) 3.3. scatter plots of monthly flow 3.4. scatter plots of annual flow 4. Anaysis for Large 50 rivers 4.1 Summary tables 4.2. scatter plot of annual mean flow 3. Anaysis for all 922 sites 3.1 Compute metris at all the sites (no plots nor tables) 4.2. Spatial metric map 4.3 Boxplots of Error metrics (RMSE, %bias, and correlation coefficient)
Yes, this needs to be updated. But i have been wondering if this is needed?? Also right now each section has a link to the corresponding cell. I don't think the link is not useful here?
(Maybe the reason I could delete
metrics.py
is because the cells that should call those functions aren't being invoked correctly?)I haven't had a chance to look at
colors.py
orutility.py
yet, and if it turns outmetrics.py
is supposed to be called I can go look closer to see if more functions can be replaced with existing numpy calls
utility.py and colors.py are imported (but not all the functions), but metrics.py is not called.
So what I could do is to include the functions used only? These scrips are from the scripts I use for my other notebooks, so if these become useful, could be added later.
2. Did some metadata cells get moved around? I see the following sections in order:
3. Analysis for 24 large rivers 3.1 Annual flow series 3.2. Annual cycle (at monthly step) 3.3. scatter plots of monthly flow 3.4. scatter plots of annual flow 4. Anaysis for Large 50 rivers 4.1 Summary tables 4.2. scatter plot of annual mean flow 3. Anaysis for all 922 sites 3.1 Compute metris at all the sites (no plots nor tables) 4.2. Spatial metric map 4.3 Boxplots of Error metrics (RMSE, %bias, and correlation coefficient)
Yes, this needs to be updated. But i have been wondering if this is needed?? Also right now each section has a link to the corresponding cell. I don't think the link is not useful here?
I would definitely encourage you to have some markdown cells that describe what plots are being generated. I don't think they need to be organized like a paper with sections and subsections. I hadn't noticed the links from the top of the notebook to each specific section until you mentioned it... let me look at how cupid-build
parses those links before we discuss keeping them or removing them.
utility.py and colors.py are imported (but not all the functions), but metrics.py is not called.
So what I could do is to include the functions used only? These scrips are from the scripts I use for my other notebooks, so if these become useful, could be added later.
Yes, please remove functions that are not being used (and then I'm happy to review whatever is left). One reason to provide modules like utility.py
and colors.py
is to potentially share functions with other notebooks, but if these modules contain code that isn't used anywhere then there is no guarantee that they actually work as advertised.
- Did some metadata cells get moved around? I see the following sections in order:
- Analysis for 24 large rivers 3.1 Annual flow series 3.2. Annual cycle (at monthly step) 3.3. scatter plots of monthly flow 3.4. scatter plots of annual flow
- Anaysis for Large 50 rivers 4.1 Summary tables 4.2. scatter plot of annual mean flow
- Anaysis for all 922 sites 3.1 Compute metris at all the sites (no plots nor tables) 4.2. Spatial metric map 4.3 Boxplots of Error metrics (RMSE, %bias, and correlation coefficient)
Yes, this needs to be updated. But i have been wondering if this is needed?? Also right now each section has a link to the corresponding cell. I don't think the link is not useful here?
I would definitely encourage you to have some markdown cells that describe what plots are being generated. I don't think they need to be organized like a paper with sections and subsections. I hadn't noticed the links from the top of the notebook to each specific section until you mentioned it... let me look at how
cupid-build
parses those links before we discuss keeping them or removing them.utility.py and colors.py are imported (but not all the functions), but metrics.py is not called. So what I could do is to include the functions used only? These scrips are from the scripts I use for my other notebooks, so if these become useful, could be added later.
Yes, please remove functions that are not being used (and then I'm happy to review whatever is left). One reason to provide modules like
utility.py
andcolors.py
is to potentially share functions with other notebooks, but if these modules contain code that isn't used anywhere then there is no guarantee that they actually work as advertised.
Ok, I will clean up the scripts (sorry for confusing you.) and the markdown cells.
Hi @mnlevy1981, I think the scripts are clean now, and fixed the errors you pointed out in the notebooks. Also markdown cells are update. The notebooks still runs ok, though I saw lots of exceptions from the dask distributed.
Once the comments regarding ancillary data variable naming and personal directories are resolved, we can merge this.
Hi @TeaganKing and @mnlevy1981 , I have updated the notebooks and setup.yaml based on your comments. thanks!
Great, thanks so much @nmizukami ! This looks good to me. I think merging is blocked due to Mike's change request-- @mnlevy1981 can you please check that this addressed your comments as well and then merge this in?
Initial commits for ROF notebooks.
An ultimate set of the notebooks intend to mimic old ROF diagnostic plots
This PR is just starting with a few notebooks.
All Submissions:
pre-commit
check)?New Feature Submissions:
Changes to Core Features: