cal-adapt / climakitae

A Python toolkit for retrieving, visualizing, and performing scientific analyses with data from the Cal-Adapt Analytics Engine.
https://climakitae.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Increasing visibility to WL and addressing small bugs #343

Closed claalmve closed 3 months ago

claalmve commented 4 months ago

Description of PR

Making runtime calculations for warming levels (WLs) more visible and addressing small bugs.

Summary of changes and related issue: WL modifications so that the notebooks run cleanly and do not break. These include:

  1. Time index mismatching for warming levels due to leap days. This is addressed by dropping all leap days before concatenating DataArrays together.
  2. wl.visualize was crashing because no WRF models reach 4 degrees of warming. This is fixed by returning an empty DataArray when no simulations exist at that warming level; additionally, the postage stamps will display the text "No simulations reach this degree of warming."
  3. This PR plots a bar plot instead of an image plot for the postage stamps via wl.visualize IF the dimensions of the DataArray are too small (hvplot.image needs at least a 2x2 grid; when the data are smaller than that, a bar plot is generated instead).
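The leap-day handling in (1) can be sketched as a simple calendar mask; `drop_leap_days` below is a hypothetical helper for illustration, not the actual climakitae implementation (the same boolean mask would be applied to each DataArray's time coordinate before concatenation):

```python
import pandas as pd

def drop_leap_days(time_index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Remove Feb 29 so every simulation shares a 365-day calendar."""
    mask = ~((time_index.month == 2) & (time_index.day == 29))
    return time_index[mask]

# A leap year has 366 daily timestamps; exactly one is dropped.
times = pd.date_range("2020-01-01", "2020-12-31", freq="D")
no_leap = drop_leap_days(times)
```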

What is not fixed (yet):

(screenshot of the remaining issue omitted)

Relevant motivation and context Want to implement minor upgrades for WL calculations so that it becomes more user-friendly.


How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Tested in warming_levels_approach.ipynb.


claalmve commented 4 months ago

This PR does modify data_load.py slightly. It adds an "intensive" if/else clause in load, which prints an extra statement after the ProgressBar if intensive is True. I added this because Dask ProgressBars begin slowly and accelerate over time for large datasets or heavy compute: Dask first builds the computation graph (slow) and then parallelizes the compute (fast). So the ProgressBars by themselves can be misleading, since the first 1% may take 3 minutes while the last 1% takes half a second.
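The behavior described above can be sketched as follows; this is a simplified stand-in, not the real data_load.load (here `compute` is any callable playing the role of a Dask `.compute()` call running under a ProgressBar):

```python
import time

def load(compute, intensive=False):
    """Hypothetical sketch: run a computation and, for intensive jobs,
    explain why the progress bar ramps up slowly."""
    start = time.perf_counter()
    result = compute()  # stands in for a Dask .compute() call
    elapsed = time.perf_counter() - start
    if intensive:
        print(
            f"Done in {elapsed:.1f}s. Note: progress bars often start "
            "slowly (task-graph construction) and finish quickly "
            "(parallel execution), so early percentages can mislead."
        )
    return result
```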

claalmve commented 4 months ago

Things to note about this PR:

  1. PerformanceWarnings appear that didn't before. I traced them to warming.py calling retrieve, meaning these messages originate from Select. I am not sure why they pop up for warming.py but not for Select itself.
  2. wl.visualize() will break with a ValueError when trying to plot a DataArray: ValueError: cannot convert float NaN to integer. I have tried debugging this, and the gwl_snapshots all hold valid, non-NaN data before plotting, so I am not sure why the error is thrown.
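For what it's worth, that message is CPython's standard NaN-to-int cast error, so the NaN may enter through a derived quantity (e.g. an axis extent or color limit computed inside the plotting stack) rather than through gwl_snapshots itself. A minimal reproduction:

```python
# The exact message from the traceback comes from a plain NaN-to-int cast:
try:
    int(float("nan"))
except ValueError as err:
    message = str(err)

print(message)  # cannot convert float NaN to integer
```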

If anyone has thoughts or suggestions about these bugs/interesting findings, please feel free to change anything in the code or let me know what you think!

Tianchi-Liu commented 4 months ago

It seems this is already intended but not yet reflected in load: per PRs 346, 347, and 348, load should be switched back to its original form.

Tianchi-Liu commented 4 months ago

warming_levels.ipynb is giving an error:


The code is having trouble with warming level 3.0 which some simulations don’t reach.

Full trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
   3799 try:
-> 3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File <timed eval>:1

File ~/climakitae/climakitae/explore/warming.py:116, in WarmingLevels.calculate(self)
    112 self.gwl_snapshots = {}
    113 for level in tqdm(self.warming_levels, desc="Computing each warming level"):
    114     # Assign warming slices to dask computation graph
    115     warm_slice = load(
--> 116         self.find_warming_slice(level, self.gwl_times)  # , intensive=True
    117     )
    118     # Dropping simulations that only have NaNs
    119     warm_slice = warm_slice.dropna(dim="all_sims", how="all")

File ~/climakitae/climakitae/explore/warming.py:85, in WarmingLevels.find_warming_slice(self, level, gwl_times)
     82 warming_data = warming_data.assign_attrs(window=self.wl_params.window)
     84 # Cleaning data
---> 85 warming_data = clean_warm_data(warming_data)
     86 # Relabeling `all_sims` dimension
     87 new_warm_data = warming_data.drop("all_sims")

File ~/climakitae/climakitae/explore/warming.py:176, in clean_warm_data(warm_data)
    169 """
    170 Cleaning the warming levels data in 3 parts:
    171   1. Removing simulations where this warming level is not crossed. (centered_year)
    172   2. Removing timestamps at the end to account for leap years (time)
    173   3. Removing simulations that go past 2100 for its warming level window (all_sims)
    174 """
    175 # Cleaning #1
--> 176 warm_data = warm_data.sel(all_sims=~warm_data.centered_year.isnull())
    178 # Cleaning #2
    179 warm_data = warm_data.isel(
    180     time=slice(0, len(warm_data.time) - 1)
    181 )  # -1 is just a placeholder for 30 year window, this could be more specific.

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataarray.py:1420, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1310 def sel(
   1311     self: T_DataArray,
   1312     indexers: Mapping[Any, Any] = None,
   (...)
   1316     **indexers_kwargs: Any,
   1317 ) -> T_DataArray:
   1318     """Return a new DataArray whose data is given by selecting index
   1319     labels along the specified dimension(s).
   1320   (...)
   1418     Dimensions without coordinates: points
   1419     """
-> 1420     ds = self._to_temp_dataset().sel(
   1421         indexers=indexers,
   1422         drop=drop,
   1423         method=method,
   1424         tolerance=tolerance,
   1425         **indexers_kwargs,
   1426     )
   1427return self._from_temp_dataset(ds)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:2533, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2472 """Returns a new dataset with each array indexed by tick labels
   2473 along the specified dimension(s).
   2474   (...)
   2530 DataArray.sel
   2531 """
   2532 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2533 query_results = map_index_queries(
   2534     self, indexers=indexers, method=method, tolerance=tolerance
   2535 )
   2537 if drop:
   2538     no_scalar_variables = {}

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexing.py:183, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    181         results.append(IndexSelResult(labels))
    182 else:
--> 183         results.append(index.sel(labels, **options))  # type: ignore[call-arg]
    185 merged = merge_sel_results(results)
    187 # drop dimension coordinates found in dimension indexers
    188 # (also drop multi-index if any)
    189 # (.sel() already ensures alignment)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexes.py:850, in PandasMultiIndex.sel(self, labels, method, tolerance)
    848 if label_array.ndim == 0:
    849     label_value = as_scalar(label_array)
--> 850     indexer, new_index = self.index.get_loc_level(label_value, level=0)
    851     scalar_coord_values[self.index.names[0]] = label_value
    852 elif label_array.dtype.kind == "b":

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3018, in MultiIndex.get_loc_level(self, key, level, drop_level)
   3015 else:
   3016     level = [self._get_level_number(lev) for lev in level]
-> 3018 loc, mi = self._get_loc_level(key, level=level)
   3019 if not drop_level:
   3020     if lib.is_integer(loc):

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3159, in MultiIndex._get_loc_level(self, key, level)
   3157     return indexer, maybe_mi_droplevels(indexer, ilevels)
   3158 else:
-> 3159     indexer = self._get_level_indexer(key, level=level)
   3160     if (
   3161         isinstance(key, str)
   3162         and self.levels[level]._supports_partial_string_indexing
   3163     ):
   3164         # check to see if we did an exact lookup vs sliced
   3165         check = self.levels[level].get_loc(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3262, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3258         return slice(i, j, step)
   3260 else:
-> 3262     idx = self._get_loc_single_level_index(level_index, key)
   3264     if level > 0 or self._lexsort_depth == 0:
   3265         # Desired level is not sorted
   3266         if isinstance(idx, slice):
   3267             # test_get_loc_partial_timestamp_multiindex

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:2848, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2846     return -1
   2847 else:
-> 2848     return level_index.get_loc(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:
-> 3802     raise KeyError(key) from err
   3803 except TypeError:
   3804     # If we have a listlike key, _check_indexing_error will raise
   3805     #  InvalidIndexError. Otherwise we fall through and re-raise
   3806     #  the TypeError.
   3807     self._check_indexing_error(key)

KeyError: False

claalmve commented 3 months ago

It seems this is already intended but not yet reflected in load: per PRs 346, 347, and 348, load should be switched back to its original form.

Yes, it's still in this PR for ease of use; if/when this PR gets approved, I will remove that code before merging into main.

vicford commented 3 months ago

Errored out at the wl.calculate() cell after 40 min of loading time (successful?):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File <timed eval>:1

File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
    131 self.cmap = _get_cmap(self.wl_params)
    132 self.wl_viz = WarmingLevelVisualize(
    133     gwl_snapshots=self.gwl_snapshots,
    134     wl_params=self.wl_params,
    135     cmap=self.cmap,
    136     warming_levels=self.warming_levels,
    137 )
--> 138 self.wl_viz.compute_stamps()

File ~/src/climakitae/climakitae/explore/warming.py:412, in WarmingLevelVisualize.compute_stamps(self)
    411 def compute_stamps(self):
--> 412     self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
    413     self.stats_stamps = GCM_PostageStamps_STATS_compute(self)

File ~/src/climakitae/climakitae/explore/warming.py:714, in GCM_PostageStamps_MAIN_compute(wl_viz)
    712 # Splitting up logic to plot images or bar for postage stamps depending on if there exist more/less than 2x2 gridcells
    713 plot_type = ""
--> 714 any_single_dims = _check_single_spatial_dims(all_plot_data)
    715 if not any_single_dims:
    716     all_plots = all_plot_data.hvplot.image(**plot_image_kwargs).cols(4)

File ~/src/climakitae/climakitae/explore/warming.py:308, in _check_single_spatial_dims(da)
    304 """
    305 This checks needs to happen to determine whether or not the plots in postage stamps should be image plots or bar plots, depending on whether or not one of the spatial dimensions is <= a length of 1.
    306 """
    307 if set(["lat", "lon"]).issubset(set(da.dims)):
--> 308     if len(da.x) <= 1 or len(da.y) <= 1:
    309         return True
    310 elif set(["x", "y"]).issubset(set(da.dims)):

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/common.py:256, in AttrAccessMixin.__getattr__(self, name)
    254         with suppress(KeyError):
    255             return source[name]
--> 256 raise AttributeError(
    257     f"{type(self).__name__!r} object has no attribute {name!r}"
    258 )

AttributeError: 'DataArray' object has no attribute 'x'
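The traceback shows the lat/lon branch reading `da.x`/`da.y`. A hypothetical rewrite of `_check_single_spatial_dims` that queries whichever dims were actually matched (duck-typed on xarray's `dims`/`sizes` attributes):

```python
def check_single_spatial_dims(da):
    """Return True when either spatial dim has length <= 1, using the
    dims that are actually present instead of assuming x/y."""
    if {"lat", "lon"}.issubset(da.dims):
        return da.sizes["lat"] <= 1 or da.sizes["lon"] <= 1
    if {"x", "y"}.issubset(da.dims):
        return da.sizes["x"] <= 1 or da.sizes["y"] <= 1
    return False
```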
Tianchi-Liu commented 3 months ago

Some simulations reach 4.0 (and even 2.0/3.0?) so close to the end of the century that there aren't enough years of data to fill the 30-year (and potentially larger) windows. The gwl_snapshots would be biased. We imposed a lot of limitations on the agnostic tools to deal with a similar issue, so I'm not sure whether that would work here.
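One possible guard, with hypothetical names and thresholds (the real window logic lives in clean_warm_data): drop a simulation when the window around its crossing year would run past the end of the record.

```python
def window_fits(centered_year, half_window=15, last_year=2100):
    """Hypothetical check: a +/- half_window-year window centered on the
    warming-level crossing year must stay within the data record."""
    return centered_year + half_window <= last_year

# e.g. a simulation crossing its warming level in 2092 cannot
# fill a 30-year window that ends in 2107
```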

vicford commented 3 months ago

Getting the following error in warming_levels.ipynb after a successful wl.calculate() step:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
File <timed eval>:1

File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
    131 self.cmap = _get_cmap(self.wl_params)
    132 self.wl_viz = WarmingLevelVisualize(
    133     gwl_snapshots=self.gwl_snapshots,
    134     wl_params=self.wl_params,
    135     cmap=self.cmap,
    136     warming_levels=self.warming_levels,
    137 )
--> 138 self.wl_viz.compute_stamps()

File ~/src/climakitae/climakitae/explore/warming.py:413, in WarmingLevelVisualize.compute_stamps(self)
    411 def compute_stamps(self):
    412     self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
--> 413     self.stats_stamps = GCM_PostageStamps_STATS_compute(self)

File ~/src/climakitae/climakitae/explore/warming.py:883, in GCM_PostageStamps_STATS_compute(wl_viz)
    875     all_plots = plot_list[0] + plot_list[1] + plot_list[2]
    877 all_plots.opts(
    878     title=wl_viz.wl_params.variable
    879     + ": for "
    880     + str(warmlevel)
    881     + "°C Warming Across Models"
    882 )  # Add title
--> 883 if plot_type == "image":
    884     warm_level_dict[warmlevel] = all_plots.cols(1)
    885 elif plot_type == "bar":

UnboundLocalError: local variable 'plot_type' referenced before assignment
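The UnboundLocalError means `plot_type` is only assigned on branches that didn't execute. A minimal sketch of the defensive fix, with hypothetical names (bind a default before branching):

```python
def choose_plot_type(n_lat, n_lon):
    """Always bind plot_type: default to "image", fall back to "bar"
    when either spatial dim collapses below hvplot.image's 2x2 minimum."""
    plot_type = "image"
    if n_lat <= 1 or n_lon <= 1:
        plot_type = "bar"
    return plot_type
```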
Tianchi-Liu commented 3 months ago

Nice work around the hvplot limits. The only major issue, as you noted, arises when a simulation has data for multiple scenarios.

Ignoring the scenarios indeed doesn't work well for LOCA. Nor for WRF CESM2: it has data for all 3 scenarios when the default 3 km resolution is changed to 45 km, but only one is visible as a bar.

Tianchi-Liu commented 3 months ago

Some relatively minor things:

Changing to 45km also causes wl.visualize() to warn:

/srv/conda/envs/notebook/lib/python3.9/site-packages/holoviews/core/data/pandas.py:221: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  data = [(k, group_type(v, **group_kwargs)) for k, v in
Tianchi-Liu commented 3 months ago

Some extra items show up on the x-axis after clicking around different warming levels and tabs, although the data look right.

Tianchi-Liu commented 3 months ago

Related, to make simulation names interpretable, making the bars horizontal is the easiest solution for static plots.
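For a static plot the idea looks like this with matplotlib's barh (climakitae plots through hvplot, where a horizontal bar kind plays the same role; the simulation names and values below are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical simulation names and warming-level anomalies
names = ["WRF_EC-Earth3_r1i1p1f1", "WRF_CESM2_r11i1p1f1"]
values = [1.4, 1.1]

fig, ax = plt.subplots()
ax.barh(names, values)  # horizontal bars keep long labels legible
ax.set_xlabel("Anomaly at warming level (degC)")
fig.tight_layout()
```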

Tianchi-Liu commented 3 months ago

A final reminder to black format before merging.

vicford commented 3 months ago

Referencing the plots from PR https://github.com/cal-adapt/cae-notebooks/pull/103 here.

Potentially need to rethink this approach on visualizing small locations as bar plots, especially for the LOCA2 data.

claalmve commented 3 months ago

Referencing the plots from PR cal-adapt/cae-notebooks#103 here.

Potentially need to rethink this approach on visualizing small locations as bar plots, especially for the LOCA2 data.

Yes, definitely. I think I will still merge this PR for now (since the visualizations will just break on these small location inputs in the current main), and once we split up GUIs/calculations for WLs, we'll come back to these visualization issues.