cal-adapt / climakitae

A Python toolkit for retrieving, visualizing, and performing scientific analyses with data from the Cal-Adapt Analytics Engine.
https://climakitae.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Increasing visibility to WL and addressing small bugs #343

Closed claalmve closed 3 months ago

claalmve commented 4 months ago

Description of PR

Making runtime calculations for warming levels (WLs) more visible and addressing small bugs.

Summary of changes and related issue: WL modifications so that the notebooks run cleanly and do not break. These include:

  1. Time index mismatching for warming levels due to leap days. This is addressed by dropping all leap days before concatenating DataArrays together.
  2. wl.visualize was crashing because no WRF models reach 4 degrees of warming. This is fixed by returning an empty DataArray when no simulations exist at that warming level; additionally, the postage stamps will display the text "No simulations reach this degree of warming."
  3. This PR plots a bar plot instead of an image plot for the postage stamps via wl.visualize IF the dimensions of the DataArray are too small (hvplot.image needs at least a 2x2 grid; when the data are smaller than that, a bar plot is generated instead).
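The leap-day handling in (1) can be sketched as a simple calendar mask; `drop_leap_days` below is a hypothetical helper for illustration, not the actual climakitae implementation (the same boolean mask would be applied to each DataArray's time coordinate before concatenation):

```python
import pandas as pd

def drop_leap_days(time_index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Remove Feb 29 so every simulation shares a 365-day calendar."""
    mask = ~((time_index.month == 2) & (time_index.day == 29))
    return time_index[mask]

# A leap year has 366 daily timestamps; exactly one is dropped.
times = pd.date_range("2020-01-01", "2020-12-31", freq="D")
no_leap = drop_leap_days(times)
```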

What is not fixed (yet):

(screenshot of the remaining issue omitted)

Relevant motivation and context Want to implement minor upgrades for WL calculations so that it becomes more user-friendly.


How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Tested in warming_levels_approach.ipynb.


claalmve commented 4 months ago

This PR does modify data_load.py slightly. It adds an "intensive" if/else clause in load, which prints an extra statement after the ProgressBar if intensive is True. I added this because Dask ProgressBars begin slowly and accelerate over time for large datasets or heavy compute: Dask first builds the computation graph (slow) and then parallelizes the compute (fast). So the ProgressBars by themselves can be misleading, since the first 1% may take 3 minutes while the last 1% takes half a second.
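The behavior described above can be sketched as follows; this is a simplified stand-in, not the real data_load.load (here `compute` is any callable playing the role of a Dask `.compute()` call running under a ProgressBar):

```python
import time

def load(compute, intensive=False):
    """Hypothetical sketch: run a computation and, for intensive jobs,
    explain why the progress bar ramps up slowly."""
    start = time.perf_counter()
    result = compute()  # stands in for a Dask .compute() call
    elapsed = time.perf_counter() - start
    if intensive:
        print(
            f"Done in {elapsed:.1f}s. Note: progress bars often start "
            "slowly (task-graph construction) and finish quickly "
            "(parallel execution), so early percentages can mislead."
        )
    return result
```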

claalmve commented 4 months ago

Things to note about this PR:

  1. PerformanceWarnings appear that didn't before. I traced them to warming.py calling retrieve, meaning these messages originate from Select. I am not sure why they pop up for warming.py but not for Select itself.
  2. wl.visualize() will break with a ValueError when trying to plot a DataArray: ValueError: cannot convert float NaN to integer. I have tried debugging this, and the gwl_snapshots all hold valid, non-NaN data before plotting, so I am not sure why the error is thrown.
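For what it's worth, that message is CPython's standard NaN-to-int cast error, so the NaN may enter through a derived quantity (e.g. an axis extent or color limit computed inside the plotting stack) rather than through gwl_snapshots itself. A minimal reproduction:

```python
# The exact message from the traceback comes from a plain NaN-to-int cast:
try:
    int(float("nan"))
except ValueError as err:
    message = str(err)

print(message)  # cannot convert float NaN to integer
```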

If anyone has thoughts or suggestions about these bugs/interesting findings, please feel free to change anything in the code or let me know what you think!

Tianchi-Liu commented 4 months ago

It seems this is already intended but not yet reflected in load: per PRs 346, 347, and 348, load should be switched back to its original form.

Tianchi-Liu commented 4 months ago

warming_levels.ipynb is giving an error:


The code is having trouble with warming level 3.0 which some simulations don’t reach.

Full trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
   3799 try:
-> 3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File <timed eval>:1

File ~/climakitae/climakitae/explore/warming.py:116, in WarmingLevels.calculate(self)
    112 self.gwl_snapshots = {}
    113 for level in tqdm(self.warming_levels, desc="Computing each warming level"):
    114     # Assign warming slices to dask computation graph
    115     warm_slice = load(
--> 116         self.find_warming_slice(level, self.gwl_times)  # , intensive=True
    117     )
    118     # Dropping simulations that only have NaNs
    119     warm_slice = warm_slice.dropna(dim="all_sims", how="all")

File ~/climakitae/climakitae/explore/warming.py:85, in WarmingLevels.find_warming_slice(self, level, gwl_times)
     82 warming_data = warming_data.assign_attrs(window=self.wl_params.window)
     84 # Cleaning data
---> 85 warming_data = clean_warm_data(warming_data)
     86 # Relabeling `all_sims` dimension
     87 new_warm_data = warming_data.drop("all_sims")

File ~/climakitae/climakitae/explore/warming.py:176, in clean_warm_data(warm_data)
    169 """
    170 Cleaning the warming levels data in 3 parts:
    171   1. Removing simulations where this warming level is not crossed. (centered_year)
    172   2. Removing timestamps at the end to account for leap years (time)
    173   3. Removing simulations that go past 2100 for its warming level window (all_sims)
    174 """
    175 # Cleaning #1
--> 176 warm_data = warm_data.sel(all_sims=~warm_data.centered_year.isnull())
    178 # Cleaning #2
    179 warm_data = warm_data.isel(
    180     time=slice(0, len(warm_data.time) - 1)
    181 )  # -1 is just a placeholder for 30 year window, this could be more specific.

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataarray.py:1420, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1310 def sel(
   1311     self: T_DataArray,
   1312     indexers: Mapping[Any, Any] = None,
   (...)
   1316     **indexers_kwargs: Any,
   1317 ) -> T_DataArray:
   1318     """Return a new DataArray whose data is given by selecting index
   1319     labels along the specified dimension(s).
   1320   (...)
   1418     Dimensions without coordinates: points
   1419     """
-> 1420     ds = self._to_temp_dataset().sel(
   1421         indexers=indexers,
   1422         drop=drop,
   1423         method=method,
   1424         tolerance=tolerance,
   1425         **indexers_kwargs,
   1426     )
   1427return self._from_temp_dataset(ds)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:2533, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2472 """Returns a new dataset with each array indexed by tick labels
   2473 along the specified dimension(s).
   2474   (...)
   2530 DataArray.sel
   2531 """
   2532 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2533 query_results = map_index_queries(
   2534     self, indexers=indexers, method=method, tolerance=tolerance
   2535 )
   2537 if drop:
   2538     no_scalar_variables = {}

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexing.py:183, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    181         results.append(IndexSelResult(labels))
    182 else:
--> 183         results.append(index.sel(labels, **options))  # type: ignore[call-arg]
    185 merged = merge_sel_results(results)
    187 # drop dimension coordinates found in dimension indexers
    188 # (also drop multi-index if any)
    189 # (.sel() already ensures alignment)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexes.py:850, in PandasMultiIndex.sel(self, labels, method, tolerance)
    848 if label_array.ndim == 0:
    849     label_value = as_scalar(label_array)
--> 850     indexer, new_index = self.index.get_loc_level(label_value, level=0)
    851     scalar_coord_values[self.index.names[0]] = label_value
    852 elif label_array.dtype.kind == "b":

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3018, in MultiIndex.get_loc_level(self, key, level, drop_level)
   3015 else:
   3016     level = [self._get_level_number(lev) for lev in level]
-> 3018 loc, mi = self._get_loc_level(key, level=level)
   3019 if not drop_level:
   3020     if lib.is_integer(loc):

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3159, in MultiIndex._get_loc_level(self, key, level)
   3157     return indexer, maybe_mi_droplevels(indexer, ilevels)
   3158 else:
-> 3159     indexer = self._get_level_indexer(key, level=level)
   3160     if (
   3161         isinstance(key, str)
   3162         and self.levels[level]._supports_partial_string_indexing
   3163     ):
   3164         # check to see if we did an exact lookup vs sliced
   3165         check = self.levels[level].get_loc(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3262, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3258         return slice(i, j, step)
   3260 else:
-> 3262     idx = self._get_loc_single_level_index(level_index, key)
   3264     if level > 0 or self._lexsort_depth == 0:
   3265         # Desired level is not sorted
   3266         if isinstance(idx, slice):
   3267             # test_get_loc_partial_timestamp_multiindex

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:2848, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2846     return -1
   2847 else:
-> 2848     return level_index.get_loc(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:
-> 3802     raise KeyError(key) from err
   3803 except TypeError:
   3804     # If we have a listlike key, _check_indexing_error will raise
   3805     #  InvalidIndexError. Otherwise we fall through and re-raise
   3806     #  the TypeError.
   3807     self._check_indexing_error(key)

KeyError: False

claalmve commented 3 months ago

It seems this is already intended but not yet reflected in load: per PRs 346, 347, and 348, load should be switched back to its original form.

Yes, it's still in this PR for ease of use; if/when this PR gets approved, I will remove that code before merging into main.

vicford commented 3 months ago

Errored out at the wl.calculate() cell after 40 min of loading time (successful?):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File <timed eval>:1

File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
    131 self.cmap = _get_cmap(self.wl_params)
    132 self.wl_viz = WarmingLevelVisualize(
    133     gwl_snapshots=self.gwl_snapshots,
    134     wl_params=self.wl_params,
    135     cmap=self.cmap,
    136     warming_levels=self.warming_levels,
    137 )
--> 138 self.wl_viz.compute_stamps()

File ~/src/climakitae/climakitae/explore/warming.py:412, in WarmingLevelVisualize.compute_stamps(self)
    411 def compute_stamps(self):
--> 412     self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
    413     self.stats_stamps = GCM_PostageStamps_STATS_compute(self)

File ~/src/climakitae/climakitae/explore/warming.py:714, in GCM_PostageStamps_MAIN_compute(wl_viz)
    712 # Splitting up logic to plot images or bar for postage stamps depending on if there exist more/less than 2x2 gridcells
    713 plot_type = ""
--> 714 any_single_dims = _check_single_spatial_dims(all_plot_data)
    715 if not any_single_dims:
    716     all_plots = all_plot_data.hvplot.image(**plot_image_kwargs).cols(4)

File ~/src/climakitae/climakitae/explore/warming.py:308, in _check_single_spatial_dims(da)
    304 """
    305 This checks needs to happen to determine whether or not the plots in postage stamps should be image plots or bar plots, depending on whether or not one of the spatial dimensions is <= a length of 1.
    306 """
    307 if set(["lat", "lon"]).issubset(set(da.dims)):
--> 308     if len(da.x) <= 1 or len(da.y) <= 1:
    309         return True
    310 elif set(["x", "y"]).issubset(set(da.dims)):

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/common.py:256, in AttrAccessMixin.__getattr__(self, name)
    254         with suppress(KeyError):
    255             return source[name]
--> 256 raise AttributeError(
    257     f"{type(self).__name__!r} object has no attribute {name!r}"
    258 )

AttributeError: 'DataArray' object has no attribute 'x'
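The traceback shows the lat/lon branch reading `da.x`/`da.y`. A hypothetical rewrite of `_check_single_spatial_dims` that queries whichever dims were actually matched (duck-typed on xarray's `dims`/`sizes` attributes):

```python
def check_single_spatial_dims(da):
    """Return True when either spatial dim has length <= 1, using the
    dims that are actually present instead of assuming x/y."""
    if {"lat", "lon"}.issubset(da.dims):
        return da.sizes["lat"] <= 1 or da.sizes["lon"] <= 1
    if {"x", "y"}.issubset(da.dims):
        return da.sizes["x"] <= 1 or da.sizes["y"] <= 1
    return False
```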
Tianchi-Liu commented 3 months ago

Some simulations reach 4.0 (and even 2.0/3.0?) so close to the end of the century that there aren't enough years of data to fill the 30-year (and potentially larger) windows. The gwl_snapshots would be biased. We imposed a lot of limitations on the agnostic tools to deal with a similar issue, so I'm not sure whether that would work here.
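One possible guard, with hypothetical names and thresholds (the real window logic lives in clean_warm_data): drop a simulation when the window around its crossing year would run past the end of the record.

```python
def window_fits(centered_year, half_window=15, last_year=2100):
    """Hypothetical check: a +/- half_window-year window centered on the
    warming-level crossing year must stay within the data record."""
    return centered_year + half_window <= last_year

# e.g. a simulation crossing its warming level in 2092 cannot
# fill a 30-year window that ends in 2107
```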

vicford commented 3 months ago

Getting the following error in warming_levels.ipynb after a successful wl.calculate() step:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
File <timed eval>:1

File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
    131 self.cmap = _get_cmap(self.wl_params)
    132 self.wl_viz = WarmingLevelVisualize(
    133     gwl_snapshots=self.gwl_snapshots,
    134     wl_params=self.wl_params,
    135     cmap=self.cmap,
    136     warming_levels=self.warming_levels,
    137 )
--> 138 self.wl_viz.compute_stamps()

File ~/src/climakitae/climakitae/explore/warming.py:413, in WarmingLevelVisualize.compute_stamps(self)
    411 def compute_stamps(self):
    412     self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
--> 413     self.stats_stamps = GCM_PostageStamps_STATS_compute(self)

File ~/src/climakitae/climakitae/explore/warming.py:883, in GCM_PostageStamps_STATS_compute(wl_viz)
    875     all_plots = plot_list[0] + plot_list[1] + plot_list[2]
    877 all_plots.opts(
    878     title=wl_viz.wl_params.variable
    879     + ": for "
    880     + str(warmlevel)
    881     + "°C Warming Across Models"
    882 )  # Add title
--> 883 if plot_type == "image":
    884     warm_level_dict[warmlevel] = all_plots.cols(1)
    885 elif plot_type == "bar":

UnboundLocalError: local variable 'plot_type' referenced before assignment
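The UnboundLocalError means `plot_type` is only assigned on branches that didn't execute. A minimal sketch of the defensive fix, with hypothetical names (bind a default before branching):

```python
def choose_plot_type(n_lat, n_lon):
    """Always bind plot_type: default to "image", fall back to "bar"
    when either spatial dim collapses below hvplot.image's 2x2 minimum."""
    plot_type = "image"
    if n_lat <= 1 or n_lon <= 1:
        plot_type = "bar"
    return plot_type
```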
Tianchi-Liu commented 3 months ago

Nice work around the hvplot limits. The only major issue, as you noted, arises when a simulation has data for multiple scenarios.

Ignoring the scenarios indeed doesn't work well for LOCA. Nor for WRF CESM2: it has data for all 3 scenarios when the default 3 km resolution is changed to 45 km, but only one is visible as a bar.

Tianchi-Liu commented 3 months ago

Some relatively minor things:

Changing to 45km also causes wl.visualize() to warn:

/srv/conda/envs/notebook/lib/python3.9/site-packages/holoviews/core/data/pandas.py:221: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
  data = [(k, group_type(v, **group_kwargs)) for k, v in
Tianchi-Liu commented 3 months ago

Some extra items show up on the x-axis after clicking around different warming levels and tabs, although the data look right.

Tianchi-Liu commented 3 months ago

Related, to make simulation names interpretable, making the bars horizontal is the easiest solution for static plots.
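For a static plot the idea looks like this with matplotlib's barh (climakitae plots through hvplot, where a horizontal bar kind plays the same role; the simulation names and values below are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical simulation names and warming-level anomalies
names = ["WRF_EC-Earth3_r1i1p1f1", "WRF_CESM2_r11i1p1f1"]
values = [1.4, 1.1]

fig, ax = plt.subplots()
ax.barh(names, values)  # horizontal bars keep long labels legible
ax.set_xlabel("Anomaly at warming level (degC)")
fig.tight_layout()
```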

Tianchi-Liu commented 3 months ago

A final reminder to black format before merging.

vicford commented 3 months ago

Referencing the plots from PR https://github.com/cal-adapt/cae-notebooks/pull/103 here.

Potentially need to rethink this approach on visualizing small locations as bar plots, especially for the LOCA2 data.

claalmve commented 3 months ago

Referencing the plots from PR cal-adapt/cae-notebooks#103 here.

Potentially need to rethink this approach on visualizing small locations as bar plots, especially for the LOCA2 data.

Yes, definitely. I think I will still merge this PR for now (since the visualizations will just break on these small location inputs in the current main), and once we split up GUIs/calculations for WLs, we'll come back to these visualization issues.