Closed: claalmve closed this PR 3 months ago.
This PR slightly modifies data_load.py: it adds an "intensive" if/else clause in load, which prints an extra statement after the ProgressBar when intensive is True. I added this because the Dask ProgressBars begin slowly and accelerate over time for large datasets or large computations: Dask first builds the computation graph (slow) and then parallelizes the compute (fast). So the ProgressBars by themselves can be misleading, since the first 1% may take 3 minutes while the last 1% takes half a second.
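A minimal sketch of what such a clause could look like (the function name, signature, and message text here are illustrative, not the exact code in data_load.py):

```python
def load_with_note(compute_fn, intensive=False):
    """Run a Dask-style computation and, when `intensive` is True, print a
    note explaining the uneven ProgressBar pacing. `compute_fn` stands in
    for the actual .compute() call that load wraps."""
    result = compute_fn()  # in practice, a Dask ProgressBar wraps this call
    if intensive:
        print(
            "Note: the progress bar starts slowly while Dask builds the "
            "task graph, then speeds up once the compute parallelizes."
        )
    return result
```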
Things to note about this PR:
- warming.py triggers messages when calling retrieve, meaning these messages come up from Select. I am not sure why they pop up for warming.py and not for Select itself.
- wl.visualize() will break because of a ValueError when trying to plot a DataArray: ValueError: cannot convert float NaN to integer. I have tried debugging this, and the gwl_snapshots all have valid, non-NaN data before plotting, so I am not sure why this error is thrown.

If anyone has thoughts or suggestions about these bugs/interesting findings, please feel free to change anything in the code or let me know what you think!
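For debugging the NaN ValueError, a quick pre-plot sanity check along these lines can help pinpoint where non-finite values sneak in (this is a stdlib sketch; with xarray you would apply something like `.isnull().any()` to both the data and its coordinate arrays, since NaNs in coordinates can also trip up plotting even when the data itself is clean):

```python
import math

def first_non_finite(values):
    """Return the index of the first NaN/inf in a flat sequence of floats,
    or None if everything is finite. Useful for pinpointing bad entries
    before handing data to a plotting library."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None
```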
Seems this is already intended, but not reflected in load yet: per PRs 346, 347, and 348, load should be switched back to its original form.
warming_levels.ipynb is giving an error: the code is having trouble with warming level 3.0, which some simulations don't reach.
Full trace:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
3799 try:
-> 3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
File <timed eval>:1
File ~/climakitae/climakitae/explore/warming.py:116, in WarmingLevels.calculate(self)
112 self.gwl_snapshots = {}
113 for level in tqdm(self.warming_levels, desc="Computing each warming level"):
114 # Assign warming slices to dask computation graph
115 warm_slice = load(
--> 116 self.find_warming_slice(level, self.gwl_times) # , intensive=True
117 )
118 # Dropping simulations that only have NaNs
119 warm_slice = warm_slice.dropna(dim="all_sims", how="all")
File ~/climakitae/climakitae/explore/warming.py:85, in WarmingLevels.find_warming_slice(self, level, gwl_times)
82 warming_data = warming_data.assign_attrs(window=self.wl_params.window)
84 # Cleaning data
---> 85 warming_data = clean_warm_data(warming_data)
86 # Relabeling `all_sims` dimension
87 new_warm_data = warming_data.drop("all_sims")
File ~/climakitae/climakitae/explore/warming.py:176, in clean_warm_data(warm_data)
169 """
170 Cleaning the warming levels data in 3 parts:
171 1. Removing simulations where this warming level is not crossed. (centered_year)
172 2. Removing timestamps at the end to account for leap years (time)
173 3. Removing simulations that go past 2100 for its warming level window (all_sims)
174 """
175 # Cleaning #1
--> 176 warm_data = warm_data.sel(all_sims=~warm_data.centered_year.isnull())
178 # Cleaning #2
179 warm_data = warm_data.isel(
180 time=slice(0, len(warm_data.time) - 1)
181 ) # -1 is just a placeholder for 30 year window, this could be more specific.
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataarray.py:1420, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
1310 def sel(
1311     self: T_DataArray,
1312     indexers: Mapping[Any, Any] = None,
(...)
1316 **indexers_kwargs: Any,
1317 ) -> T_DataArray:
1318 """Return a new DataArray whose data is given by selecting index
1319 labels along the specified dimension(s).
1320 (...)
1418 Dimensions without coordinates: points
1419 """
-> 1420 ds = self._to_temp_dataset().sel(
1421 indexers=indexers,
1422 drop=drop,
1423 method=method,
1424 tolerance=tolerance,
1425 **indexers_kwargs,
1426 )
1427return self._from_temp_dataset(ds)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:2533, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
2472 """Returns a new dataset with each array indexed by tick labels
2473 along the specified dimension(s).
2474 (...)
2530 DataArray.sel
2531 """
2532 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2533 query_results = map_index_queries(
2534 self, indexers=indexers, method=method, tolerance=tolerance
2535 )
2537 if drop:
2538 no_scalar_variables = {}
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexing.py:183, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
181 results.append(IndexSelResult(labels))
182 else:
--> 183 results.append(index.sel(labels, **options)) # type: ignore[call-arg]
185 merged = merge_sel_results(results)
187 # drop dimension coordinates found in dimension indexers
188 # (also drop multi-index if any)
189 # (.sel() already ensures alignment)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexes.py:850, in PandasMultiIndex.sel(self, labels, method, tolerance)
848 if label_array.ndim == 0:
849 label_value = as_scalar(label_array)
--> 850 indexer, new_index = self.index.get_loc_level(label_value, level=0)
851 scalar_coord_values[self.index.names[0]] = label_value
852 elif label_array.dtype.kind == "b":
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3018, in MultiIndex.get_loc_level(self, key, level, drop_level)
3015 else:
3016 level = [self._get_level_number(lev) for lev in level]
-> 3018 loc, mi = self._get_loc_level(key, level=level)
3019 if not drop_level:
3020 if lib.is_integer(loc):
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3159, in MultiIndex._get_loc_level(self, key, level)
3157return indexer, maybe_mi_droplevels(indexer, ilevels)
3158 else:
-> 3159 indexer = self._get_level_indexer(key, level=level)
3160 if (
3161 isinstance(key, str)
3162 and self.levels[level]._supports_partial_string_indexing
3163 ):
3164 # check to see if we did an exact lookup vs sliced
3165 check = self.levels[level].get_loc(key)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:3262, in MultiIndex._get_level_indexer(self, key, level, indexer)
3258 return slice(i, j, step)
3260 else:
-> 3262 idx = self._get_loc_single_level_index(level_index, key)
3264 if level > 0 or self._lexsort_depth == 0:
3265 # Desired level is not sorted
3266 if isinstance(idx, slice):
3267 # test_get_loc_partial_timestamp_multiindex
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/multi.py:2848, in MultiIndex._get_loc_single_level_index(self, level_index, key)
2846 return -1
2847 else:
-> 2848return level_index.get_loc(key)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key, method, tolerance)
3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:
-> 3802 raise KeyError(key) from err
3803 except TypeError:
3804 # If we have a listlike key, _check_indexing_error will raise
3805 # InvalidIndexError. Otherwise we fall through and re-raise
3806 # the TypeError.
3807 self._check_indexing_error(key)
KeyError: False
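The `KeyError: False` suggests the boolean mask is being treated as a label: `.sel` does label-based lookup, so pandas ends up calling `get_loc(False)` on the `all_sims` MultiIndex, where no such label exists. Positional boolean masking belongs in `.isel` (or `.where(..., drop=True)`). A stdlib-only sketch of the label-vs-position distinction (the simulation names and years are made up):

```python
labels = ["sim_a", "sim_b", "sim_c"]           # stands in for the all_sims index
centered_year = [2040.0, float("nan"), 2075.0]

# The mask holds True/False values, not simulation names.
mask = [y == y for y in centered_year]         # NaN != NaN, so NaN entries -> False

# Label-based lookup (what .sel does): looking up the value False as if it
# were a label fails -- the analogue of pandas raising KeyError: False.
try:
    labels.index(False)
    lookup_failed = False
except ValueError:
    lookup_failed = True

# Positional selection (what .isel does): keep entries where the mask is True.
kept = [lab for lab, keep in zip(labels, mask) if keep]
```

In `clean_warm_data`, the analogous fix would be along the lines of `warm_data.isel(all_sims=~warm_data.centered_year.isnull())`, assuming the mask is one-dimensional over `all_sims`; this is a suggested direction, not tested against the PR branch.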
> Seems this is already intended, but not reflected in load yet: per PRs 346, 347, and 348, load should be switched back to its original form.

Yes, it's still in this PR for ease of use; if/when this PR gets approved, I will remove that code before merging the PR into main.
Errored out at the wl.calculate()
cell after 40 min of loading time (successful?):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File <timed eval>:1
File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
131 self.cmap = _get_cmap(self.wl_params)
132 self.wl_viz = WarmingLevelVisualize(
133 gwl_snapshots=self.gwl_snapshots,
134 wl_params=self.wl_params,
135 cmap=self.cmap,
136 warming_levels=self.warming_levels,
137 )
--> 138 self.wl_viz.compute_stamps()
File ~/src/climakitae/climakitae/explore/warming.py:412, in WarmingLevelVisualize.compute_stamps(self)
411 def compute_stamps(self):
--> 412 self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
413 self.stats_stamps = GCM_PostageStamps_STATS_compute(self)
File ~/src/climakitae/climakitae/explore/warming.py:714, in GCM_PostageStamps_MAIN_compute(wl_viz)
712 # Splitting up logic to plot images or bar for postage stamps depending on if there exist more/less than 2x2 gridcells
713 plot_type = ""
--> 714 any_single_dims = _check_single_spatial_dims(all_plot_data)
715 if not any_single_dims:
716 all_plots = all_plot_data.hvplot.image(**plot_image_kwargs).cols(4)
File ~/src/climakitae/climakitae/explore/warming.py:308, in _check_single_spatial_dims(da)
304 """
305 This checks needs to happen to determine whether or not the plots in postage stamps should be image plots or bar plots, depending on whether or not one of the spatial dimensions is <= a length of 1.
306 """
307 if set(["lat", "lon"]).issubset(set(da.dims)):
--> 308 if len(da.x) <= 1 or len(da.y) <= 1:
309 return True
310 elif set(["x", "y"]).issubset(set(da.dims)):
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/common.py:256, in AttrAccessMixin.__getattr__(self, name)
254 with suppress(KeyError):
255 return source[name]
--> 256 raise AttributeError(
257 f"{type(self).__name__!r} object has no attribute {name!r}"
258 )
AttributeError: 'DataArray' object has no attribute 'x'
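The traceback points at a branch mismatch in `_check_single_spatial_dims`: the first branch tests for `lat`/`lon` dims but then reads `da.x`/`da.y`. A corrected sketch, written over a plain `sizes` mapping (mirroring xarray's `da.sizes`, dimension name to length) so the branch logic is easy to see:

```python
def check_single_spatial_dims(sizes):
    """Return True if either spatial dimension has length <= 1, reading the
    same pair of dimension names that the branch condition tested for."""
    if {"lat", "lon"} <= set(sizes):
        return sizes["lat"] <= 1 or sizes["lon"] <= 1
    if {"x", "y"} <= set(sizes):
        return sizes["x"] <= 1 or sizes["y"] <= 1
    return False
```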
Some simulations reach 4.0 (and even 2.0/3.0?) so close to the end of the century that there aren't enough years of data to fill the 30-yr (and potentially larger) windows. The gwl_snapshots would be biased. We imposed a lot of limitations on the agnostic tools to deal with a similar issue, so I'm not sure if that would work here.
Getting the following error in warming_levels.ipynb after a successful wl.calculate() step:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
File <timed eval>:1
File ~/src/climakitae/climakitae/explore/warming.py:138, in WarmingLevels.calculate(self)
131 self.cmap = _get_cmap(self.wl_params)
132 self.wl_viz = WarmingLevelVisualize(
133 gwl_snapshots=self.gwl_snapshots,
134 wl_params=self.wl_params,
135 cmap=self.cmap,
136 warming_levels=self.warming_levels,
137 )
--> 138 self.wl_viz.compute_stamps()
File ~/src/climakitae/climakitae/explore/warming.py:413, in WarmingLevelVisualize.compute_stamps(self)
411 def compute_stamps(self):
412 self.main_stamps = GCM_PostageStamps_MAIN_compute(self)
--> 413 self.stats_stamps = GCM_PostageStamps_STATS_compute(self)
File ~/src/climakitae/climakitae/explore/warming.py:883, in GCM_PostageStamps_STATS_compute(wl_viz)
875 all_plots = plot_list[0] + plot_list[1] + plot_list[2]
877 all_plots.opts(
878 title=wl_viz.wl_params.variable
879 + ": for "
880 + str(warmlevel)
881 + "°C Warming Across Models"
882 ) # Add title
--> 883 if plot_type == "image":
884 warm_level_dict[warmlevel] = all_plots.cols(1)
885 elif plot_type == "bar":
UnboundLocalError: local variable 'plot_type' referenced before assignment
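The UnboundLocalError means no branch assigned `plot_type` before it was read, presumably because the data matched neither the image nor the bar case. Binding the variable to a default before the conditionals (or raising explicitly in an else) avoids the crash. The branch conditions below are illustrative, not the exact ones in warming.py:

```python
def choose_plot_type(n_x, n_y):
    """Pick a postage-stamp plot type, always binding plot_type so a later
    `if plot_type == ...` check can never hit an unbound local."""
    plot_type = "image"  # safe default
    if n_x <= 1 or n_y <= 1:
        plot_type = "bar"  # too few gridcells for an image plot
    return plot_type
```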
Good work getting around the hvplot limits. The only major issue, as you noted, arises when a simulation has data for multiple scenarios. Ignoring the scenarios indeed doesn't work well for LOCA, and not for WRF CESM2 either: it has data for all 3 scenarios when the default 3 km resolution is changed to 45 km, but only one is visible as a bar.
Some relatively minor things:
Changing to 45km also causes wl.visualize() to warn:
/srv/conda/envs/notebook/lib/python3.9/site-packages/holoviews/core/data/pandas.py:221: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
data = [(k, group_type(v, **group_kwargs)) for k, v in
Some extra items show up on the x-axis after clicking around different warming levels and tabs, although the data look right.
Related, to make simulation names interpretable, making the bars horizontal is the easiest solution for static plots.
A final reminder to run black formatting before merging.
Going to reference the plots from https://github.com/cal-adapt/cae-notebooks/pull/103 here.
Potentially need to rethink this approach on visualizing small locations as bar plots, especially for the LOCA2 data.
Yes definitely. I think I will still merge this PR in for now (since the visualizations will just break on these small location inputs in the current main), and once we split up GUIs/calculations for WLs, come back to these visualization issues.
Description of PR

Making runtime calculations for WLs more visible and addressing small bugs.

Summary of changes and related issue

Making WL modifications so that the notebooks run more clearly and do not break. These include:
What is not fixed (yet):
- warming_levels.ipynb is not totally functional: a lot of empty plots and overlaid text.
- In wl.visualize(), the bar plot DOES NOT put simulation names in an interpretable fashion. I could not figure out how to get hvplot to work with me on this. UPDATE May 28: STATS plots generate, but they also face the same problem with overlaying simulation names and extra side panels.
- warming_levels_approach.ipynb, due to the selected data being too small to compute .hvplot.image on in GCM_PostageStamps_STATS_compute.

Relevant motivation and context

Want to implement minor upgrades for WL calculations so that they become more user-friendly.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Tested in
warming_levels_approach.ipynb
.Checklist: