UXARRAY / uxarray

Xarray extension for unstructured climate and global weather data analysis and visualization.
https://uxarray.readthedocs.io/
Apache License 2.0
148 stars 31 forks source link

Fix CI Failures #731

Closed philipc2 closed 5 months ago

philipc2 commented 5 months ago

Closes #723

Overview

philipc2 commented 5 months ago

An actual solution, if possible, rather than pinning would be ideal. Let me know if you explored anything that might be related to the actual issue; I am happy to look into this a bit.

I can continue to do some digging into it, the failure is as follows:

../../../miniconda3/envs/uxarray_build/lib/python3.10/site-packages/datashader/data_libraries/dask.py:147: AssertionError

It appears to be something with Datashader and Dask. I'll continue looking into it and try to reproduce it locally.

philipc2 commented 5 months ago

This seems to be because of the user of dask-expr in the recent release for Dataframes, which breaks something with the way that we are caching (and honestly, we should investigate better ways to "cache" as opposed to just keeping a reference in the grid) https://docs.dask.org/en/stable/changelog.html#v2024-3-0

Considering how quick point rasterization is, I don't think we need to cache, so I've removed it and cleaned up the point raster.

image

Any thoughts? I can raise an issue to investigate the proper way to "cache" dask data frames for subsequent runs, which will be relevant to our other viz methods too.

philipc2 commented 5 months ago

Thanks for this! However, changes in this PR seem to be beyond this PR's scope, or it is hard to understand all those changes were necessary for the Dask issue and if they are not changing the code flow significantly. Please clarify

Thanks for the review! I'll add some comments describing the sections of the code. However, these aren't new. The changes made here are, for the time being, remove our "caching" approach for the point rasters. When we use a cached dataframe, attempting to swap the data variable for the points leads to the Dask error that we were getting.

The difference in performance with or without using caching is comparable as seen in the Viz cookbook,.

Additionally, there are probably better ways for caching compared to the way we were doing it, which worked fine for standard arrays, but not so much for Dask.

Let me know what you think, I submitted an issue (#739) that will describe this in detail.