Closed jhamman closed 4 years ago
I'm not 100% sure this was here before the move to intake-esm so I'm wondering if we're combining datasets in a sub-optimal way.
In this particular case, we are not merging/concatenating the zarr stores. Each zarr store ends up in its own xarray dataset:
After further investigations, I can confirm that intake-esm is not the culprit here. I am able to reproduce this behavior without intake-esm.
import fsspec
import xarray as xr
options = dict(anon=True)
ds_20C = xr.open_zarr(fsspec.get_mapper('s3://ncar-cesm-lens/atm/daily/cesmLE-20C-TREFHT.zarr', **options), consolidated=True)
ds_RCP85 = xr.open_zarr(fsspec.get_mapper('s3://ncar-cesm-lens/atm/daily/cesmLE-RCP85-TREFHT.zarr', **options), consolidated=True)
Dependency versions
numpy 1.17.3
fsspec 0.6.1
xarray 0.14.1
intake 0.5.3
intake_esm 2019.10.15.post92
dask 2.9.0
Here's the notebook used to reproduce this behavior.
Unfortunately, I can't really tell whether this behavior has always been there and/or it is caused by recent changes in dask/xarray.
I believe each chunk (at least for TREFHT) contains the entire 288x192 lat/lon range, so does this line mean each chunk ends up getting read (288/20)*(192/20)=150 times, or even 20x20=400 times?
winter_trends = linear_trend( winter_seasons.chunk({"lat": 20, "lon": 20, "time": -1}) ).load() * len(winter_seasons.time)
Closing this issue for the time being...My current speculation is that it has to do with xarray's apply_ufunc
, but I was unable to determine the exact root cause. I may re-open it in the future :)
Could use da.polyfit
now and see if the same behaviour persists.
Could use da.polyfit now and see if the same behaviour persists.
👍 I had forgotten about da.polyfit
. I will look into it tomorrow.
I'm noticing some odd behavior in the example notebook with regard to dask and some transpose logic. I'm not 100% sure this was here before the move to intake-esm so I'm wondering if we're combining datasets in a sub-optimal way. The offending cell is below. I'm about to give a tutorial so I can't dig deeper right now but I'm putting this up so we can come back to it.