ajdawson / eofs

EOF analysis in Python
http://ajdawson.github.io/eofs/
GNU General Public License v3.0
200 stars · 60 forks

issue when using xarray + dask #115

Closed · navidcy closed this 2 years ago

navidcy commented 5 years ago

Hi there,

I'm trying to use the eofs package. It works fine when I use plain numpy arrays, or xarray alone, but I can't get it to work with xarray + dask.

I've reduced my dataset into something very small.

Here are 3 example notebooks...

  1. eofs with numpy
  2. eofs with xarray
  3. eofs with xarray+dask

@ScottWales, am I doing something wrong here? I also tried chunking with `.chunk({'time': 1})` but I still had the same issue...

ScottWales commented 5 years ago

The dask SVD solver requires its input to be chunked 'tall-and-skinny': it may have chunks along the time dimension, but not along the spatial dimensions. In your case the data has multiple chunks on both latitude and longitude due to your spatial binning.
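A minimal sketch of the layout described above, using a hypothetical `(time, lat, lon)` array (the shapes and chunk sizes are made up for illustration): the spatial dimensions are rechunked to a single chunk and flattened, so only the time dimension remains chunked.

```python
import dask.array as da

# Hypothetical (time, lat, lon) field, chunked in every dimension --
# the layout that da.linalg.svd rejects.
data = da.random.random((120, 90, 180), chunks=(12, 45, 90))

# Merge the spatial chunks so only time stays chunked, then flatten
# the spatial dimensions into a single 'space' axis.
data = data.rechunk({0: 12, 1: -1, 2: -1})  # -1 means one chunk spanning the dimension
tall = data.reshape(data.shape[0], -1)

# Now the array is chunked along rows (time) only, which the dask
# SVD solver accepts.
u, s, v = da.linalg.svd(tall)
```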

navidcy commented 5 years ago

@ScottWales problems persist when I do that... See this notebook.

Perhaps this is not an issue with eofs but a dask-related issue? Or possibly an issue with the way I call dask?

ajdawson commented 2 years ago

I'm closing this one as it is not clear where the problem lies, and it has been a long time since the last update.

lihansunbai commented 1 year ago

Hi @navidcy, I had a similar problem. After reading https://docs.dask.org/en/stable/generated/dask.array.linalg.svd.html#dask.array.linalg.svd and the exceptions in the dask error output, I believe the problem is triggered by the shape of the dask chunks. So I used `dask.array.rechunk()` to force the array into a single 'tall-and-skinny' chunk, i.e. `dataNoMissing = dataNoMissing.rechunk(dataNoMissing.size)` before calling `dask.array.linalg.svd(dataNoMissing, coerce_signs=False)`. That fixed it for me, and I hope it works for you too.
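The fix above can be sketched as follows. This is an illustrative reconstruction, not the commenter's actual data: the array shape, chunk sizes, and the name `data_no_missing` are made up, and the rechunk here collapses only the column dimension, which is enough to satisfy the one-chunked-dimension requirement of the dask SVD.

```python
import dask.array as da

# Hypothetical 2D (time x space) matrix with missing points already
# dropped, chunked in both dimensions -- the layout that triggers
# the dask SVD error.
data_no_missing = da.random.random((100, 400), chunks=(25, 100))

# Collapse the column chunks so the array is chunked along one
# dimension only, as da.linalg.svd requires.
data_no_missing = data_no_missing.rechunk({0: 25, 1: -1})

u, s, v = da.linalg.svd(data_no_missing, coerce_signs=False)
```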