sfinkens opened this issue 3 years ago (status: Open)
@sfinkens thanks for reporting this. To a certain degree I know about this bug. It is a little sloppy on my part; for quite some time I have wanted to document more thoroughly the limitations of how some of the calculations are implemented.
That said, in this case the culprit is how border values are treated by `map_overlap` in the dask case. Because you use `arange` in the example, the difference between the two calculations becomes very obvious; with random values the difference gets smaller. Nevertheless, this is a bug.
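To make the border issue concrete, here is a minimal sketch (a toy statistic of my own, not textory's actual implementation): the same shifted-difference statistic computed directly on a numpy array and chunk-wise via `map_overlap`. At the outer array edges `map_overlap` fills the overlap with synthetic values (`boundary="reflect"` here), so the per-chunk statistics there are computed against wrong neighbours:

```python
import numpy as np
import dask.array as da

def shifted_sq_diff(x, lag=1):
    # mean squared difference between each pixel and its neighbour `lag` columns away
    return np.mean((x[:, lag:] - x[:, :-lag]) ** 2)

a = np.arange(64, dtype=float).reshape(8, 8)
numpy_result = shifted_sq_diff(a)  # exactly 1.0: arange has constant column differences

d = da.from_array(a, chunks=(8, 4))
# map_overlap pads every chunk with `depth` extra cells before applying the
# function; at the array edges that padding is reflected data, not real
# neighbours, so each chunk's statistic is contaminated at the borders.
dask_result = d.map_overlap(
    lambda block: np.full_like(block, shifted_sq_diff(block)),
    depth=1,
    boundary="reflect",
    dtype=float,
).mean().compute()

print(numpy_result, dask_result)  # the two values differ
```

With `arange` the interior differences are all identical, which is exactly why the border contamination shows up so clearly in your example.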
Every now and then I have been thinking about how to improve the accuracy of the calculations with regard to how borders are treated, but I haven't come up with good ideas yet (the ones I have considered make the calculation significantly slower). I have a workaround in mind, at least for the functions in the statistics module, but I haven't implemented and tested it yet. How quickly would you need a solution?
Another question I have: how would you feel, from a user's point of view, about an additional dependency like Numba?
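To sketch why I am asking (just an illustration of the idea, not a committed design): with Numba, an explicit jit-compiled loop can walk the full array and use only real neighbours at the borders, without the padding `map_overlap` needs, while staying fast. `variogram_lag` below is a made-up helper using one common semivariogram definition:

```python
import numpy as np
from numba import njit

@njit
def variogram_lag(x, lag=1):
    # semivariance: 0.5 * mean squared difference between pixels
    # `lag` columns apart, using only real neighbours (no padding)
    total = 0.0
    count = 0
    for i in range(x.shape[0]):
        for j in range(x.shape[1] - lag):
            d = x[i, j + lag] - x[i, j]
            total += d * d
            count += 1
    return 0.5 * total / count

print(variogram_lag(np.arange(64.0).reshape(8, 8), lag=1))  # 0.5
```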
Thanks for the feedback @BENR0! There's no hurry, we're still in a testing phase and evaluating the best solution for our application, which is computing block-wise variograms like so:
```python
data_array.coarsen(lon=boxsize, lat=boxsize).reduce(textory.statistics.variogram, lag=lag)
```
We're already using numba in that project, so this would be fine.
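In case it helps, here is a self-contained toy version of that snippet (no textory needed); `toy_variogram` is just a stand-in for `textory.statistics.variogram`, since xarray's `coarsen(...).reduce(func, **kwargs)` calls `func(block, axis, **kwargs)` with the window axes:

```python
import numpy as np
import xarray as xr

def toy_variogram(block, axis, lag=1):
    # stand-in statistic: 0.5 * mean squared difference at distance `lag`
    # along the last (lon window) axis, reduced over the window axes
    diff = block[..., lag:] - block[..., :-lag]
    return 0.5 * np.mean(diff ** 2, axis=axis)

data_array = xr.DataArray(np.random.rand(8, 8), dims=("lat", "lon"))
boxsize, lag = 4, 1

# coarsen reshapes each (boxsize x boxsize) box into extra window axes and
# passes those axes to the reduction function
result = data_array.coarsen(lon=boxsize, lat=boxsize).reduce(toy_variogram, lag=lag)
print(result.shape)  # (2, 2): one value per box
```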
**Describe the bug**
The `textory.statistics.variogram` method yields different results for numpy and dask arrays.

**To Reproduce**
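A hypothetical minimal comparison (the exact call signature of `variogram` is my assumption, inferred from the coarsen snippet above; the original example used `arange`):

```python
import numpy as np
import dask.array as da
from textory.statistics import variogram

data = np.arange(100, dtype=float).reshape(10, 10)

result_numpy = variogram(data, lag=1)
result_dask = variogram(da.from_array(data, chunks=(5, 5)), lag=1)

print(result_numpy, result_dask)  # observed: the two values differ
```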
**Expected behavior**
The results to be identical :)

**Environment Info:**