corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
528 stars 84 forks source link

BUG: Fix chunk arguments for normalize_chunks #820

Closed phofl closed 3 weeks ago

phofl commented 3 weeks ago

The current specification of the previous chunks is incorrect, it always needs to sum up to the whole axis if given as tuples. What you want is an integer here.

I've recently refactored how normalize_chunks works and this will break things for you. I'll add some compat code for now, but will want to remove this at some point, so fixing it over here

phofl commented 3 weeks ago

I am not sure how and if we can test this unfortunately.

snowman2 commented 3 weeks ago

This was added in #31. I am concerned about modifying code that has been around for so long without some way to test and verify that the behavior works as expected.

phofl commented 3 weeks ago

The previous way of specifying chunks was technically invalid, i.e. if you pass something like

chunks=((1, ), (1, ), (100, )) 
shape = (1, 10, 100)

to any array constructor (da.random.random for example), it will raise. The way this was intended was to pass (1, 1, 100) instead (historically, normalize_chunks treated both inputs equivalent). This PR addresses that.

Re testing: Every test in the test suite that uses chunks=True|"auto" tests this behaviour implicitly, so stuff would break if that would create an incompatible output.

I've put up a PR for Dask that makes both inputs equivalent for the time being and it's tested over there that both notations are equivalent.

snowman2 commented 3 weeks ago

After looking at dask's tests, I agree with your conclusion. Thanks!

phofl commented 3 weeks ago

Great, thanks for reviewing!