Closed dcherian closed 7 months ago
@dcherian !!!! . . . this has to be the BEST open source "customer service" in coding history. A great example of what being a bit courageous and making all your (possibly terrible) code and (possibly silly) problems open to the world for all to see can do. What is possibly "a bit sad" is my understanding and approach, so I'd really appreciate a chat.
Right now it's Friday night (Tasmania time) and my little family and I are going camping so I likely won't get back to you on this until Monday (Tasmania time).
hehehe I'm obsessed with solving this problem! No rush of course but happy to chat next week.
Some general comments (applicable to latest flox/xarray):
- Use the `ds.groupby("time.month").mean()` syntax and it will use flox automatically if installed.
- flox picks `method` automatically, looking at how the groups are distributed across chunks. You should not have to set `method`. This will let flox choose what's appropriate to whatever chunking you have.
- You could use `.chunk(time=30)`, for example, but really you should think about rechunking to a frequency (e.g. monthly or two-monthly). We don't have nice syntax for this yet (https://github.com/pydata/xarray/issues/7559) but you can quite easily figure out the appropriate chunk tuple with `ds.time.resample(time="2M").count()`. If properly done, flox will then choose either "cohorts" or "blockwise" automatically for you, and save some memory. Here's an example: https://flox.readthedocs.io/en/latest/user-stories/climatology.html#rechunking-data-for-cohorts

> hehehe I'm obsessed with solving this problem!
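For anyone following along, the rechunking suggestion in the general comments above might look roughly like the sketch below. It is only an illustration on synthetic daily data: the variable name `sst` and the dimension `x` are made up, dask must be installed for `.chunk`, and the `"2MS"` (two-month, month-start) alias is used here instead of `"2M"` because newer pandas versions renamed the month-end alias to `"ME"`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data standing in for the real dataset (assumption).
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
values = np.random.default_rng(0).random((time.size, 4)).astype("float32")
ds = xr.Dataset({"sst": (("time", "x"), values)}, coords={"time": time})

# Count the timesteps that fall in each two-month bin.
counts = ds.time.resample(time="2MS").count()
time_chunks = tuple(int(n) for n in counts.values)

# Rechunk so that chunk boundaries line up with those bins (requires dask).
ds_chunked = ds.chunk({"time": time_chunks})

# Plain groupby syntax, with no `method` argument set by hand.
clim = ds_chunked.groupby("time.month").mean()
result = clim.compute()
print(time_chunks)
print(result)
```

The chunk tuple has to sum to the length of the time axis, which is a quick sanity check before rechunking a large dataset.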
And for this, @dcherian, many are very grateful!
People likely think I'm obsessed too, but my progress has been slow, despite your excellent documentation. Part / much of this is possibly another issue #13 where `xr.open_mfdataset` was reporting an object loaded from netcdf - `short` datatype being `float32`, which was expected, but when computed it became `float64`? I didn't notice this at first and it caused me unexpected memory issues. I'm not suggesting there is any bug in `xr.open_mfdataset`, more that I don't understand how it works with the specific netcdf - `short` variables I'm loading. For now I'm forcing `float32` with `.astype`.
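A minimal sketch of that workaround, for reference. It does not reproduce the `xr.open_mfdataset` behaviour from #13; it only shows casting a dask-backed array with `.astype` and checking the dtype after computing. The array here is synthetic and the name `sst` is hypothetical.

```python
import numpy as np
import xarray as xr

# Stand-in for a decoded variable that ended up as float64 (assumption).
da = xr.DataArray(
    np.linspace(0.0, 1.0, 1000, dtype="float64").reshape(100, 10),
    dims=("time", "x"),
    name="sst",
).chunk({"time": 25})

# On dask-backed data the cast is lazy and is applied chunk by chunk.
da32 = da.astype("float32")

# Check the dtype of the computed result, not only the one shown in the repr.
computed = da32.compute()
print(da.dtype, "->", computed.dtype)
print(da.nbytes, "->", computed.nbytes)
```

Comparing `nbytes` before and after is a cheap way to confirm the memory footprint actually halved.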
My open repo here is mainly my personal work notes and so I'm sure it's not easy to understand or follow. I'll apply what you've said above and reframe the problem for you below.
@dcherian - I've summarised things here #17 and will document things in that issue. Would appreciate any comments you have over there in #17. Thanks.
I saw the link to a flox issue and from a quick browse it looks like you're struggling a bit trying to get it to work well.
Wanna chat?
I wrote flox with the goal of making climatology generation and compositing "just work", so this is a bit sad :(