coiled / examples

Examples using Dask and Coiled

Add rechunking example #47

Closed jrbourbeau closed 4 months ago

jrbourbeau commented 4 months ago

This example reads in 1 TB worth of NVM data, rechunks it to be optimized for time selections, and then writes the rechunked dataset to S3 (in oss-scratch-space in us-east-1).
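For a rough sense of what "optimized for time selections" can mean for chunk shapes, here is a minimal sketch in plain Python. It is not the notebook's code: the function name, the `(time, y, x)` grid shape, the 8-byte item size, and the 128 MiB budget are all assumptions chosen for illustration.

```python
import math


def time_optimized_chunks(shape, itemsize, target_bytes):
    """Pick a (time, y, x) chunk shape that favors long runs along time.

    Hypothetical helper: keeps as much of the time axis as fits in one
    chunk, then spends the remaining byte budget on a square spatial tile.
    """
    n_time, n_y, n_x = shape
    # Number of array elements allowed per chunk under the byte budget.
    budget = max(1, target_bytes // itemsize)
    # Prefer the whole time axis in a single chunk; otherwise take what fits.
    t = min(n_time, budget)
    # Use whatever budget is left for a square tile in space.
    side = max(1, math.isqrt(budget // t))
    return (t, min(n_y, side), min(n_x, side))


# Assumed example: one year of hourly data on a 3840 x 4608 grid, float64.
example_shape = (8760, 3840, 4608)
example_chunks = time_optimized_chunks(example_shape, itemsize=8, target_bytes=128 * 2**20)
print(example_chunks)
```

A tuple like this is the kind of value one would hand to a rechunking step before writing; the right byte budget depends on the workload and storage backend.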

cc @mrocklin. Happy to keep iterating

mrocklin commented 4 months ago

Playing now. Neat. Some thoughts!

  1. It might make sense to arrange the data to make spatial access cheap.

    I think that the most common situation I've heard from people is "My satellite pumps out one file every day/hour, so it's organized by time, but I want it organized spatially, so that I can pick out a timeseries for a lat/lon pair really easily."

  2. Maybe at the end we can open up the data with just zarr/xarray without Dask, and show that it's really cheap to get these timeseries, for example from a web application (which is what they all seem to want to do). I'm actually a little curious about sub-chunk access times. It may be that we want to store the zarr array with far finer chunking than Dask would want, so that we're not accessing a bunch of neighboring lat/lon pairs at once. Maybe Xarray does this by default, but maybe not. My hope is that we could show ~100ms access times for tiny timeseries.

  3. Thoughts on combining this into the geospatial notebook? I can imagine that in many cases it'll be nice to go from one example to the next, and I wouldn't mind consolidating example notebooks a little.
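To put rough numbers on points 1 and 2, here is a back-of-the-envelope sketch of how much data a single lat/lon timeseries read has to touch under different chunk layouts. Everything in it is an assumption for illustration (the function name, the grid shape, float64 values, uncompressed chunks, a point that sits in a full-size spatial chunk); it does not measure real access times.

```python
import math


def point_timeseries_cost(shape, chunks, itemsize):
    """Estimate chunks touched and uncompressed bytes read for one point.

    Hypothetical helper: reading the full time axis at a single (y, x)
    location touches every chunk along time, and each chunk has to be
    fetched whole even though only one value per time step is wanted.
    """
    n_time = shape[0]
    t_chunk, y_chunk, x_chunk = chunks
    n_chunks = math.ceil(n_time / t_chunk)
    # The touched chunks together cover the whole time axis over one spatial tile.
    bytes_read = n_time * y_chunk * x_chunk * itemsize
    return n_chunks, bytes_read


# Assumed grid: 8760 hourly steps on a 3840 x 4608 grid of float64 values.
grid = (8760, 3840, 4608)

# "One file per hour" layout: each chunk is a full spatial snapshot.
by_time = point_timeseries_cost(grid, (1, 3840, 4608), itemsize=8)

# Finely chunked spatial layout: whole time axis, small 10 x 10 tile.
by_space = point_timeseries_cost(grid, (8760, 10, 10), itemsize=8)

print(by_time)
print(by_space)
```

Under these assumptions the time-organized layout touches every chunk in the array for one timeseries, while the finely chunked layout touches a single chunk of a few megabytes, which is the reason finer-than-Dask chunking might matter for the web application case.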

mrocklin commented 4 months ago

Oh, I guess the rechunking isn't very impressive though, because it's mostly chunked in this way already ...

Maybe we stick with time-optimized then, but maybe some of the other feedback still holds?

jrbourbeau commented 4 months ago

@mrocklin you made some changes offline to this notebook -- want to push up those changes here, or to a different PR (whichever is easiest)?

mrocklin commented 4 months ago

I've merged your rechunk example into the xarray example.

jrbourbeau commented 4 months ago

Thanks @mrocklin -- I pushed up one minor update in https://github.com/coiled/examples/pull/49