Open AtiehAlipour-NOAA opened 3 weeks ago
It's on our "future" list to look into performance and chunking, so this is great.
The challenge, IIUC, is that to rechunk the data, you need to make a copy of it -- and that can be pretty expensive.
Potentially, the goal could be for STOFS (and other OFSs!) to be re-chunked before being uploaded to the NODD (or even in the original output).
The challenge with that is that an optimum chunking strategy is different depending on the use case, so there may not be a consensus on one "best" way to chunk the data.
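To make the use-case dependence concrete, here is a rough sketch that counts how many chunks a read has to touch under two chunk layouts. The array shape (240 time steps by 1,000,000 nodes), the two layouts, and the helper `chunks_touched` are all made up for illustration; they are not taken from STOFS or from the CERA code, and the count assumes the selection starts on a chunk boundary.

```python
import math

def chunks_touched(sel_shape, chunk_shape):
    """Chunks overlapped by a contiguous selection that starts on a chunk boundary."""
    return math.prod(math.ceil(s / c) for s, c in zip(sel_shape, chunk_shape))

# Hypothetical (time, node) array; sizes are illustrative only.
N_TIME, N_NODE = 240, 1_000_000

# Two candidate chunk shapes, each (time, node).
layouts = {
    "per-time-step": (1, N_NODE),   # one chunk holds a full snapshot
    "per-node-block": (N_TIME, 1_000),  # one chunk holds full time series for 1,000 nodes
}

# Two typical requests, each (time, node).
requests = {
    "map": (1, N_NODE),          # one time step, all nodes
    "time-series": (N_TIME, 1),  # all time steps, one node
}

counts = {}
for layout_name, chunk_shape in layouts.items():
    for request_name, sel_shape in requests.items():
        counts[(layout_name, request_name)] = chunks_touched(sel_shape, chunk_shape)
        print(f"{layout_name:>15} / {request_name:<11}: "
              f"{counts[(layout_name, request_name)]} chunk(s)")
```

In this toy setup each layout needs a single chunk for one kind of request and hundreds for the other, which is why a single "best" chunking is hard to agree on.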
Also -- for an unstructured grid, the ordering of the nodes can have a big impact -- does the CERA code reorder the nodes, in addition to re-chunking?
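As a toy illustration of why node ordering matters, the sketch below counts how many node chunks a bounding-box subset touches under two orderings of the same nodes. Everything here is an assumption for illustration: the 100 x 100 point set stands in for an unstructured mesh, the tile-based sort is just one simple spatially coherent ordering, and `chunks_hit` is a hypothetical helper, not part of any existing tool.

```python
import random

def chunks_hit(order, selected, chunk_size):
    """Count node chunks holding at least one selected node, given a storage order."""
    position = {node: i for i, node in enumerate(order)}
    return len({position[n] // chunk_size for n in selected})

random.seed(0)

# Stand-in "mesh": 10,000 nodes with integer (x, y) coordinates.
n_side = 100
coords = {r * n_side + c: (c, r) for r in range(n_side) for c in range(n_side)}

# Bounding-box subset: the 100 nodes with x < 10 and y < 10.
selected = [n for n, (x, y) in coords.items() if x < 10 and y < 10]

# Ordering A: nodes stored in arbitrary (shuffled) order.
shuffled = list(coords)
random.shuffle(shuffled)

# Ordering B: nodes grouped by 10 x 10 spatial tile, so neighbors sit together.
tiled = sorted(coords, key=lambda n: (coords[n][1] // 10, coords[n][0] // 10))

chunk_size = 100  # nodes per chunk along the node dimension
hits_shuffled = chunks_hit(shuffled, selected, chunk_size)
hits_tiled = chunks_hit(tiled, selected, chunk_size)
print(f"shuffled order: {hits_shuffled} chunks, tiled order: {hits_tiled} chunk(s)")
```

With the tiled ordering the whole box lands in one chunk, while the shuffled ordering spreads the same 100 nodes across many chunks, so re-chunking alone may not help much if the node order is not spatially coherent.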
I agree that copying the file might not be ideal, but I thought it could be worth considering if working with STOFS data turns out to be slow. I also heard in that meeting that they transpose the dimension files before chunking the data, but I couldn't confirm that from the code. I do not think they reorder the nodes. We might find some relevant material in the JRC code: https://github.com/asascience-open/xarray-subset-grid/issues/19
Here is also a relevant library (rechunker) that @SorooshMani-NOAA has shared: https://medium.com/pangeo/rechunker-the-missing-link-for-chunked-array-analytics-5b2359e9dc11
CERA uses code to chunk STOFS .nc files before visualization, which makes the visualization more efficient. Perhaps we can apply the same code before subsetting STOFS data. The code is in a private repository, but I have access to it and permission to share it exclusively with the STOFS Subsetting Tool development team.