Closed: loliverhennigh closed this issue 11 months ago
You're right, the current chunks are quite large. It would definitely be an improvement to split each chunk per level, as you suggested. We can prioritize this improvement for the CO version of the data (our current focus has been on Phase 2). I'll leave this issue open to track the work.
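In the meantime, a reader could rewrite a copy of the store with per-level chunks. A minimal sketch, assuming xarray/dask; the store path, output bucket, and dimension names below are hypothetical and may not match the actual ARCO-ERA5 layout:

```python
import xarray as xr

# Open the Zarr store with its native (large) chunking.
# NOTE: the store path and dimension names here are assumptions.
ds = xr.open_zarr("gs://gcp-public-data-arco-era5/model-level.zarr")

# Drop the source chunk encoding so to_zarr uses the new dask chunking
# instead of the chunk sizes recorded in the original store.
for name in ds.variables:
    ds[name].encoding.pop("chunks", None)

# One time step and one model level per chunk: 410240 values each,
# i.e. ~1.6 MB for float32, instead of the current ~10 GB chunks.
ds = ds.chunk({"time": 1, "hybrid": 1})
ds.to_zarr("gs://my-bucket/model-level-rechunked.zarr", mode="w")
```

Note that this only helps as a one-time rewrite: rechunking lazily at read time still forces dask to fetch each full ~10 GB source chunk from object storage.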
Hello @loliverhennigh, I'm glad to inform you that I have created a pull request (#31) that addresses the issue you opened (#28: https://github.com/google-research/arco-era5/issues/28). The changes in the pull request should solve the problem you reported.
Could you please review the pull request and, if everything looks good to you, mark this issue as resolved? @alxmrs has already merged the PR into the main branch.
Thank you for bringing this issue to our attention. If you have any further questions or concerns, feel free to let me know.
@DarshanSP19: once #49 lands, can we mark this issue as fixed?
Hey, not a cloud expert, but I'm wondering about the rationale for the chunking you're using. I see that the Zarr files have rather large chunk sizes. For example, the model-level variables have chunk size `dask.array<chunksize=(48, 137, 410240)>`, which works out to about 10 GB per chunk. My understanding was that a good chunk size for object storage is on the order of MBs. Wouldn't it make sense to use a chunking of (1, 1, 410240), for example?
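For reference, a back-of-the-envelope check of these numbers, assuming the values are stored as float32 (4 bytes each):

```python
import numpy as np

# Current chunk shape reported above: (time, level, values).
current = (48, 137, 410240)
print(np.prod(current) * 4 / 1e9)   # ~10.8 GB per chunk

# Suggested chunking: one time step, one level per chunk.
proposed = (1, 1, 410240)
print(np.prod(proposed) * 4 / 1e6)  # ~1.6 MB per chunk
```

The proposed (1, 1, 410240) chunking lands in the low-megabyte range typically recommended for object storage.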