yangxianke opened 11 months ago
import os
import os.path as osp

# Select the region of interest and append to (or create) the per-variable Zarr store.
subset = ds[var].sel(latitude=latitude_slice, longitude=longitude_slice)
path = osp.join(out_dir, f"{var}_{region}.zarr")
mode = 'a' if osp.exists(path) and os.listdir(path) else 'w'
subset.to_zarr(path, mode=mode, compute=True)
I'm trying something similar, but I'm seeing the output Zarr store grow very slowly, 0.5 MB/s at best.
The problem is that the data is stored in a way that only makes it efficient to access all locations at once. If you slice out a small area and load all times, you are effectively loading data for the entire globe.
To fix this, you could use a tool like "rechunker" to convert these arrays into a format that allows for efficient queries across time: https://medium.com/pangeo/rechunker-the-missing-link-for-chunked-array-analytics-5b2359e9dc11
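For reference, a minimal sketch of what that rechunking step could look like; the store paths, variable name, chunk sizes, and memory limit below are illustrative placeholders, not values tuned for this dataset:

import xarray as xr
from rechunker import rechunk

# Open the source store lazily; "source.zarr" is a placeholder path.
ds = xr.open_zarr("source.zarr")

# Ask for chunks that are long in time and small in space, so reading the
# full record at a single location only touches a handful of chunks.
target_chunks = {
    "temperature": {"time": 8760, "latitude": 10, "longitude": 10},
    "time": None,       # leave the coordinate arrays as they are
    "latitude": None,
    "longitude": None,
}

plan = rechunk(
    ds[["temperature"]],
    target_chunks,
    max_mem="2GB",                  # per-worker memory budget for the copy
    target_store="rechunked.zarr",
    temp_store="rechunk_tmp.zarr",
)
plan.execute()                      # performs the copy (can run on a dask cluster)

The copy itself still has to read the whole source once, so it is slow, but every later per-location query against the new store becomes cheap.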
Could you provide an example of the optimal way to do this? Let's say I just need data at one latitude/longitude/level, but for the entire record.
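In case it helps, once a variable is chunked to be contiguous in time, pulling the full record at one point is just a lazy selection followed by a single load. A sketch, assuming the data has already been rechunked as above; the path, variable name, "level" dimension name, and coordinate values are placeholders:

import xarray as xr

ds = xr.open_zarr("rechunked.zarr")   # placeholder path to a time-optimized store

# Nearest grid point to the requested location and pressure level.
point = ds["temperature"].sel(
    latitude=52.5, longitude=13.4, level=500, method="nearest"
)

series = point.load()   # only reads the chunks covering this single column

Against the original layout the same selection still works, but it ends up reading full-globe chunks for every time step, which is why it is so slow.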
Hi, everyone. It is really convenient to access ERA5 data from cloud storage. However, it's very slow to save the processed data in netCDF format. It has taken 40 minutes so far and still has not finished. How can I solve this problem and speed up saving chunked ERA5 data? This is my code.