Open pl-marasco opened 1 year ago
Before reading the details here, have you seen `auto_dask`? It will do the tree reduction for you with whatever dask setup you have ready. The critical point is that the arguments to `MultiZarrToZarr` in the branches stage are not the same as in the final trunk stage; the function tries to get this right for you.
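For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a two-stage tree reduction looks like. It is not kerchunk's implementation: `split_batches`, `tree_reduce`, `combine_branch` and `combine_trunk` are hypothetical names, and the toy combiners only stand in for the real reference-combining step. The point it illustrates is that the branch stage receives raw inputs while the trunk stage receives already-combined results, so the two stages need different handling.

```python
from typing import Callable, List, Sequence, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def split_batches(items: Sequence[In], n_batches: int) -> List[List[In]]:
    """Split items into at most n_batches contiguous, order-preserving chunks."""
    if not items or n_batches < 1:
        raise ValueError("need at least one item and one batch")
    size = -(-len(items) // n_batches)  # ceiling division
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def tree_reduce(
    items: Sequence[In],
    combine_branch: Callable[[List[In]], Out],
    combine_trunk: Callable[[List[Out]], Out],
    n_batches: int,
) -> Out:
    """Combine each batch independently (branches), then combine the results (trunk)."""
    branches = [combine_branch(batch) for batch in split_batches(items, n_batches)]
    return combine_trunk(branches)


# Toy stand-ins: a "branch" turns raw file names into a partial result,
# a "trunk" concatenates partial results along the same axis.
files = [f"file_{i}.nc" for i in range(10)]


def toy_branch(batch: List[str]) -> dict:
    return {"time": list(batch)}


def toy_trunk(parts: List[dict]) -> dict:
    return {"time": [t for part in parts for t in part["time"]]}


result = tree_reduce(files, toy_branch, toy_trunk, n_batches=3)
```

In this sketch the branch calls are independent of one another, which is what makes them candidates for running in parallel on a dask cluster.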
I hadn't, because the dataset covers a large area with a pretty long time dimension, and I thought that the classical approach (with intermediate single files) would save some time. Consider that the dataset has to be updated every day in the best scenario and every 10 days in the worst.
Anyhow, following your suggestion, I've tested `auto_dask`, and it works only if I use the `coo_map` to get the date from the filename. With a very small subset there is no problem, but I still have to test with the entire archive.
> it works only if
What problem do you face?
Following this example, I'm trying to replicate the tree reduction approach with the Copernicus Global Land products TOC 300m.
In this specific case I've reduced the test to a simple combination of the archive along the time dimension (which is contained in the filename). To avoid other issues, I added the time dimension beforehand.
Using the tree reduction approach, I get a wrongly structured Dataset that has a single dimension on the `concat_dim` (in my case a `time` coordinate). With the simpler approach I have no problem, which leads me to think that the issue isn't coming from the data.
During the computation I get this warning:
On top of this issue, if I try to retrieve the `time` value from the file name through the `coo_map`, this fails once it tries to aggregate the data.
Here is the code I'm using, although it is so simple that I would say nothing is wrong with it:
A small subsample of the original data can be fetched here
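As an illustration of retrieving the `time` value from a file name, the sketch below parses a 12-digit timestamp with a regular expression. The file name and its `YYYYMMDDHHMM` layout are assumptions made for the example, not taken from the archive discussed here, and `time_from_filename` is a hypothetical helper; how it would be wired into `coo_map` is left open.

```python
import re
from datetime import datetime

# Assumed layout: a 12-digit YYYYMMDDHHMM timestamp delimited by underscores.
TIMESTAMP = re.compile(r"_(\d{12})_")


def time_from_filename(path: str) -> datetime:
    """Return the timestamp embedded in a file name, or raise if none is found."""
    match = TIMESTAMP.search(path)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


# Hypothetical file name, used only to exercise the helper.
example = "c_gls_NDVI300_202001110000_GLOBE_OLCI_V2.0.1.nc"
parsed = time_from_filename(example)
```

Checking a helper like this on a handful of file names before running the full combine can help separate parsing problems from aggregation problems.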