SarahAlidoost closed this 9 months ago
I'm glad you were able to find a way to fix this!
I have also found that `open_mfdataset` can be quite slow. In cases where you have big datasets and know well how to concatenate/merge the data, opening the files separately and then defining the merging operations manually can lead to better performance. The code here is fine as is; it'll mostly be replaced anyway once we move to Zampy's output.
Thanks. I added other changes, see here; can you have another look?
Kudos, no new issues were introduced!
- 0 New issues
- 0 Security Hotspots
- 100.0% Coverage on New Code
- 0.0% Duplication on New Code
close #94
In this PR:

- `chunks` is set to `"auto"` to avoid memory issues in `xr.open_mfdataset`, because by default chunks are chosen to load entire input files into memory at once (see doc; a combined sketch of the settings in this PR follows this list).
- `"S"` is replaced with `"s"` to fix `pandas: FutureWarning: 'S' is deprecated and will be removed in a future version, please use 's' instead.` This also works for pandas < 2 (see source code).
- `dask.config.set({"array.slicing.split_large_chunks": True})` is added to avoid creating large chunks, which caused `PerformanceWarning: Slicing is producing a large chunk` (see doc).
- There is still another `PerformanceWarning: Increasing number of chunks by factor`. This is due to internal re-chunking and might be solved by Zampy (see dask source code).
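Taken together, the settings above can be combined roughly as in this sketch (the file pattern and the `date_range` usage are illustrative assumptions, not code from this PR's diff):

```python
import dask
import pandas as pd
import xarray as xr

# Let dask split the large chunks produced by slicing, instead of
# emitting "PerformanceWarning: Slicing is producing a large chunk".
dask.config.set({"array.slicing.split_large_chunks": True})

# chunks="auto" keeps xr.open_mfdataset from loading each entire
# input file into memory at once (the default behavior).
ds = xr.open_mfdataset("data/*.nc", chunks="auto")  # hypothetical file pattern

# Lowercase "s" is the non-deprecated seconds alias; it also works on pandas < 2.
index = pd.date_range("2020-01-01", periods=10, freq="s")
```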