AI4S2S / lilio

Calendar generator for machine learning with timeseries data
https://lilio.readthedocs.io/en/latest/
Apache License 2.0

Make resample lazy (when a dask array is passed) #56

Closed BSchilperoort closed 1 year ago

BSchilperoort commented 1 year ago

Although lilio.resample was already optimized with dask, it still returned a computed (non-lazy) object.

The underlying issue turned out to be the empty-interval check, which was performed on the resampled data by looking for NaNs in the resampled variables. That requires the variables to be computed. It is also quite inefficient, because we already know how many samples each i_interval has.

Therefore the check has been moved to an earlier stage, and now inspects the list of indices (the sample indices corresponding to each interval) instead of the resampled values.
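The index-based check described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not lilio's actual code: the helper names `indices_per_interval` and `find_empty_intervals`, the half-open `(start, end)` interval convention, and the numeric timestamps are all assumptions made for the example.

```python
from bisect import bisect_left


def indices_per_interval(timestamps, intervals):
    """Return, for each interval, the indices of the samples that fall in it.

    Hypothetical helper: `timestamps` is a sorted list of numbers and each
    interval is a half-open (start, end) pair. Only the time coordinate is
    touched, never the data values.
    """
    index_lists = []
    for start, end in intervals:
        lo = bisect_left(timestamps, start)  # first sample with time >= start
        hi = bisect_left(timestamps, end)    # first sample with time >= end
        index_lists.append(list(range(lo, hi)))
    return index_lists


def find_empty_intervals(index_lists):
    """Return the positions of intervals that contain no samples."""
    return [i for i, idx in enumerate(index_lists) if len(idx) == 0]


# Toy example: there is a gap in the data between t=3 and t=10.
timestamps = [0, 1, 2, 3, 10, 11]
intervals = [(0, 2), (2, 4), (5, 8), (10, 12)]

index_lists = indices_per_interval(timestamps, intervals)
empty = find_empty_intervals(index_lists)
```

Because the check only looks at index lists derived from the time coordinate, the data values themselves never need to be loaded, which is what allows a dask-backed result to stay lazy.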

This removes the RAM limitation: datasets that do not fit in memory can now be resampled, as the result can be computed and written to disk chunk by chunk.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 94.4%
Duplication: 0.0%