Although lilio.resample was optimized for dask, it still returned a computed object instead of a lazy one.
The underlying issue turned out to be that the empty-interval check was performed on the resampled data: it looked for NaNs in the resampled variables, which forces them to be computed. It is also quite inefficient, since we already know how many samples each i_interval has.
Therefore the check has been moved to an earlier stage, where it inspects the list of sample indices corresponding to each interval instead of the resampled values.
Because the result now stays lazy, resampling is no longer limited by available RAM: datasets larger than memory can be resampled, as they can be computed and written to disk chunk by chunk.
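To illustrate the idea, here is a minimal sketch of an index-based empty-interval check. It is not the actual lilio code: the function names `indices_per_interval` and `find_empty_intervals`, the toy numeric times, and the half-open `[left, right)` interval convention are all assumptions made for this example.

```python
from bisect import bisect_left
import warnings


def indices_per_interval(times, intervals):
    """Map each interval to the positions of the samples that fall inside it.

    Assumes `times` is sorted and intervals are half-open: [left, right).
    (Hypothetical helper, not part of the lilio API.)
    """
    index_lists = []
    for left, right in intervals:
        start = bisect_left(times, left)
        stop = bisect_left(times, right)
        index_lists.append(list(range(start, stop)))
    return index_lists


def find_empty_intervals(index_lists):
    """Return the positions of intervals that contain no samples.

    Only the lengths of the index lists are inspected, so the data values
    (which may be lazy dask arrays) are never read or computed.
    """
    return [i for i, idx in enumerate(index_lists) if len(idx) == 0]


# Toy example: sorted sample times and four consecutive intervals.
times = [0, 1, 2, 3, 7, 8]
intervals = [(0, 2), (2, 4), (4, 6), (6, 9)]

index_lists = indices_per_interval(times, intervals)
empty = find_empty_intervals(index_lists)
if empty:
    warnings.warn(f"Intervals without any samples: {empty}")
```

The check only needs the time coordinate, which is small and already in memory, so the data variables can remain uncomputed until they are written out.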