Open cwasicki opened 9 months ago
Moving post v1.0:
aggregation: how to aggregate the data points in the interval, e.g. mean, sum, min/max, interpolation (upsampling)
With the current ETL the aggregation is predetermined. Changing the aggregation method would only work on raw data and can be deferred to the client or the user.
closed: left or right closed interval, i.e. ts1 <= ts < ts2 or ts1 < ts <= ts2
I don't think that closed has significant practical implications.
label: which timestamp is assigned to the resampled interval. Possible options:
That's easy to fix by the user or at the client level.
resolution (update): the resolution parameter currently does not support resampling periods smaller than 1s.
At least for our current data streams with a handful of samples per second this is of minor importance.
What's needed?
Users could get more control over resampling:
ts1 <= ts < ts2 or ts1 < ts <= ts2
In the first version of the API, the label defaults to fixed-interval, using the start of the resampling bin as the timestamp.
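To make the closed and label options concrete, here is a minimal Python sketch of fixed-interval resampling. The resample function, its parameter names (borrowed from pandas) and the integer-second timestamps are assumptions for illustration only; none of this is the actual API.

```python
from collections import defaultdict
from statistics import mean


def resample(samples, resolution, closed="left", label="left", aggregate=mean):
    """Aggregate (timestamp, value) pairs into fixed-width bins.

    Hypothetical helper, not part of the API.
    closed="left":  a bin covers ts1 <= ts < ts2
    closed="right": a bin covers ts1 < ts <= ts2
    label="left" stamps each bin with ts1, label="right" with ts2.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        if closed == "left":
            # Floor to the bin start; a sample on an edge opens a new bin.
            start = (ts // resolution) * resolution
        else:
            # Ceiling to the bin end; a sample on an edge closes the bin ending there.
            start = -(-ts // resolution) * resolution - resolution
        bins[start].append(value)
    offset = 0 if label == "left" else resolution
    return {start + offset: aggregate(values) for start, values in sorted(bins.items())}


# Integer timestamps in seconds, resampled to 10 s bins.
samples = [(0, 1.0), (5, 2.0), (10, 3.0), (15, 4.0), (20, 5.0)]

left_closed = resample(samples, 10, closed="left", label="left")
right_closed = resample(samples, 10, closed="right", label="right")
mismatched = resample(samples, 10, closed="right", label="left")
```

With closed="right" and label="left", the sample at ts=0 ends up in a bin stamped -10, which is the kind of odd timestamp that a matching label option avoids.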
Proposed solution
Support corresponding parameters in ResamplingOptions.
Use cases
Different aggregations make sense if metrics like energy (sum) or peak values (min/max) are of interest.
Different closed options could be helpful if data is compared with external data that could use another interval definition (e.g. DSO, clients).
label is required if the closed option is changed, to avoid weird timestamps.
Resolutions below 1s could be interesting if faster reaction or very short-term forecasts are needed. Since we plan to make resampling mandatory when aggregating components, the shortest resolution would be 1s for component aggregations.
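As a rough illustration of why the aggregation method matters for these use cases, the sketch below applies different aggregations to the same kind of binned data. The aggregate_bins helper, the AGGREGATIONS table and the sample values are made up for this example and are not part of the API.

```python
from statistics import mean

# Hypothetical mapping from an aggregation name to a Python function.
AGGREGATIONS = {"mean": mean, "sum": sum, "min": min, "max": max}


def aggregate_bins(samples, resolution, method):
    """Group (timestamp, value) pairs into left-closed bins and aggregate each bin."""
    func = AGGREGATIONS[method]
    bins = {}
    for ts, value in samples:
        start = (ts // resolution) * resolution
        bins.setdefault(start, []).append(value)
    return {start: func(values) for start, values in sorted(bins.items())}


# Made-up data: power readings in W and energy increments in Wh, one per second.
power_w = [(0, 100.0), (1, 300.0), (2, 200.0), (3, 400.0)]
energy_wh = [(0, 0.5), (1, 1.5), (2, 1.0), (3, 2.0)]

peak_power = aggregate_bins(power_w, 2, "max")       # peak value per 2 s bin
average_power = aggregate_bins(power_w, 2, "mean")   # what a mean aggregation reports
total_energy = aggregate_bins(energy_wh, 2, "sum")   # energy accumulated per 2 s bin
```

A mean over the power readings hides the peaks, and a mean over energy increments would underreport the energy in each bin, which is why a selectable aggregation is useful here.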
Alternatives and workarounds
No response
Additional context
Related to:
Example for resampling options in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html