apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

[Time series Gap-fill V2] Server side gap-filling of time series data #9492

Open lakshmanan-v opened 2 years ago

lakshmanan-v commented 2 years ago

We recently introduced gapfill() to interpolate and fill gaps in a time series dataset. In the current solution, the data is moved to the broker and gap-filling is performed there. This puts a lot of stress on the broker, because all of the raw records must be transferred from the servers to the broker.

This imposes significant limitations, and the approach does not work for large datasets that span wide date ranges. In V2, let us revisit the implementation and push the gap-filling and aggregations down to the servers, leveraging data locality so that this feature scales better.
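The following Python sketch shows why pushing aggregation down reduces what the broker has to receive. It rests on a few assumptions: each raw record is a hypothetical `(timestamp, value)` pair, the aggregation is a SUM over fixed-width time buckets, and missing buckets are filled by carrying the previous value forward. Each server collapses its raw records into one row per bucket, so only those partial sums travel to the broker, which then merges them and fills the gaps. The function names are illustrative and are not Pinot APIs. The actual split of work between server and broker is defined in the design doc, not here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical raw record shape: (epoch timestamp, metric value).
Row = Tuple[int, float]


def server_partial_aggregate(rows: Iterable[Row], step: int) -> Dict[int, float]:
    """Runs on each server: collapse raw rows into one SUM per time bucket."""
    buckets: Dict[int, float] = defaultdict(float)
    for ts, value in rows:
        buckets[ts - ts % step] += value  # align timestamp to bucket start
    return dict(buckets)


def broker_merge_and_fill(
    partials: Iterable[Dict[int, float]], start: int, end: int, step: int
) -> List[Tuple[int, Optional[float]]]:
    """Runs on the broker: merge per-server partial sums, then gap-fill."""
    merged: Dict[int, float] = defaultdict(float)
    for partial in partials:
        for bucket, total in partial.items():
            merged[bucket] += total
    # Fill after merging: a bucket that is empty on one server may have data on another.
    result: List[Tuple[int, Optional[float]]] = []
    last: Optional[float] = None  # value used before the first non-empty bucket
    for t in range(start, end, step):
        if t in merged:
            last = merged[t]
        result.append((t, last))  # empty bucket: carry the previous value forward
    return result
```

For example, if server A holds `[(0, 1.0), (5, 2.0)]` and server B holds `[(3, 4.0), (25, 1.0)]` with a bucket width of 10, the broker receives three partial rows instead of four raw ones, and filling the range `[0, 40)` yields `[(0, 7.0), (10, 7.0), (20, 1.0), (30, 1.0)]`. With realistic data volumes, the reduction from raw records to per-bucket rows is what relieves the broker.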

cc @weixiangsun @Jackie-Jiang

Jackie-Jiang commented 2 years ago

cc @walterddr

lakshmanan-v commented 2 years ago

Design doc: https://docs.google.com/document/d/1ZPV3YVvNQYP1Cg0rBPWQEYNA168-xtJaKa57F4Bj2Iw/edit#