apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Enhance Data Ingestion Engine #5089

Open · Jackie-Jiang opened this issue 4 years ago

Jackie-Jiang commented 4 years ago

Motivation:

End goal:

mcvsubbu commented 4 years ago

Related issue: #4036

Jackie-Jiang commented 4 years ago

Clean up the unused old star-tree: #5086

snleee commented 2 years ago

To add to this: I think we should introduce a column-based interface for data indexing. This may be the same idea as the item "Design an interface (close to the idea of the stats collector) to store all the column data".

If the input data is in a columnar format, we can generate the dictionary and indices column by column. This would likely consume much less heap, because we would not need to hold all of the column data in memory at the same time. We could also add a parallelism config so that the engine processes multiple columns concurrently to speed things up.
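To make the column-by-column idea concrete, here is a minimal Java sketch. It is not Pinot's API: `ColumnarSource`, `ColumnIndex`, `InMemorySource`, `buildAll`, and the `parallelism` parameter are all hypothetical names, values are assumed to be plain strings, and "indices" is simplified to a sorted dictionary plus a dictionary-encoded forward index. Each column is read inside its own task, so with a lazily reading source only about `parallelism` columns of raw values need to be on the heap at once.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch of per-column index building; none of these types are Pinot APIs. */
public class ColumnarIndexBuilder {

  /** Hypothetical columnar input: each column can be read independently of the others. */
  public interface ColumnarSource {
    List<String> columnNames();
    List<String> readColumn(String column); // all values of one column, in row order
  }

  /** Simple in-memory source, for illustration only. */
  public static final class InMemorySource implements ColumnarSource {
    private final Map<String, List<String>> columns;
    public InMemorySource(Map<String, List<String>> columns) { this.columns = columns; }
    @Override public List<String> columnNames() { return new ArrayList<>(columns.keySet()); }
    @Override public List<String> readColumn(String column) { return columns.get(column); }
  }

  /** Per-column result: sorted dictionary plus one dictionary id per row. */
  public static final class ColumnIndex {
    public final String[] dictionary;
    public final int[] forwardIndex;
    ColumnIndex(String[] dictionary, int[] forwardIndex) {
      this.dictionary = dictionary;
      this.forwardIndex = forwardIndex;
    }
  }

  /** Builds the dictionary and forward index for a single column's values. */
  static ColumnIndex buildColumn(List<String> values) {
    String[] dictionary = new TreeSet<>(values).toArray(new String[0]); // sorted, distinct
    int[] forwardIndex = new int[values.size()];
    for (int row = 0; row < values.size(); row++) {
      // Dictionary is sorted in natural order, so binary search yields the id.
      forwardIndex[row] = Arrays.binarySearch(dictionary, values.get(row));
    }
    return new ColumnIndex(dictionary, forwardIndex);
  }

  /** Processes columns concurrently; `parallelism` stands in for the proposed config. */
  public static Map<String, ColumnIndex> buildAll(ColumnarSource source, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      Map<String, Future<ColumnIndex>> futures = new LinkedHashMap<>();
      for (String column : source.columnNames()) {
        // The column is read inside the task, not up front, so raw values are loaded lazily.
        futures.put(column, pool.submit(() -> buildColumn(source.readColumn(column))));
      }
      Map<String, ColumnIndex> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<ColumnIndex>> entry : futures.entrySet()) {
        result.put(entry.getKey(), entry.getValue().get());
      }
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while building column indices", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to build a column index", e.getCause());
    } finally {
      // On success every task is already done; on failure this cancels remaining work.
      pool.shutdownNow();
    }
  }
}
```

For example, a `country` column with rows `us, ca, us, mx` yields the dictionary `[ca, mx, us]` and the forward index `[2, 0, 2, 1]`. Building each column independently is what makes both the lower heap usage and the per-column parallelism possible. A real engine would also need per-column stats collection and other index types, which this sketch leaves out.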