Open Jackie-Jiang opened 4 years ago
Related Issue #4036
Cleanup the unused old star-tree: #5086
To add to this, I think we should introduce a column-based interface for data indexing (this may be the same idea as "Design an interface (close to the idea of the stats collector) to store all the column data").
If the input data is in a columnar format, we can generate the dictionary and indices column by column. This will probably consume much less heap because we don't need to hold all of the column data in memory at the same time. We could also add a parallelism config so the engine can process multiple columns concurrently to speed up indexing.
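A rough sketch of what this could look like, not a proposal for the actual interface. All names here (`ColumnarIndexingSketch`, `ColumnReader`, `ColumnIndex`, `indexColumns`, the `parallelism` parameter) are hypothetical and are not existing Pinot APIs. It assumes string-valued columns and a sorted dictionary, and it only builds a dictionary plus dictionary-encoded values per column:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ColumnarIndexingSketch {

  // Hypothetical column-based reader: returns all values of one column on demand.
  interface ColumnReader {
    String[] readColumn(String columnName);
  }

  // Hypothetical per-column result: sorted dictionary plus one dictionary id per row.
  static final class ColumnIndex {
    final String[] dictionary;
    final int[] dictIds;

    ColumnIndex(String[] dictionary, int[] dictIds) {
      this.dictionary = dictionary;
      this.dictIds = dictIds;
    }
  }

  // Builds the dictionary and dictionary-encoded values for a single column.
  static ColumnIndex indexColumn(String[] values) {
    String[] dictionary = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
    int[] dictIds = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      // The dictionary is sorted and contains every value, so the search always hits.
      dictIds[i] = Arrays.binarySearch(dictionary, values[i]);
    }
    return new ColumnIndex(dictionary, dictIds);
  }

  // Indexes columns one at a time per worker thread. Raw values of a column are
  // only referenced inside its task, so at most `parallelism` columns are held
  // in raw form at once.
  static Map<String, ColumnIndex> indexColumns(
      List<String> columnNames, ColumnReader reader, int parallelism) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<ColumnIndex>> futures = new ArrayList<>();
      for (String columnName : columnNames) {
        futures.add(pool.submit(() -> indexColumn(reader.readColumn(columnName))));
      }
      Map<String, ColumnIndex> result = new LinkedHashMap<>();
      for (int i = 0; i < columnNames.size(); i++) {
        result.put(columnNames.get(i), futures.get(i).get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

With `parallelism` set to 1 this degenerates to strictly sequential column-by-column processing, which would be the lowest-heap setting; higher values trade heap for speed.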
Motivation:
End goal: