apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Segment memory usage for unused columns #13242

Open dario-liberman opened 4 months ago

dario-liberman commented 4 months ago

In order to design our tables so that MMAP-loaded segments fit in memory, we would like to understand whether unused (or rarely used) columns affect memory usage.

Basically we want to serve two use cases:

  1. Aggregated metrics broken down by dimensional columns (mostly enumerations like countries, event name, etc)
  2. Log access by user uuid, including not just dimensional columns but also identifiers (e.g. order uuid, session uuid, etc.)

Data is partitioned by user uuid.

The second use case is rare: 99.99% of the load comes from the first use case.

We would like to understand whether, when loading a segment, Pinot maps into memory the columns that are not used in the aggregation queries.

An alternative would be to have two tables, one with only the dimensional columns and one with all columns, and to query each table for its respective use case.

dario-liberman commented 4 months ago

@Jackie-Jiang - Maybe you would know the answer?

dario-liberman commented 4 months ago

Read-ahead size might also play a role in this: https://github.com/apache/pinot/issues/12166

Jackie-Jiang commented 4 months ago

Unused columns shouldn't be loaded into memory. Again, MMAP is controlled by the OS, not by Pinot or Java.
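To illustrate the point about the OS controlling what is resident, here is a minimal sketch using only the standard JDK `FileChannel.map` API. It is not Pinot code: the class `MmapLazyDemo`, the two-region file layout, and the 1 MiB "column" size are all hypothetical stand-ins for a segment file. The whole file is mapped, but only the first region is read, so the OS generally needs to page in only that region (plus whatever its read-ahead policy adds).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MmapLazyDemo {
    // Hypothetical size of each "column" region in the demo file (1 MiB).
    public static final int COLUMN_BYTES = 1 << 20;

    // Writes a temp file with two fixed-size regions: first all 1s, then all 2s.
    public static Path writeTwoColumnFile() {
        try {
            Path file = Files.createTempFile("segment-demo", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[2 * COLUMN_BYTES];
            Arrays.fill(data, 0, COLUMN_BYTES, (byte) 1);
            Arrays.fill(data, COLUMN_BYTES, 2 * COLUMN_BYTES, (byte) 2);
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file, but touches only the bytes of the first region.
    // Mapping reserves address space; pages are faulted in on access.
    public static long sumFirstColumn(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            for (int i = 0; i < COLUMN_BYTES; i++) {
                sum += buf.get(i);
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTwoColumnFile();
        System.out.println("sum of first column: " + sumFirstColumn(file));
    }
}
```

How many pages of the untouched region end up resident depends on the kernel's read-ahead and page-cache behaviour, which this sketch does not measure.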

dario-liberman commented 4 months ago

For example, I was concerned about indiscriminate prefetching of all column indexes at the time a segment is loaded.

https://github.com/apache/pinot/blob/master/pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Fsegment%2Fstore%2FSegmentLocalFSDirectory.java#L308

Jackie-Jiang commented 4 months ago

This happens only during server start, to load data into memory and reduce the cold-start impact. When a column is unused, it will be flushed out of memory very soon and not loaded back.
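As a rough illustration of a start-up warm-up of this kind, the sketch below uses the standard JDK call `MappedByteBuffer.load()`, which asks the OS on a best-effort basis to bring a mapping's pages into memory. This is an assumption-laden stand-in, not the code at the linked `SegmentLocalFSDirectory` line: the class `MmapWarmupDemo` and its helper methods are hypothetical. The pages loaded this way remain ordinary page-cache pages, so the OS is free to evict them later if they are not accessed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWarmupDemo {
    // Creates a zero-filled temp file of the given size (demo helper).
    public static Path createZeroFile(int sizeBytes) {
        try {
            Path file = Files.createTempFile("warmup-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[sizeBytes]);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read-only and requests a best-effort prefetch of its pages.
    // The mapping stays valid after the channel is closed.
    public static MappedByteBuffer mapAndWarm(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // a hint to the OS, not a guarantee that pages stay resident
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MappedByteBuffer buf = mapAndWarm(createZeroFile(4 << 20));
        System.out.println("mapped bytes: " + buf.capacity());
        // isLoaded() is only a hint and may be false even right after load().
        System.out.println("resident hint: " + buf.isLoaded());
    }
}
```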