ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

MergeTree: more efficient read operations for small data volumes #65209

Open wudidapaopao opened 2 months ago

wudidapaopao commented 2 months ago

The existing MergeTree storage engine's IMergeTreeReader provides a 'readRows' function that reads a given number of rows starting from a specific mark. Because this operation is coarse-grained, we propose an additional function that reads a limited number of rows starting at a caller-specified offset within the mark, giving finer-grained and more efficient control in data retrieval scenarios that only need a few rows.

Benefits: avoiding the deserialization cost of superfluous rows significantly improves read efficiency in cases where only a small subset of rows is needed:

- After PREWHERE filtering, only a small number of rows need to be read.
- In the second-phase read of lazy materialization, only a small amount of data needs to be fetched.

Implementation requirements: add an 'offset' parameter to the 'readRows' function in IMergeTreeReader, indicating that reading should start 'offset' rows into the specified mark. A new interface is introduced in ISerialization to skip the first 'offset' rows of data, so that deserialization begins at that displacement.

nickitat commented 2 months ago

If we consider our default codec, for example LZ4, we cannot start decompressing in the middle of a block. But it seems possible with dictionary encoding or RLE.

UnamedRus commented 2 months ago

I think it's not about decompression, but deserialization. The win will be smaller, but it still exists, especially for strings and aggregate function States (a big problem for uniqCombined and other large states).

Also, we have codecs like NONE, which is good for random, incompressible data.

wudidapaopao commented 2 months ago

@nickitat @UnamedRus Yes, skipping deserialization can save some CPU resources. In this PR I added the 'deserializeBinaryBulkWithMultipleStreamsSilently' interface to ISerialization, which skips the deserialization work for unnecessary rows. Can we build on this to improve the point-query performance of MergeTree?

amosbird commented 2 months ago

You may be interested in this implementation https://github.com/ByConity/ByConity/blob/e9efd35680172766a368bac4689abc0132fc6a3d/src/Storages/MergeTree/MergeTreeReaderWide.cpp#L209