cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

coldata: consider pre-allocating buffer slice for bytes vectors #81985

Open yuzefovich opened 2 years ago

yuzefovich commented 2 years ago

According to our telemetry, in many cases users have a bytes-like type with a defined width (i.e. something like STRING(128)). Our coldata.Bytes vectors fully inline values of up to 30 bytes, whereas larger values are stored in a separate buffer slice. If the width of the type is non-zero and is larger than 30 bytes, we might want to pre-allocate that buffer slice. Based on the telemetry, 256 bytes should probably be the maximum pre-allocated space per element. However, it is possible that values rarely or never reach the maximum width of the type, so simply pre-allocating width * length might be wasteful.
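To make the sizing rule above concrete, here is a minimal sketch of how the pre-allocation size could be computed. The function name `preallocSize` and both constants are hypothetical; they only mirror the 30-byte inlining limit and the 256-byte cap mentioned in this issue, and none of this is the actual coldata API.

```go
package main

import "fmt"

const (
	// inlineLimit is an assumption mirroring the 30-byte inlining
	// threshold described in this issue.
	inlineLimit = 30
	// maxPerElement is an assumption mirroring the suggested 256-byte
	// cap on pre-allocated space per element.
	maxPerElement = 256
)

// preallocSize (hypothetical helper) returns how many bytes to reserve in
// the non-inlined buffer for a vector of n elements whose type has the
// given declared width. A width of 0 means the type has no declared width.
func preallocSize(width, n int) int {
	// Values that fit inline never touch the separate buffer.
	if width <= inlineLimit || n <= 0 {
		return 0
	}
	perElem := width
	if perElem > maxPerElement {
		perElem = maxPerElement
	}
	return perElem * n
}

func main() {
	// STRING(128) with 1024 elements in the vector.
	buf := make([]byte, 0, preallocSize(128, 1024))
	fmt.Println(cap(buf)) // 131072
}
```

This is the naive width * length variant, so it reserves the worst case even when the actual values are much shorter than the declared width.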

First, we should benchmark whether pre-allocating the buffer when all values are of maximum width gives us a noticeable improvement (a reduction in allocations); it is possible that Go's append works well enough that complicating this code is not worth it. If the improvements are noticeable, then we'd probably need to come up with a heuristic for how much to pre-allocate, possibly using the avgSize table statistic.
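A rough way to run that first experiment outside of the coldata code is sketched below. It uses a plain `[]byte` as a stand-in for the buffer slice and `testing.AllocsPerRun` to count allocations when every value has the maximum width; `allocsPerFill` and `sink` are hypothetical names, and a real benchmark would need to go through coldata.Bytes itself.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the buffer reachable so the compiler cannot optimize the
// allocation away or place it on the stack.
var sink []byte

// allocsPerFill (hypothetical helper) reports the average number of heap
// allocations needed to append n values of the given width to a buffer,
// with or without pre-allocating width*n bytes up front.
func allocsPerFill(width, n int, prealloc bool) float64 {
	// One max-width value, allocated once outside the measured function
	// and reused for every element.
	val := make([]byte, width)
	return testing.AllocsPerRun(100, func() {
		var buf []byte
		if prealloc {
			buf = make([]byte, 0, width*n)
		}
		for i := 0; i < n; i++ {
			buf = append(buf, val...)
		}
		sink = buf
	})
}

func main() {
	const width, n = 128, 1024
	fmt.Println("allocs with append growth:  ", allocsPerFill(width, n, false))
	fmt.Println("allocs with pre-allocation: ", allocsPerFill(width, n, true))
}
```

Allocation counts alone do not show whether the difference matters end to end, so this only indicates whether a fuller benchmark is worth writing.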

Jira issue: CRDB-16158

github-actions[bot] commented 9 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!