sql,kv,storage: push column batch generation into kvserver #82323

Open · ajwerner opened 2 years ago

ajwerner commented 2 years ago

23.1 must-haves:

23.1 nice-to-haves:

Later:


Is your feature request related to a problem? Please describe.

One known bottleneck for CockroachDB performance is so-called "scan speed": the speed at which we can scan data off of disk, encode it into the scan response, decode it, and then re-encode it into a columnar format (the columnar format is now used extensively in execution). This summary is misleading for a dedicated cluster, where query execution often happens in the same process as the kvserver, so the encoding and decoding steps can be skipped. In multi-tenant deployments, however, the data must be transmitted over the network back to the SQL server. This can be particularly costly when the data is being served from a separate availability zone ([1], https://github.com/cockroachdb/cockroach/issues/71887). The above proposal has the potential to improve scan speed by 1) not decoding columns we don't need and 2) creating much smaller responses.
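To make the claim concrete, here is a toy Go sketch contrasting today's decode-everything-then-columnarize path with a pushed-down path that decodes only the projected columns directly into columns. Every type and function name here is invented for illustration; none of this is CockroachDB code.

```go
package main

import "fmt"

// encodedRow stands in for the row-oriented value of one scanned KV pair.
type encodedRow []string

// columns stands in for a columnar batch: column ordinal -> column values.
type columns map[int][]string

// currentPath models today's pipeline: decode every column of every row,
// then re-encode (and only then project) into columnar form.
func currentPath(rows []encodedRow, needed []int) columns {
	decoded := make([][]string, 0, len(rows))
	for _, r := range rows {
		full := make([]string, len(r))
		copy(full, r) // stands in for decoding all columns
		decoded = append(decoded, full)
	}
	out := columns{}
	for _, r := range decoded {
		for _, ord := range needed {
			out[ord] = append(out[ord], r[ord])
		}
	}
	return out
}

// pushedDownPath models the proposal: the kvserver decodes only the needed
// columns, straight into columns, so less work is done and the response
// crossing the network is smaller.
func pushedDownPath(rows []encodedRow, needed []int) columns {
	out := columns{}
	for _, r := range rows {
		for _, ord := range needed {
			out[ord] = append(out[ord], r[ord])
		}
	}
	return out
}

func main() {
	rows := []encodedRow{{"1", "alice", "ny"}, {"2", "bob", "sf"}}
	fmt.Println(currentPath(rows, []int{1}))    // map[1:[alice bob]]
	fmt.Println(pushedDownPath(rows, []int{1})) // same result, less work
}
```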

Any eventual movement towards columnarization at the storage layer will need to have a corresponding read API. This issue posits that we should build the columnar read API first to gain experience.

Describe the solution you'd like

We should add an Apache Arrow batch response format that performs column projection based on the IndexFetchSpec.
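As a sketch of what the wire contract might look like (the enum value, struct fields, and package name below are assumptions for illustration, not the actual protobuf definitions):

```go
// Package kvproto is a hypothetical stand-in for the KV API protos.
package kvproto

// ScanFormat mirrors the existing scan-format enum, which today offers
// KEY_VALUES and BATCH_RESPONSE; the proposal adds a columnar variant.
type ScanFormat int32

const (
	KEY_VALUES ScanFormat = iota
	BATCH_RESPONSE
	COL_BATCH_RESPONSE // hypothetical: Arrow-encoded columnar responses
)

// FetchSpec stands in for the IndexFetchSpec proto: enough information
// for the kvserver to decode rows and project only the needed columns.
type FetchSpec struct {
	IndexID        uint32
	FetchedColumns []uint32 // ordinals of the columns the client wants
}

// ScanRequest gains the format plus the spec describing how to decode.
type ScanRequest struct {
	Format    ScanFormat
	FetchSpec *FetchSpec // set only when Format == COL_BATCH_RESPONSE
}

// ScanResponse then carries serialized Arrow record batches instead of
// raw KV pairs, already projected down to FetchedColumns.
type ScanResponse struct {
	ArrowBatches [][]byte // one serialized record batch per chunk
}
```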

Additional context

Relates very closely to, if not just adds exposition to, https://github.com/cockroachdb/cockroach/issues/71887.

@jordanlewis made a prototype here: https://github.com/cockroachdb/cockroach/pull/52863. At the time it showed a ~5% win in TPCH performance.

@RaduBerinde put in a ton of work to clean up how we specify the data to be fetched. There now exists a small protobuf (the IndexFetchSpec) which could conceivably be transmitted with the scan request and used to describe how to decode the data.


[1] We're probably going to do https://github.com/cockroachdb/cockroach/issues/72593 to attack the cross-AZ network cost problem.

Jira issue: CRDB-16284
Epic: CRDB-14837

awoods187 commented 2 years ago

@vy-ton and @rytaft can you all discuss this work and prioritize it accordingly?

I think it can be powerful for our serverless users (to help control costs/RUs), and it will benefit everyone with faster scan speed (it covers up poorly written or less-than-ideally optimized queries).

vy-ton commented 2 years ago

The Queries team has https://cockroachlabs.atlassian.net/browse/CRDB-14837 to track this. During 22.2 planning, we kept this off the roadmap but left it next in priority as resources free up. FYI @mgartner

jordanlewis commented 2 years ago

> We should add an Apache Arrow batch response format that performs column projection based on the IndexFetchSpec.

In case you weren't aware, we do actually use Apache Arrow already in the vectorized engine, to send the columnar data around via DistSQL. The unfortunate bit is that we do not currently use the Arrow-serialized data as-is in the vectorized engine for most types: it requires deserialization, which sort of defeats the point. See pkg/col/colserde.
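For concreteness, here is a self-contained round trip using the upstream Apache Arrow Go library (not pkg/col/colserde, whose API differs; the v13 module path is just one choice). The point is the last step: the serialized IPC bytes are opaque until they are deserialized back into in-memory arrays.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/apache/arrow/go/v13/arrow"
	"github.com/apache/arrow/go/v13/arrow/array"
	"github.com/apache/arrow/go/v13/arrow/ipc"
	"github.com/apache/arrow/go/v13/arrow/memory"
)

func main() {
	alloc := memory.NewGoAllocator()
	schema := arrow.NewSchema([]arrow.Field{
		{Name: "id", Type: arrow.PrimitiveTypes.Int64},
	}, nil)

	// Build a one-column record batch in memory.
	b := array.NewRecordBuilder(alloc, schema)
	defer b.Release()
	b.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
	rec := b.NewRecord()
	defer rec.Release()

	// Serialize the batch into the Arrow IPC stream format.
	var buf bytes.Buffer
	w := ipc.NewWriter(&buf, ipc.WithSchema(schema), ipc.WithAllocator(alloc))
	if err := w.Write(rec); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}

	// The serialized bytes must be deserialized before use: this read step
	// is the extra hop being lamented above.
	r, err := ipc.NewReader(&buf, ipc.WithAllocator(alloc))
	if err != nil {
		panic(err)
	}
	defer r.Release()
	for r.Next() {
		fmt.Println(r.Record().Column(0)) // [1 2 3]
	}
}
```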

ajwerner commented 2 years ago

> which sort of defeats the point

This seems somewhat dramatic. At least some of the point here is to project out columns to make the returned batch smaller. I hope that our Arrow-to-column-vector conversion is relatively high throughput.

jordanlewis commented 2 years ago

I think it's high-throughput enough, and there are some benchmarks. I just meant I've always kind of regretted not finding a way to unify the in-memory representation and the Arrow one, for that zero-copy goodness!

yuzefovich commented 1 year ago

How sad would people feel if we don't implement the "local fast-path" with the KV projection pushdown? I mean that even if the KV Scan request is evaluated locally and we create a coldata.Batch with the necessary columns, we would still serialize it into []byte (currently in the Arrow format) and would then deserialize it in ColBatchDirectScan.
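A minimal sketch of the trade-off, with a stand-in batch type and encoding/gob standing in for the Arrow serialization (both assumptions; the real pieces are coldata.Batch and the colserde machinery):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// batch is a stand-in for coldata.Batch.
type batch struct{ Cols map[string][]int64 }

// serialize and deserialize stand in for the Arrow round trip.
func serialize(b *batch) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(b); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func deserialize(p []byte) *batch {
	var b batch
	if err := gob.NewDecoder(bytes.NewReader(p)).Decode(&b); err != nil {
		panic(err)
	}
	return &b
}

// fetch models the choice: without the local fast path, the batch is always
// round-tripped through bytes, even when produced in the same process; with
// it, a locally produced batch is handed over as-is.
func fetch(b *batch, local, fastPath bool) *batch {
	if local && fastPath {
		return b // no serialization cost
	}
	return deserialize(serialize(b))
}

func main() {
	b := &batch{Cols: map[string][]int64{"k": {1, 2, 3}}}
	fmt.Println(fetch(b, true, true) == b)  // true: same batch, zero copies
	fmt.Println(fetch(b, true, false) == b) // false: paid the round trip
}
```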

The main argument for keeping the local fast path is that we can eliminate this serialization / deserialization step, and it seems nice to do it if we can.

However, there are several reasons for not doing it:


Update: we will probably implement this local fast-path after all, in order to eliminate (or at least substantially reduce) the perf hit relative to a single-tenant deployment.