GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0

feat: add a size limit on a record batch #2220

Open niebayes opened 1 year ago

niebayes commented 1 year ago

What problem does the new feature solve?

Currently, there is no size limit set on a record batch, which has become problematic due to the constraints imposed by the Tonic crate on message sizes for both receiving and sending. Tonic employs a default size limit of 4MB for received messages, defined as the DEFAULT_MAX_RECV_MESSAGE_SIZE constant, and a default maximum send message size of 2^64 - 1 bytes, defined as the DEFAULT_MAX_SEND_MESSAGE_SIZE constant.

To address this issue, it's crucial to set a size limit on a record batch to prevent it from exceeding 4MB.
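As a reference point, the two Tonic defaults mentioned above can be restated as plain Rust values. This is an illustration only: the constant names mirror Tonic's, and the helper function is our own, not greptimedb code.

```rust
// Tonic's default limits, restated for illustration (the names mirror
// Tonic's constants; see tonic's codec module for authoritative values).
const DEFAULT_MAX_RECV_MESSAGE_SIZE: usize = 4 * 1024 * 1024; // 4 MiB
// The send side is effectively unbounded: usize::MAX (2^64 - 1 on 64-bit).
const DEFAULT_MAX_SEND_MESSAGE_SIZE: usize = usize::MAX;

/// The receive side is the binding constraint: a message larger than 4 MiB
/// is rejected by the receiver even though the sender would allow it.
fn fits_default_recv_limit(encoded_len: usize) -> bool {
    encoded_len <= DEFAULT_MAX_RECV_MESSAGE_SIZE
}
```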

What does the feature do?

Introduce a size limit on a record batch.

Implementation challenges

niebayes commented 1 year ago

Since the unit of transmission is a FlightData message, which may contain one or more record batches, the size limit should be considerably more conservative than the default maximum set by Tonic.
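A minimal sketch of that reasoning, assuming the 4 MiB gRPC receive limit and an illustrative 3/4 safety factor to leave headroom for protobuf/Flight framing (the factor is our assumption, not a value taken from greptimedb):

```rust
// Tonic's default receive limit (4 MiB). A FlightData message carries the
// encoded batch plus protobuf/Flight framing, so the per-batch budget is
// kept well below the raw limit.
const GRPC_RECV_LIMIT: usize = 4 * 1024 * 1024;

/// Conservative per-batch budget: 3/4 of the gRPC limit. The safety
/// factor here is an assumption for illustration only.
fn conservative_batch_budget() -> usize {
    GRPC_RECV_LIMIT / 4 * 3
}
```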

TheWaWaR commented 1 year ago

I'm interested in this feature.

Firstly, I have a question: is this feature for server security, by limiting the gRPC request/response size?

If that's the case, I can see two ways to achieve this:

niebayes commented 1 year ago

@MichaelScofield PTAL

MichaelScofield commented 1 year ago

@TheWaWaR Thx for your interest. It's not about security; it's about Tonic's configurable request/response size limits, which should be taken into consideration.

I think the service level limitation is enough.

TheWaWaR commented 1 year ago

By searching the codebase, I found that max_decoding_message_size is already applied to the gRPC clients. So, if I understand correctly, the remaining tasks should include:

MichaelScofield commented 1 year ago

@TheWaWaR You are right.

MichaelScofield commented 1 year ago

@TheWaWaR thx, I've merged your PR. However, to close this issue, I think it's important to set a limit on RecordBatches, too. The RecordBatches are popped from DataFusion, so it requires additional effort to investigate how to restrict their size. Are you willing to continue to do that?

TheWaWaR commented 1 year ago

@MichaelScofield

> Are you willing to continue to do that?

Yes, why not.

So, I have some new questions:

First, how do we define the size of a RecordBatches? There is no Serialize/Deserialize implementation for RecordBatches, though HttpRecordsOutput has one, and serialization can target multiple formats such as json/msgpack/bincode. So this first question needs a proper answer.

Second, what is the intention behind limiting the size of a RecordBatches? Wherever RecordBatches is serialized/deserialized, currently it is only transferred via gRPC, so why isn't limiting the gRPC message size enough? Are there other places (besides gRPC) where RecordBatches is used and its size should be limited? Or, to put it the other way: in which cases does an unlimited RecordBatches size become a problem?

MichaelScofield commented 1 year ago

@TheWaWaR sorry for the late reply. The size of a RecordBatch is indeed hard to estimate. The best guess might be to sum up its vectors' memory_size.
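A hedged sketch of this idea: greptimedb's vectors expose a memory_size() method, but to keep the example self-contained each column is mocked here as a plain byte count rather than a real vector.

```rust
/// Sum the per-column memory sizes to estimate a batch's size.
/// In greptimedb this would iterate the batch's vectors and call
/// memory_size() on each; here a column is just its byte count.
fn estimated_batch_size(column_sizes: &[usize]) -> usize {
    column_sizes.iter().sum()
}

/// Check the estimate against a size limit (e.g. the gRPC budget).
fn exceeds_limit(column_sizes: &[usize], limit: usize) -> bool {
    estimated_batch_size(column_sizes) > limit
}
```

Note that memory_size measures in-memory footprint, not the encoded FlightData size, so this estimate is only a proxy for what actually goes over the wire.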

As to the second question: our gRPC interface does have a size limit now. However, if we feed a huge RecordBatch to the gRPC interface, will it segment the input internally to fit the size limit? A quick googling suggests that Tonic does not have this kind of feature. So I guess it would be a problem if the underlying query engine pops a RecordBatch larger than Tonic's size limit: the request would simply fail.

That said, limiting the size of a RecordBatch (or segmenting it) is hard. I think a more proper place to do it is in FlightRecordBatchStream: when the encoded flight data exceeds the gRPC size limit, segment it. This way we avoid calculating the size of a RecordBatch, and since FlightData is a plain protobuf message, segmenting it should be easier.
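One way to sketch the segmentation: assuming roughly uniform row sizes, compute (offset, length) slices that each stay under the limit; Arrow's zero-copy RecordBatch::slice(offset, length) could then apply the plan. The uniform-row-size assumption and the function names here are ours, for illustration only, not greptimedb's actual approach.

```rust
/// Plan (offset, length) slices of a batch with `num_rows` rows so that
/// each slice of `row_size`-byte rows stays under `limit` bytes.
/// The real code would feed each pair to RecordBatch::slice, which is
/// zero-copy in arrow-rs.
fn slice_plan(num_rows: usize, row_size: usize, limit: usize) -> Vec<(usize, usize)> {
    // At least one row per slice, even if a single row exceeds the limit.
    let rows_per_slice = (limit / row_size.max(1)).max(1);
    let mut slices = Vec::new();
    let mut offset = 0;
    while offset < num_rows {
        let len = rows_per_slice.min(num_rows - offset);
        slices.push((offset, len));
        offset += len;
    }
    slices
}
```

A single oversized row still cannot be split this way, so a hard per-row cap (or an error path) would be needed in addition to the slicing.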