confluentinc / kafka-connect-bigquery

A Kafka Connect BigQuery sink connector
Apache License 2.0

Configure batch size on Kafka Connect BigQuery sink connector #396

Open · icyBlue27 opened 9 months ago

icyBlue27 commented 9 months ago

When using the Kafka Connect BigQuery sink connector, we occasionally observe the following error, because the streaming API limits the size of a single write request to BigQuery: `com.google.cloud.bigquery.BigQueryException: Request size is too big: 12705398 limitation: 12582912`. The rejected request was 12705398 bytes against a 12582912-byte (12 MiB) limit. Would it be possible to make the batch size configurable on the sink connector, so that requests stay under such limits?

For comparison, Kafka already offers several size-related parameters on the producer side (https://kafka.apache.org/documentation/#producerconfigs).

b-goyal commented 9 months ago

@icyBlue27, could you try setting `consumer.override.max.poll.records` to a lower number? The default value for this config is 500.
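
For reference, a minimal sketch of where that override would go in a sink connector config submitted to the Connect REST API; the connector name, topic, project, and dataset values are placeholders:

```json
{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "my-topic",
    "project": "my-gcp-project",
    "defaultDataset": "my_dataset",
    "consumer.override.max.poll.records": "100"
  }
}
```

Note that `consumer.override.*` settings only take effect if the worker's `connector.client.config.override.policy` permits client overrides (e.g. `All`).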

icyBlue27 commented 9 months ago

Thanks @b-goyal. If I understand correctly, `max.poll.records` controls the number of records that are pulled per poll. That is not exactly what we would like, which is a limit on the size of the batch in bytes. It is true that fewer records would probably add up to a smaller batch, but this is not an optimal solution: record sizes vary, so a count-based cap either leaves throughput on the table or can still overshoot the byte limit.
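
To make the request concrete, here is a minimal sketch (illustrative only, not the connector's actual code) of byte-bounded batching: serialized rows are grouped so that each sub-batch stays under a byte budget, such as the 12582912-byte limit from the error above. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteBoundedBatcher {
    // BigQuery rejected the request at 12582912 bytes (12 MiB), per the error above.
    private static final long MAX_BATCH_BYTES = 12L * 1024 * 1024;

    /**
     * Splits serialized rows into sub-batches whose combined UTF-8 size stays
     * under maxBatchBytes. Note: a single row larger than the limit still
     * becomes its own (oversized) batch here, and a real implementation would
     * also budget for the per-request envelope overhead.
     */
    public static List<List<String>> split(List<String> serializedRows, long maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : serializedRows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch before this row would push it over the budget.
            if (!current.isEmpty() && currentBytes + rowBytes > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("{\"a\":1}", "{\"b\":2}", "{\"c\":3}");
        // Tiny 8-byte limit for demonstration: each 7-byte row lands in its own batch.
        System.out.println(split(rows, 8).size()); // prints 3
    }
}
```

Sizing on the serialized row's UTF-8 length is only an approximation of the final request size, which is why the sketch leaves headroom decisions (envelope overhead, safety margin) to the caller via the `maxBatchBytes` parameter.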