confluentinc / kafka-connect-bigquery

A Kafka Connect BigQuery sink connector

Could not serialize access to table due to concurrent update #127

Open slvrtrn opened 3 years ago

slvrtrn commented 3 years ago

Hello, I have the following error happening randomly with my BigQuery connector:

com.wepay.kafka.connect.bigquery.exception.BigQueryConnectException: Some write threads encountered unrecoverable errors: com.google.cloud.bigquery.BigQueryException: Could not serialize access to table [table-name] due to concurrent update; See logs for more detail

I am using the following configuration:

```json
{
  "name": "...",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "tasks.max": "1",
    "topics.regex": "...",

    "sanitizeTopics": "true",
    "sanitizeFieldNames": "true",

    "autoCreateTables": "true",
    "autoUpdateSchemas": "true",

    "transforms": "unwrap,RenameField",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.delete.handling.mode": "rewrite",
    "transforms.unwrap.add.fields": "source.ts_ms",

    "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.RenameField.renames": "__source_ts_ms:__timestamp",

    "consumer.override.max.request.size": 104857600,
    "consumer.override.max.poll.records": 5000,

    "allowNewBigQueryFields": true,
    "allowBigQueryRequiredFieldRelaxation": true,
    "allowSchemaUnionization": true,
    "upsertEnabled": true,
    "deleteEnabled": true,
    "kafkaKeyFieldName": "__kafkaKey",

    "mergeIntervalMs": 5000,
    "mergeRecordsThreshold": 5000,

    "queueSize": 5000,
    "threadPoolSize": 30,

    "bufferSize": "100000",
    "maxWriteSize": "10000",
    "tableWriteWait": "1000",

    "project": "...",
    "datasets": "...",
    "defaultDataset": "...",
    "keyfile": "..."
  }
}
```

Is there anything that I can do with the configuration to prevent that error? The connector version is 2.1.4.
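
For context: as far as I understand the 2.x upsert/delete mode, the connector stages rows in intermediate tables and periodically runs MERGE jobs into the destination table (driven by `mergeIntervalMs` and `mergeRecordsThreshold`), and BigQuery rejects overlapping DML against the same table with exactly this "concurrent update" message, so the conflict can happen even with `tasks.max` set to 1. Below is a minimal sketch of how that specific failure could be recognized from the client exception; the `ConcurrentUpdateErrors` class and the message-fragment check are my own assumptions, not part of the connector:

```java
import com.google.cloud.bigquery.BigQueryException;

public final class ConcurrentUpdateErrors {

    // Fragment of the message BigQuery uses for DML serialization conflicts.
    private static final String CONCURRENT_UPDATE_FRAGMENT =
        "Could not serialize access to table";

    // Hypothetical helper: true when the exception looks like the transient
    // "concurrent update" conflict rather than a permanent failure, so the
    // caller could retry the MERGE instead of failing the whole task.
    public static boolean isConcurrentUpdateError(BigQueryException e) {
        String message = e.getMessage();
        return message != null && message.contains(CONCURRENT_UPDATE_FRAGMENT);
    }

    private ConcurrentUpdateErrors() {}
}
```

On the configuration side, the only knobs that look related are `mergeIntervalMs` and `mergeRecordsThreshold`; raising them should make MERGE jobs less frequent and therefore less likely to overlap, but that is a guess rather than a confirmed fix.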

aakarshg commented 2 years ago

I am noticing a similar issue as well. Were you able to figure out the issue, @slvrtrn?

slvrtrn commented 2 years ago

@aakarshg unfortunately, no. We decided not to use this connector.

FreCap commented 2 years ago

+1

FreCap commented 2 years ago

Adding an MR with a simple retry. I stopped seeing serialization problems after applying it: #220
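
For anyone who cannot pick up the PR directly, a bounded retry around the failing operation might look roughly like the sketch below. This is only an illustration of the idea, not necessarily what #220 implements; `MAX_ATTEMPTS`, the backoff values, and the caller-supplied operation are placeholders.

```java
import com.google.cloud.bigquery.BigQueryException;
import java.util.concurrent.Callable;

public final class SerializationRetry {

    private static final int MAX_ATTEMPTS = 5;           // placeholder value
    private static final long BASE_BACKOFF_MS = 2_000L;  // placeholder value

    // Retries the supplied operation only when BigQuery reports the
    // "Could not serialize access to table ... due to concurrent update"
    // conflict; any other failure is rethrown immediately.
    public static <T> T withRetry(Callable<T> operation) throws Exception {
        BigQueryException lastConflict = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return operation.call();
            } catch (BigQueryException e) {
                String message = e.getMessage();
                boolean concurrentUpdate = message != null
                    && message.contains("Could not serialize access to table");
                if (!concurrentUpdate) {
                    throw e;
                }
                lastConflict = e;
                Thread.sleep(BASE_BACKOFF_MS * attempt);  // simple linear backoff
            }
        }
        throw lastConflict;
    }

    private SerializationRetry() {}
}
```

A caller would wrap whatever issues the conflicting job, e.g. `SerializationRetry.withRetry(() -> bigQuery.query(mergeJobConfig))`, where `bigQuery` and `mergeJobConfig` stand in for the client and job configuration already in scope.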

aakarshg commented 2 years ago

@FreCap Looks like CI is failing.

What's interesting is that I only notice the issue when setting tasks.max to a number higher than 1. However, in the OP's configuration the parameter was set to 1, so I'm not really sure where the concurrent updates would have been coming from.

aakarshg commented 2 years ago

> @aakarshg unfortunately, no. We decided not to use this connector.

Thanks for letting me know. If you don't mind, can you share what alternative solution you ended up using?

FreCap commented 2 years ago

@aakarshg looking at the CI, the tests are failing only because the BigQuery keyfile is missing in the CI environment: https://jenkins.public.confluent.io/job/kafka-connect-bigquery/job/PR-220/1/testReport/junit/com.wepay.kafka.connect.bigquery.integration/BigQueryErrorResponsesIT/testWriteToTableWithoutSchema/

https://jenkins.public.confluent.io/job/kafka-connect-bigquery/job/PR-220/1/