Nordstrom / kafka-connect-sqs

The SQS connector plugin provides the ability to use AWS SQS queues as either a source (from an SQS queue into a Kafka topic) or a sink (out of a Kafka topic into an SQS queue).
Apache License 2.0

Batch & parallelism send for sink connector? #54

Open yichao-figma opened 2 weeks ago

yichao-figma commented 2 weeks ago

Looks like the SQS send is done on a per-record basis: https://github.com/Nordstrom/kafka-connect-sqs/blob/master/src/main/java/com/nordstrom/kafka/connect/sqs/SqsSinkConnectorTask.java#L111

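Given that an SQS send usually takes 10 ms or more, sequential per-record sends top out at roughly 100 records per second, so this won't scale for a partition that receives more than 100 RPS.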

There could be two options for optimization:

  1. Group records into batches of 10
  2. Use an ExecutorService to send the batches in parallel, and use Futures to ensure that all batches have succeeded before returning from put()
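
A minimal sketch of how the two options could fit together, assuming a hypothetical `BatchSender` interface in place of the real SQS client. The class name, `partition`, and `sendAll` are illustrative names, not code from the connector. The batch size of 10 matches the SQS limit on entries per batch request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BatchedSqsSendSketch {

    /** Hypothetical stand-in for an SQS batch send; not part of the connector. */
    interface BatchSender {
        void sendBatch(List<String> bodies) throws Exception;
    }

    /** SQS accepts at most 10 entries per batch request. */
    static final int MAX_BATCH_SIZE = 10;

    /** Option 1: split the records into consecutive batches of at most `size` items. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Option 2: submit every batch to the pool, then wait on each Future.
     * Future.get() throws ExecutionException if a batch failed, so this method
     * returns normally only when all batches have succeeded.
     */
    static void sendAll(List<String> bodies, BatchSender sender, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Void>> futures = new ArrayList<>();
        for (List<String> batch : partition(bodies, MAX_BATCH_SIZE)) {
            Callable<Void> task = () -> {
                sender.sendBatch(batch);
                return null;
            };
            futures.add(pool.submit(task));
        }
        for (Future<Void> future : futures) {
            future.get(); // blocks until this batch finishes; rethrows its failure
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> bodies = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            bodies.add("message-" + i);
        }
        AtomicInteger batchCount = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fake sender that only counts how many batches it was handed.
            sendAll(bodies, batch -> batchCount.incrementAndGet(), pool);
        } finally {
            pool.shutdown();
        }
        System.out.println("batches sent: " + batchCount.get());
    }
}
```

In a real sink task, put() would presumably catch the ExecutionException and rethrow it as a connector error so that Kafka Connect does not commit offsets for unsent records. A batch request to SQS can also succeed overall while individual entries fail, so a production version would need to inspect the per-entry results, which this sketch does not do.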

dylanmei commented 1 week ago

There is an outstanding pull request that adds an ASYNC mode; does it meet your expectations for optimization?

https://github.com/Nordstrom/kafka-connect-sqs/pull/53