datastax / cdc-apache-cassandra

Datastax CDC for Apache Cassandra
Apache License 2.0

[Source][Utilization] Enable processing multiple C* in a single source instance #94

Open aymkhalil opened 1 year ago

aymkhalil commented 1 year ago

Today, the C* source connector only allows a 1:1 mapping between tables and sinks. To increase the utilization of the underlying resources associated with a single source instance (e.g., the memory footprint of a single sink is ~500MB, which does not scale well if the user has tens or hundreds of tables), the proposal is to enable users to configure multiple tables in their source config.

Proposed source config:

configs:
  contactPoints: "localhost"
  loadBalancing.localDc: "Cassandra"
  outputFormat: "key-value-avro"
  tables:
    ks1:
      table1:
        events.topic: "persistent://public/default/events-ks1.table1"
        data.topic: "persistent://public/default/data-ks1.table1"
    ks2:
      table2:
        events.topic: "persistent://public/default/events-ks2.table2"
        data.topic: "persistent://public/default/data-ks2.table2"
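
To illustrate how a single source instance might consume the nested tables map proposed above, here is a minimal sketch in Python. It is not the connector's actual implementation: `TableRoute` and `flatten_tables` are hypothetical names introduced only for this example, and the config is written as a plain dict rather than loaded from a real source config file. It flattens the keyspace/table nesting into one route per table and rejects entries that lack either topic.

```python
from typing import Dict, List, NamedTuple


class TableRoute(NamedTuple):
    """Hypothetical per-table routing entry derived from the proposed config."""
    keyspace: str
    table: str
    events_topic: str
    data_topic: str


REQUIRED_KEYS = {"events.topic", "data.topic"}


def flatten_tables(config: Dict) -> List[TableRoute]:
    """Flatten the nested keyspace -> table -> topics map into a list of routes."""
    routes = []
    for keyspace, tables in config.get("tables", {}).items():
        for table, topics in tables.items():
            missing = REQUIRED_KEYS - topics.keys()
            if missing:
                raise ValueError(
                    f"{keyspace}.{table} is missing {sorted(missing)}"
                )
            routes.append(
                TableRoute(
                    keyspace,
                    table,
                    topics["events.topic"],
                    topics["data.topic"],
                )
            )
    return routes


# The proposed config from this issue, expressed as a dict (assumption:
# the real connector would receive an equivalent map after parsing).
proposed_config = {
    "contactPoints": "localhost",
    "loadBalancing.localDc": "Cassandra",
    "outputFormat": "key-value-avro",
    "tables": {
        "ks1": {
            "table1": {
                "events.topic": "persistent://public/default/events-ks1.table1",
                "data.topic": "persistent://public/default/data-ks1.table1",
            }
        },
        "ks2": {
            "table2": {
                "events.topic": "persistent://public/default/events-ks2.table2",
                "data.topic": "persistent://public/default/data-ks2.table2",
            }
        },
    },
}

routes = flatten_tables(proposed_config)
```

With a flat list like this, one instance could iterate over the routes and share the connection settings (`contactPoints`, `loadBalancing.localDc`) across all tables instead of paying the per-instance overhead once per table.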

aymkhalil commented 1 year ago

Alternatively, we can keep the config as close as possible to today's single-table config by replacing data.topic with destination-topic-name.