datastax / cdc-apache-cassandra

Datastax CDC for Apache Cassandra
Apache License 2.0
[Source][Utilization] Enable processing multiple C* tables in a single source instance #94

Open aymkhalil opened 2 years ago

aymkhalil commented 2 years ago

Today, the C* source connector only allows a 1:1 mapping between tables and sinks. To increase the utilization of the underlying resources associated with a single source instance (e.g. the memory footprint of a single sink is ~500MB, which does not scale well if the user has tens or hundreds of tables), the proposal is to enable users to configure multiple tables in their source config.

Proposed source config:

configs:
  contactPoints: "localhost"
  loadBalancing.localDc: "Cassandra"
  outputFormat: "key-value-avro"
  tables:
    ks1:
      table1:
        events.topic: "persistent://public/default/events-ks1.table1"
        data.topic: "persistent://public/default/data-ks1.table1"
    ks2:
      table2:
        events.topic: "persistent://public/default/events-ks2.table2"
        data.topic: "persistent://public/default/data-ks2.table2"
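
To illustrate how a single source instance might consume the nested `tables` section proposed above, here is a minimal Python sketch that flattens it into one routing entry per table. The names `TableRoute` and `flatten_tables` are hypothetical and are not part of the connector; the config is represented as a plain dict and only mirrors the keys shown in the proposal.

```python
from typing import Dict, List, NamedTuple


class TableRoute(NamedTuple):
    """Hypothetical routing entry: one per (keyspace, table) pair."""
    keyspace: str
    table: str
    events_topic: str
    data_topic: str


def flatten_tables(configs: Dict) -> List[TableRoute]:
    """Flatten the nested `tables` mapping into a list of routes.

    Assumes the layout from the proposal: tables -> keyspace -> table -> topics.
    """
    routes = []
    for keyspace, tables in configs.get("tables", {}).items():
        for table, topics in tables.items():
            routes.append(
                TableRoute(
                    keyspace=keyspace,
                    table=table,
                    events_topic=topics["events.topic"],
                    data_topic=topics["data.topic"],
                )
            )
    return routes


# The proposed config, written as a plain dict for illustration.
configs = {
    "contactPoints": "localhost",
    "loadBalancing.localDc": "Cassandra",
    "outputFormat": "key-value-avro",
    "tables": {
        "ks1": {
            "table1": {
                "events.topic": "persistent://public/default/events-ks1.table1",
                "data.topic": "persistent://public/default/data-ks1.table1",
            }
        },
        "ks2": {
            "table2": {
                "events.topic": "persistent://public/default/events-ks2.table2",
                "data.topic": "persistent://public/default/data-ks2.table2",
            }
        },
    },
}

routes = flatten_tables(configs)
for route in routes:
    print(f"{route.keyspace}.{route.table}: {route.events_topic} -> {route.data_topic}")
```

With a flat list like this, one instance could share the connection settings (`contactPoints`, `loadBalancing.localDc`) across all tables while keeping a separate events/data topic pair per table.
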

aymkhalil commented 2 years ago

Alternatively, we can keep the config as close as possible to today's single-table config by replacing data.topic with destination-topic-name.