maha-hajja opened 1 year ago
Here's a brief summary of approaches for building this connector:
The most reliable way to stream database changes is Cassandra's CDC (change data capture) mode: https://cassandra.apache.org/doc/stable/cassandra/operating/cdc.html
It is enabled per table by setting cdc=true, as follows:
CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
Cassandra then keeps the commit log segments containing that table's changes in the CDC directory configured in cassandra.yaml (cdc_raw_directory, with cdc_enabled: true set on the node). Using this method therefore means writing a custom program that watches the CDC directory for new files. Here are some existing implementations:
Debezium captures Cassandra events with a separate JVM process running alongside each Cassandra node and publishes them to Kafka.
Same approach as Debezium. It consists of two components:
No database triggers are involved, which is what the Cassandra docs recommend. On the other hand, because each node writes its own CDC files, a reader has to run on every node, so we end up with a distributed source connector, which adds a considerable amount of complexity.
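To make the "watch the CDC directory" part concrete, here is a minimal Go sketch of detecting new segment files. It assumes segment files end in `.log` and that companion index files (named like `_cdc.idx`) can be skipped; `segmentWatcher`, `poll` and `demo` are hypothetical names. It polls with `os.ReadDir` rather than OS file notifications, and it does not decode the binary commit log format, which is where most of the real work lies.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// segmentWatcher is a hypothetical helper that remembers which commit log
// segments in a CDC directory have already been reported.
type segmentWatcher struct {
	dir  string
	seen map[string]bool
}

func newSegmentWatcher(dir string) *segmentWatcher {
	return &segmentWatcher{dir: dir, seen: make(map[string]bool)}
}

// poll returns segment files (assumed to end in ".log") that appeared since
// the previous call. os.ReadDir returns entries sorted by filename.
func (w *segmentWatcher) poll() ([]string, error) {
	entries, err := os.ReadDir(w.dir)
	if err != nil {
		return nil, err
	}
	var fresh []string
	for _, e := range entries {
		name := e.Name()
		if e.IsDir() || !strings.HasSuffix(name, ".log") || w.seen[name] {
			continue // skip directories, index files and already-seen segments
		}
		w.seen[name] = true
		fresh = append(fresh, name)
	}
	return fresh, nil
}

// demo simulates Cassandra dropping files into a temporary CDC directory
// and returns what two successive polls observe.
func demo() (first, second []string, err error) {
	dir, err := os.MkdirTemp("", "cdc_raw")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	touch := func(name string) error {
		return os.WriteFile(filepath.Join(dir, name), nil, 0o644)
	}
	w := newSegmentWatcher(dir)

	// A first segment and its index file appear.
	for _, name := range []string{"CommitLog-7-1.log", "CommitLog-7-1_cdc.idx"} {
		if err = touch(name); err != nil {
			return nil, nil, err
		}
	}
	if first, err = w.poll(); err != nil {
		return nil, nil, err
	}

	// A second segment appears; the first must not be reported again.
	if err = touch("CommitLog-7-2.log"); err != nil {
		return nil, nil, err
	}
	second, err = w.poll()
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("first poll:", first)   // first poll: [CommitLog-7-1.log]
	fmt.Println("second poll:", second) // second poll: [CommitLog-7-2.log]
}
```

A real consumer would also have to delete segments once processed: Cassandra rejects writes to CDC-enabled tables when the CDC directory reaches its configured size limit.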
Another, vastly simpler approach is to poll the given tables at a fixed interval for new changes, filtering results by a last_updated column.
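A minimal Go sketch of the polling approach, assuming each row exposes its primary key and a `last_updated` timestamp. `Row`, `Position`, `nextBatch` and `demo` are hypothetical names, and an in-memory slice stands in for the query result; a real connector would fetch rows through a Cassandra driver, and a query filtering on a non-key `last_updated` column would typically need `ALLOW FILTERING` (a full scan). The position stores the last emitted (timestamp, key) pair so rows that share a timestamp are neither skipped nor emitted twice. Note that polling cannot observe deletes.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Row is a hypothetical, simplified view of a table row.
type Row struct {
	Key         string
	LastUpdated time.Time
}

// Position is the newest (LastUpdated, Key) pair emitted so far.
type Position struct {
	LastUpdated time.Time
	Key         string
}

// before reports whether p sorts strictly before r in (LastUpdated, Key) order.
func (p Position) before(r Row) bool {
	if !r.LastUpdated.Equal(p.LastUpdated) {
		return r.LastUpdated.After(p.LastUpdated)
	}
	return r.Key > p.Key
}

// nextBatch keeps rows newer than pos, orders them by (LastUpdated, Key)
// and returns them together with the advanced position.
func nextBatch(rows []Row, pos Position) ([]Row, Position) {
	var fresh []Row
	for _, r := range rows {
		if pos.before(r) {
			fresh = append(fresh, r)
		}
	}
	sort.Slice(fresh, func(i, j int) bool {
		a, b := fresh[i], fresh[j]
		if !a.LastUpdated.Equal(b.LastUpdated) {
			return a.LastUpdated.Before(b.LastUpdated)
		}
		return a.Key < b.Key
	})
	if len(fresh) > 0 {
		last := fresh[len(fresh)-1]
		pos = Position{LastUpdated: last.LastUpdated, Key: last.Key}
	}
	return fresh, pos
}

func keys(rows []Row) []string {
	out := make([]string, 0, len(rows))
	for _, r := range rows {
		out = append(out, r.Key)
	}
	return out
}

// demo runs two polls against an in-memory table and returns emitted keys.
func demo() (first, second []string) {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	table := []Row{
		{Key: "b", LastUpdated: t0},
		{Key: "a", LastUpdated: t0},
		{Key: "c", LastUpdated: t0.Add(time.Second)},
	}
	var pos Position // zero value: everything is new on the first poll
	var batch []Row

	batch, pos = nextBatch(table, pos)
	first = keys(batch)

	// Between polls, "a" is updated and "d" is inserted.
	table[1].LastUpdated = t0.Add(2 * time.Second)
	table = append(table, Row{Key: "d", LastUpdated: t0.Add(2 * time.Second)})

	batch, _ = nextBatch(table, pos)
	second = keys(batch)
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first)  // [a b c]
	fmt.Println(second) // [a d]
}
```

Rows written with a `last_updated` older than the stored position (for example, because of clock skew between clients) would be missed, which is part of the trade-off for the simpler design.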
Feature description
Add a source connector to https://github.com/conduitio-labs/conduit-connector-cassandra.