Here's a brief summary of the possible approaches for building this connector.

Using CDC mode

The most reliable way to stream database changes is to use Cassandra's CDC mode: https://cassandra.apache.org/doc/stable/cassandra/operating/cdc.html

It is enabled per table by setting cdc=true, as follows:

CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;

Cassandra then writes data changes to files inside the configured CDC directory, which is set in cassandra.yaml (cdc_raw_directory). Using this method means we need a custom program that watches the CDC directory for file changes. Here are some existing implementations:

https://debezium.io/documentation/reference/stable/connectors/cassandra.html

Debezium captures Cassandra events via a single JVM process running on each Cassandra node and publishes them to Kafka.

https://docs.datastax.com/en/cdc-for-cassandra/2.2.9/index.html

Same approach as Debezium, but it publishes to Pulsar. It consists of two components:

DataStax Change Agent for Apache Cassandra: the event producer, deployed on each Cassandra node
CDC for Cassandra: the source connector, deployed in a Pulsar cluster

No database triggers are involved, which is what the Cassandra docs recommend. On the other hand, we end up with a distributed source connector (one agent per Cassandra node), which adds a considerable amount of complexity.
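
As a rough illustration of the "custom program that watches the CDC directory" part, here is a minimal Python sketch that detects new files by rescanning the directory. The function name, the `*.log` file pattern, and the example path are assumptions made for illustration. Parsing the files themselves is not shown; the implementations above handle that part.

```python
from pathlib import Path


def find_new_segments(cdc_dir: Path, seen: set[str]) -> list[Path]:
    """Return files in cdc_dir that have not been reported yet.

    `seen` holds the names already handed out and is updated in place.
    The "*.log" pattern is an assumption about how segment files are named.
    """
    new_files = sorted(
        p for p in cdc_dir.glob("*.log") if p.is_file() and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


if __name__ == "__main__":
    import time

    cdc_dir = Path("/var/lib/cassandra/cdc_raw")  # hypothetical location
    seen: set[str] = set()
    while True:
        for segment in find_new_segments(cdc_dir, seen):
            # A real agent would parse the segment and emit records here.
            print("new segment:", segment.name)
        time.sleep(5)
```

Since the files live on each node's local disk, a loop like this would have to run on every Cassandra node, which is where the distributed complexity comes from.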

Using polling

A much simpler approach to capturing Cassandra changes is to query the given tables at a fixed interval and keep only the rows that changed since the previous poll, filtering on a last_updated column.
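
A minimal sketch of one polling step, in Python. The `poll_once` function, the `fetch` callable, and the query text are hypothetical. The sketch assumes every row carries a `last_updated` timestamp, and that the table is modeled so it can be filtered on that column (in Cassandra, filtering on a non-key column needs an index or ALLOW FILTERING).

```python
from datetime import datetime
from typing import Callable, Iterable

Row = dict  # assumed row shape: column name -> value

# Hypothetical query that a real `fetch` might run against the foo table.
QUERY = "SELECT a, b, last_updated FROM foo WHERE last_updated > %s ALLOW FILTERING"


def poll_once(
    fetch: Callable[[datetime], Iterable[Row]], watermark: datetime
) -> tuple[list[Row], datetime]:
    """Fetch rows changed after `watermark` and return them with the new watermark.

    `fetch` is whatever runs the query; it is injected so the polling logic
    stays independent of any particular driver.
    """
    rows = sorted(fetch(watermark), key=lambda r: r["last_updated"])
    if rows:
        # Advance the watermark to the newest change seen in this poll.
        watermark = rows[-1]["last_updated"]
    return rows, watermark
```

The connector would call `poll_once` every x amount of time and persist the returned watermark as its position, so that a restart resumes from the last change it saw.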

ConduitIO / conduit

Cassandra source connector #989
