datastax / cdc-apache-cassandra

Datastax CDC for Apache Cassandra
Apache License 2.0

[connector] CDC doesn't support primary key only tables #142

Open aymkhalil opened 1 year ago

aymkhalil commented 1 year ago

A table whose columns are all part of the primary key (a "primary-key-only" table) causes errors in the connector when CDC is enabled. My initial guess is that primary-key-only tables are not supported because we rely on the value part of the key/value schema (which is generated from the non-primary-key columns) to indicate a delete by passing a null value. For a primary-key-only table the value is always null, so it is not clear how deletes can be distinguished from inserts.
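To make the ambiguity concrete, here is a small Java model of the null-value convention described above. `CdcRecord`, `looksLikeDelete` and the `score` column are hypothetical names used only for illustration; they are not the connector's actual classes or schema.

```java
import java.util.Map;

// Hypothetical model of the key/value convention; NOT the connector's real classes.
public class NullValueAmbiguity {

    // key = primary key columns, value = non-primary-key columns (null signals a delete)
    record CdcRecord(Map<String, Object> key, Map<String, Object> value) {}

    // Convention described in the issue: a null value means DELETE.
    static boolean looksLikeDelete(CdcRecord r) {
        return r.value() == null;
    }

    public static void main(String[] args) {
        Map<String, Object> pk = Map.of("uuid", "a1", "value", 42);

        // Regular table ("score" is a hypothetical non-PK column):
        // an insert carries a value, a delete does not.
        CdcRecord regularInsert = new CdcRecord(pk, Map.of("score", 7));
        CdcRecord regularDelete = new CdcRecord(pk, null);

        // PK-only table: there are no non-PK columns, so the value is always null.
        CdcRecord pkOnlyInsert = new CdcRecord(pk, null);
        CdcRecord pkOnlyDelete = new CdcRecord(pk, null);

        System.out.println(looksLikeDelete(regularInsert)); // false
        System.out.println(looksLikeDelete(regularDelete)); // true
        System.out.println(looksLikeDelete(pkOnlyInsert));  // true, although it was an insert
        System.out.println(looksLikeDelete(pkOnlyDelete));  // true
    }
}
```

For the PK-only table both records are identical, so no rule based on the value alone can tell them apart.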

To reproduce:

  1. Create a table whose columns are all part of the primary key:
    CREATE TABLE source.results (
    uuid           text,
    value          int,
    PRIMARY KEY ((uuid, value))
    )
    WITH cdc = TRUE;
  2. Enable the CDC connectors.
  3. The following error is logged, and no events are sent to the data topic:
    2023-04-11T15:35:02,097+0000 [public/cdc-test/origin-results-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [public/cdc-test/origin-results:0] Uncaught exception in Java Instance
    java.util.concurrent.CompletionException: com.datastax.oss.driver.api.core.servererrors.SyntaxError: line 1:7 no viable alternative at input 'FROM' (SELECT [FROM]...)
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:412) ~[?:?]
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2044) ~[?:?]
    at com.datastax.oss.pulsar.source.CassandraSource.batchRead(CassandraSource.java:574) ~[?:?]
    at com.datastax.oss.pulsar.source.CassandraSource.maybeBatchRead(CassandraSource.java:463) ~[?:?]
    at com.datastax.oss.pulsar.source.CassandraSource.read(CassandraSource.java:455) ~[?:?]
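The query text in the error (`SELECT [FROM]...`, i.e. a `SELECT` with nothing before `FROM`) suggests that the projection of the read-back query is built from the non-primary-key columns, which is an empty list for this table. The sketch below uses a hypothetical query builder (not the connector's code) to show how an empty projection yields invalid CQL, plus one possible guard: fall back to projecting the primary key columns, so the statement stays valid and the presence or absence of a returned row could indicate whether the key still exists. That fallback is an assumption, not an agreed design.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical query builder; NOT the connector's actual implementation.
public class ProjectionSketch {

    // Naive version: project only the non-primary-key ("regular") columns.
    static String naiveSelect(String table, List<String> pkCols, List<String> regularCols) {
        return "SELECT " + String.join(", ", regularCols)
                + " FROM " + table + " WHERE " + whereClause(pkCols);
    }

    // Guarded version: if the table has no regular columns, project the primary key
    // columns instead so the generated CQL is still syntactically valid.
    static String guardedSelect(String table, List<String> pkCols, List<String> regularCols) {
        List<String> projection = regularCols.isEmpty() ? pkCols : regularCols;
        return "SELECT " + String.join(", ", projection)
                + " FROM " + table + " WHERE " + whereClause(pkCols);
    }

    // Bind-marker predicate over every primary key column.
    static String whereClause(List<String> pkCols) {
        return pkCols.stream().map(c -> c + " = ?").collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        List<String> pk = List.of("uuid", "value");
        List<String> regular = List.of(); // source.results has no non-PK columns

        // Invalid CQL: nothing between SELECT and FROM.
        System.out.println(naiveSelect("source.results", pk, regular));
        // SELECT uuid, value FROM source.results WHERE uuid = ? AND value = ?
        System.out.println(guardedSelect("source.results", pk, regular));
    }
}
```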

The desired behavior

  1. Have the events go through (consider using metadata in the schema to distinguish DELETE from INSERT), or
  2. Skip the records and give the user clear feedback.
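Both options could look roughly like the sketch below. It assumes the operation travels as a message property (Pulsar messages do support string key/value properties, but the `op` property name, `OutRecord`, `withOperation` and `skipPkOnly` are hypothetical), and it uses `java.util.logging` only to keep the example dependency-free.

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical handling of PK-only tables; NOT the connector's actual API.
public class PkOnlyHandling {
    private static final Logger LOG = Logger.getLogger(PkOnlyHandling.class.getName());

    enum Operation { INSERT, DELETE }

    // Outgoing record whose operation is explicit metadata ("op" is an assumed
    // property name) instead of being inferred from a null value.
    record OutRecord(Map<String, Object> key, Map<String, Object> value, Map<String, String> properties) {}

    // Option 1: emit the event and tag it with the operation.
    static OutRecord withOperation(Map<String, Object> key, Map<String, Object> value, Operation op) {
        return new OutRecord(key, value, Map.of("op", op.name()));
    }

    // Option 2: skip the event, but log why it was skipped.
    static Optional<OutRecord> skipPkOnly(String table, Map<String, Object> key) {
        LOG.warning("Skipping CDC event for " + table + " key=" + key
                + ": tables whose columns are all part of the primary key are not supported");
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> key = Map.of("uuid", "a1", "value", 42);

        // The value is null in both cases, but the operation is still explicit.
        OutRecord insert = withOperation(key, null, Operation.INSERT);
        OutRecord delete = withOperation(key, null, Operation.DELETE);
        System.out.println(insert.properties().get("op")); // INSERT
        System.out.println(delete.properties().get("op")); // DELETE

        System.out.println(skipPkOnly("source.results", key).isPresent()); // false
    }
}
```

With option 1, consumers no longer need the value to tell an insert from a delete, so the always-null value of a primary-key-only table stops being ambiguous. Option 2 keeps the current record format but replaces the uncaught syntax error with an explicit message.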