datastax / cdc-apache-cassandra

Datastax CDC for Apache Cassandra
Apache License 2.0

[Improvement][Source]: Infuse the key values returned from the mutation keys instead of reading from DB #84

Open aymkhalil opened 2 years ago

aymkhalil commented 2 years ago

This performance improvement applies only to the JSON-only format introduced in #79 and #74.

In the AVRO and JSON Key Value schemas, the Key is populated from the mutation key that is read from the dirty topic. Only the value fields are read from the C* table, to spare a few CPU cycles.

With the JSON-only option (where the Key + Value are encoded in the payload), the simpler implementation that was introduced reads the key from the C* table along with the value fields. This lets the NativeJsonConverter stay agnostic to whether the produced Schema embeds both key and value in the value or is a Key Value schema; ideally it should not be aware of that. The tradeoff is reading extra bytes from C* to get the Key columns (there is no extra read, just extra columns in the CQL SELECT statement).
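
To make the tradeoff concrete, here is a minimal sketch of the two approaches. It is not the connector's actual code: the class `KeyInfusionSketch`, the methods `buildSelect` and `infuseKey`, and the `ks.users` table with its columns are all hypothetical names chosen for illustration. The current JSON-only path would select key and value columns together, while the proposed optimization would select only value columns and merge in the key values already carried by the mutation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from the connector.
class KeyInfusionSketch {

    // Builds a CQL SELECT for the given columns, restricted by the primary key columns.
    static String buildSelect(String table, List<String> selected, List<String> keyColumns) {
        String where = keyColumns.stream()
                .map(c -> c + " = ?")
                .collect(Collectors.joining(" AND "));
        return "SELECT " + String.join(", ", selected) + " FROM " + table + " WHERE " + where;
    }

    // Proposed optimization: combine key values taken from the mutation key
    // with the value fields read from the table, key columns first.
    static Map<String, Object> infuseKey(Map<String, Object> mutationKey,
                                         Map<String, Object> valueRow) {
        Map<String, Object> merged = new LinkedHashMap<>(mutationKey);
        merged.putAll(valueRow);
        return merged;
    }

    public static void main(String[] args) {
        List<String> keyColumns = List.of("id");
        List<String> valueColumns = List.of("name", "email");

        // Current JSON-only behavior: key columns are part of the SELECT.
        List<String> all = new ArrayList<>(keyColumns);
        all.addAll(valueColumns);
        System.out.println(buildSelect("ks.users", all, keyColumns));

        // Proposed behavior: select value columns only, then infuse the key.
        System.out.println(buildSelect("ks.users", valueColumns, keyColumns));
        Map<String, Object> mutationKey = Map.of("id", 42);
        Map<String, Object> valueRow = new LinkedHashMap<>();
        valueRow.put("name", "Ada");
        valueRow.put("email", "ada@example.com");
        System.out.println(infuseKey(mutationKey, valueRow));
    }
}
```

Both variants issue a single read per mutation; the only difference is the width of the selected column list, which is why the gain may turn out to be small.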

Refactoring is required to enable this optimization. I don't have metrics handy to decide if this is an over-optimization (@eolivelli do you know if this is the case?)

eolivelli commented 2 years ago

We will see whether there is a need for this optimization once we have users of this feature.