aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you to replicate data from Cassandra to AWS Services
Apache License 2.0

Question: How does cqlreplicator behave if a row already exists on TARGET #148

Closed: jlewis-spotnana closed this issue 3 months ago

jlewis-spotnana commented 3 months ago

Is your feature request related to a problem? Please describe.
I'm trying to understand the behavior of cqlreplicator in certain scenarios.

Describe the solution you'd like
Does cqlreplicator ignore rows that already exist on the TARGET? Or does it overwrite the existing data?

Describe alternatives you've considered
I've tried reading the code, but I'm not familiar with Spark or Scala, so my progress has been slow.

Additional context
We're planning to set up a new keyspace and have our services aware of both. During this time, we don't want cqlreplicator to copy a given row if it already exists on the TARGET.

jlewis-spotnana commented 3 months ago

This might be a duplicate of (or related to) https://github.com/aws-samples/cql-replicator/issues/69

nwheeler81 commented 3 months ago

Hi @jlewis-spotnana, if you don't supply an arbitrary regular column, e.g. --writetime-column colX, CQLReplicator won't update rows in the target table; it will replicate only inserts and deletes. If you do provide the arbitrary column, CQLReplicator will update the entire row in the target table, but only if that column has been updated in the source table.

jlewis-spotnana commented 3 months ago

Hi @nwheeler81 thanks for your response. We're not using --writetime-column since our services do not issue UPDATE statements.

We're trying to close a potential race during service cutover from the SOURCE to TARGET table. During cutover (rolling restart), it's possible for our services to write a certain row, say PK1, to the TARGET table. If that same row PK1 already exists in the SOURCE table but cqlreplicator hasn't copied it yet, then cqlreplicator will (eventually) overwrite PK1 on the TARGET.

We would like cqlreplicator to ignore (not copy) Primary Keys which already exist on the TARGET. Is there a way to configure cqlreplicator to ignore existing Primary Keys, given the current code?
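The race described above can be modeled with a small in-memory sketch. It does not use CQLReplicator or a real cluster: the `Table` alias, `upsert`, and `insertIfAbsent` are hypothetical names introduced here for illustration only. It relies on the fact that a plain CQL INSERT has upsert semantics, so a later write to the same primary key replaces the earlier one.

```scala
import scala.collection.mutable

object CutoverRaceSketch {
  // Hypothetical stand-in for a table: primary key -> row payload.
  type Table = mutable.Map[String, String]

  // Models a plain CQL INSERT, which overwrites an existing key (upsert).
  def upsert(target: Table, pk: String, row: String): Unit =
    target.update(pk, row)

  // Models the requested behavior: write only when the key is absent.
  // Returns true if the row was written, false if it was skipped.
  def insertIfAbsent(target: Table, pk: String, row: String): Boolean =
    if (target.contains(pk)) false
    else { target.update(pk, row); true }

  def main(args: Array[String]): Unit = {
    val target: Table = mutable.Map.empty

    // 1. During cutover, a restarted service writes PK1 directly to TARGET.
    upsert(target, "PK1", "new-from-service")
    // 2. Later, replication copies the older SOURCE version of PK1.
    upsert(target, "PK1", "old-from-source")
    println(target("PK1")) // prints "old-from-source": the newer write is lost

    // Same sequence, but replication skips keys that already exist.
    target.clear()
    upsert(target, "PK1", "new-from-service")
    insertIfAbsent(target, "PK1", "old-from-source")
    println(target("PK1")) // prints "new-from-service"
  }
}
```

The second half shows the outcome being asked for: the replicated copy of PK1 is dropped because the key is already present on the TARGET.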

I believe this might be what https://github.com/aws-samples/cql-replicator/issues/69 is asking for. (I only found that issue after opening this one.)

nwheeler81 commented 3 months ago

Hi @jlewis-spotnana, there is a quick fix: in CQLReplicator.scala, replace the line s"INSERT INTO $trgKeyspaceName.$trgTableName JSON '$jsonRow'" with s"INSERT INTO $trgKeyspaceName.$trgTableName JSON '$jsonRow' IF NOT EXISTS". This enables lightweight transactions (LWT) against the target table in Amazon Keyspaces. I will publish an enhancement that officially includes this feature in 3-4 weeks.
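As a minimal sketch of the quick fix above, the two statement variants can be built side by side. Only the interpolated strings come from the comment; the `insertJson` helper and its `skipExisting` parameter are hypothetical and are not part of CQLReplicator. The sketch also assumes `jsonRow` is already safe to embed between single quotes.

```scala
object InsertStatementSketch {
  // Hypothetical helper (not CQLReplicator's API): builds the INSERT JSON
  // statement, optionally with the IF NOT EXISTS condition appended.
  def insertJson(trgKeyspaceName: String,
                 trgTableName: String,
                 jsonRow: String,
                 skipExisting: Boolean): String = {
    // Original statement: overwrites a row with the same primary key.
    val base = s"INSERT INTO $trgKeyspaceName.$trgTableName JSON '$jsonRow'"
    // Patched statement: a conditional insert that leaves existing rows alone.
    if (skipExisting) s"$base IF NOT EXISTS" else base
  }

  def main(args: Array[String]): Unit = {
    val row = """{"pk":"PK1","val":"from-source"}"""
    println(insertJson("ks", "tbl", row, skipExisting = false))
    println(insertJson("ks", "tbl", row, skipExisting = true))
  }
}
```

Keeping the condition behind a flag, rather than hard-coding it, would let the unconditional path stay the default, since conditional writes generally cost more than plain inserts.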

jlewis-spotnana commented 3 months ago

Confirmed the simple patch provided by nwheeler81 works. Thanks for the help!