aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you to replicate data from Cassandra to AWS Services
Apache License 2.0

[CQLReplicator on Glue] Set replication starting timestamp #85

Closed: hungfaileung closed this issue 8 months ago

hungfaileung commented 8 months ago

Is it possible to get the same functionality in the Glue version that the ECS version provides through ENABLE_REPLICATION_POINT and STARTING_REPLICATION_TIMESTAMP?

https://github.com/aws-samples/cql-replicator/blob/main/ecs/README.MD?plain=1#L121-L122

Basically, I want to replicate only the data written after a specific point in time, not all the data.

nwheeler81 commented 8 months ago

yeah, it's possible.

nwheeler81 commented 8 months ago

1/ Add a flag --start-replication-from [epoch-time] in cqlreplicator and CQLReplicator.scala. It works only in combination with --writetime-column [regular-column].

2/ Replicate a row only if writetime(regular-column) > epoch-time.

3/ Add a filter statement in keysDiscoveryProcess(): val primaryKeysDf = sparkSession.read.option("inferSchema", "true").table(source).persist(StorageLevel.DISK_ONLY)

4/ Add checks: regular-column is supplied, and epoch-time is provided and greater than 0.
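
The checks in 4/ and the comparison in 2/ could look roughly like the sketch below. It is plain Scala with no Spark dependency, and it is not the code from CQLReplicator.scala: `ReplicationStart`, `validate`, and `shouldReplicate` are hypothetical names made up for illustration. It also assumes the epoch time is given in the same unit as the value returned by writetime() (Cassandra reports write times in microseconds); the thread does not say which unit the flag expects.

```scala
// Hypothetical sketch: none of these names come from CQLReplicator.scala.
final case class ReplicationStart(writetimeColumn: String, startFrom: Long)

object ReplicationStart {
  // Step 4/: the starting point is valid only together with a regular
  // column, and the epoch time must be greater than 0.
  // Returns Right(None) when the flag is absent (replicate everything).
  def validate(
      writetimeColumn: Option[String],
      startFrom: Option[Long]
  ): Either[String, Option[ReplicationStart]] =
    (writetimeColumn.map(_.trim).filter(_.nonEmpty), startFrom) match {
      case (_, None) =>
        Right(None)
      case (None, Some(_)) =>
        Left("--start-replication-from requires --writetime-column")
      case (Some(_), Some(ts)) if ts <= 0 =>
        Left("--start-replication-from must be greater than 0")
      case (Some(column), Some(ts)) =>
        Right(Some(ReplicationStart(column, ts)))
    }

  // Step 2/: keep a row only if its writetime is strictly greater than
  // the starting point. Assumes both values use the same time unit.
  def shouldReplicate(start: Option[ReplicationStart], rowWritetime: Long): Boolean =
    start.forall(s => rowWritetime > s.startFrom)
}
```

In keysDiscoveryProcess() the same predicate would presumably be applied as a filter on the DataFrame rather than row by row, but the thread does not show how the writetime value is exposed there, so that part is left out.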