[CQLReplicator on Glue] Update CQLReplicator.scala for memorydb, parquet, and opensearch

aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you replicate data from Cassandra to AWS services

Apache License 2.0


Proposed changes:

Replace sparkSession.read.parquet in dataReplicationProcess() and keysDiscoveryProcess() with glueContext.getSourceWithFormat, using DISK_ONLY
Add support for the boolean data type in rowToStatement
Add replicationPointInTime to replicate data from a specific point in time
Add ClientConfiguration with retries for the S3 client
Add aggregated stats for dataReplicationProcess(): read count.json, aggregate the counters, and update count.json. Proposed new structure: { "tile": 0, "primaryKeys": value, "updatedPrimaryKeys": value, "insertedPrimaryKeys": value, "deletedPrimaryKeys": value, "updatedTimestamp": value }
Update cqlreplicator to return the new stats with --state stats
Update README.MD
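
The read, aggregate, update cycle proposed for count.json could look roughly like the sketch below. It is not the code from this PR: the `TileStatsSketch` object, the `TileStats` case class, and the `aggregate` and `toJson` helpers are hypothetical names chosen to mirror the proposed fields. It also assumes that the updated, inserted, and deleted counters accumulate across replication passes, while `primaryKeys` and `updatedTimestamp` are taken from the latest pass. S3 reads and writes are left out so the example stays self-contained.

```scala
// Hypothetical sketch of the count.json aggregation; not the PR's actual code.
object TileStatsSketch {

  // Mirrors the proposed count.json structure, one record per tile.
  final case class TileStats(
      tile: Int,
      primaryKeys: Long,
      updatedPrimaryKeys: Long,
      insertedPrimaryKeys: Long,
      deletedPrimaryKeys: Long,
      updatedTimestamp: String
  )

  // Merge the stats stored in count.json with the stats of the latest pass.
  // Assumption: change counters are cumulative; primaryKeys and the timestamp
  // reflect the latest pass.
  def aggregate(previous: TileStats, latest: TileStats): TileStats = {
    require(previous.tile == latest.tile, "stats must belong to the same tile")
    TileStats(
      tile = previous.tile,
      primaryKeys = latest.primaryKeys,
      updatedPrimaryKeys = previous.updatedPrimaryKeys + latest.updatedPrimaryKeys,
      insertedPrimaryKeys = previous.insertedPrimaryKeys + latest.insertedPrimaryKeys,
      deletedPrimaryKeys = previous.deletedPrimaryKeys + latest.deletedPrimaryKeys,
      updatedTimestamp = latest.updatedTimestamp
    )
  }

  // Render the stats as a JSON object in the proposed field order.
  def toJson(s: TileStats): String =
    Seq(
      "\"tile\": " + s.tile,
      "\"primaryKeys\": " + s.primaryKeys,
      "\"updatedPrimaryKeys\": " + s.updatedPrimaryKeys,
      "\"insertedPrimaryKeys\": " + s.insertedPrimaryKeys,
      "\"deletedPrimaryKeys\": " + s.deletedPrimaryKeys,
      "\"updatedTimestamp\": \"" + s.updatedTimestamp + "\""
    ).mkString("{ ", ", ", " }")
}
```

In a real job, `previous` would be parsed from the existing count.json object and the result of `toJson` written back to the same key. Keeping the merge as a pure function makes it easy to unit test apart from the S3 client and its retry configuration.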
