aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you replicate data from Cassandra to AWS services
Apache License 2.0

157 cost optimization: shut down workers between discovery phase runs to save AWS Glue costs #167

Open nwheeler81 opened 5 days ago


Issue #, if available: #134, #157, #166, #120

Description of changes: This update introduces several performance improvements and new features for the migration process. Support for UNLOGGED batch operations (fully configurable) has been added, improving performance by 35% and reducing Glue costs. A new option lets customers set the target traffic against Amazon Keyspaces, which simplifies the migration process.
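The description does not show how the batching is implemented. As a minimal sketch, assuming rows arrive as Python dicts, the hypothetical helper `build_unlogged_batches` below groups rows by partition key (a common practice, since unlogged batches that span many partitions put extra load on the coordinator) and renders each group as `BEGIN UNLOGGED BATCH ... APPLY BATCH;` text with `?` bind markers. The table name, column names, and `max_batch_size` are illustrative parameters, not CQLReplicator settings.

```python
from collections import defaultdict


def build_unlogged_batches(rows, partition_key, table, columns, max_batch_size):
    """Group rows by partition key and render each group as CQL UNLOGGED BATCH text.

    Returns a list of (cql_text, bound_values) tuples.
    Hypothetical helper for illustration; not CQLReplicator's actual code.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")

    # Bucket rows by their partition key so each batch touches a single partition.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in partition_key)].append(row)

    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders});"

    batches = []
    for group in groups.values():
        # Split large partitions into several batches of at most max_batch_size rows.
        for start in range(0, len(group), max_batch_size):
            chunk = group[start:start + max_batch_size]
            cql = "\n".join(["BEGIN UNLOGGED BATCH", *([insert] * len(chunk)), "APPLY BATCH;"])
            # Bound values are flattened in statement order, column by column.
            values = [row[c] for row in chunk for c in columns]
            batches.append((cql, values))
    return batches
```

With a real driver you would more likely build the batch object directly, for example `BatchStatement(batch_type=BatchType.UNLOGGED)` in the DataStax Python driver; check the batch-size limits of your target before choosing `max_batch_size`.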
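The PR does not say how the target traffic is enforced. One common way to cap write traffic at a target rate is a token bucket; the hypothetical `TokenBucket` class below is a sketch of that technique, not CQLReplicator's implementation. `rate_per_sec` plays the role of the target traffic and `capacity` bounds short bursts; the clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Cap the request rate at rate_per_sec, allowing bursts of up to capacity.

    Hypothetical illustration of a target-traffic throttle; not CQLReplicator code.
    """

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        if rate_per_sec <= 0 or capacity <= 0:
            raise ValueError("rate_per_sec and capacity must be positive")
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Take n tokens if available; return False (without blocking) otherwise."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n=1):
        """Block (using real time.sleep) until n tokens are available, then take them."""
        if n > self.capacity:
            raise ValueError("n exceeds bucket capacity and could never be satisfied")
        while not self.try_acquire(n):
            time.sleep((n - self.tokens) / self.rate)
```

A writer would call `bucket.acquire()` before each write (or `acquire(n)` before a batch of n statements), so the sustained rate does not exceed `rate_per_sec`.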

  1. Several validation checks have been implemented to help customers quickly identify potential issues:

    • Validating IAM role access to Amazon Keyspaces and Amazon S3
    • Validating the self-referencing security group rule for the subnet
    • Checking that the Glue service role exists
    • Validating that the Glue connection is in place
  2. Additional improvements include:

    • A row sampler to estimate row sizes before migration
    • A check that the table is provisioned correctly before CQLReplicator starts
    • Fail-fast behavior when the Cassandra cluster is not accessible
    • Improved error handling: Glue errors are now returned to STDOUT
    • A command to kill Glue jobs immediately
    • A higher default number of DPUs for the discovery process: (tiles + 1) instead of (tiles + 1) / 2
    • The ability to filter non-primary-key columns
    • Automatic recovery after CQLReplicator Glue jobs have been stopped with the kill command
    • A cost estimator
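AWS Glue requires the security group used by a Glue connection to have a self-referencing inbound rule for all TCP ports. The subnet validation listed above could be approximated as follows; this is a sketch, and `has_self_referencing_rule` is a hypothetical helper that reads one security group in the shape returned by the EC2 DescribeSecurityGroups API.

```python
def has_self_referencing_rule(security_group):
    """Return True if the group allows inbound TCP on all ports from itself.

    security_group is one element of the "SecurityGroups" list returned by the
    EC2 DescribeSecurityGroups API; only GroupId and IpPermissions are read.
    Hypothetical helper, not CQLReplicator code.
    """
    group_id = security_group["GroupId"]
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        # "-1" means all protocols and all ports.
        all_traffic = protocol == "-1"
        # TCP may be reported as "tcp" or by its protocol number "6".
        all_tcp = (
            protocol in ("tcp", "6")
            and perm.get("FromPort") == 0
            and perm.get("ToPort") == 65535
        )
        if not (all_traffic or all_tcp):
            continue
        # A self-referencing rule names the group itself as the traffic source.
        if any(pair.get("GroupId") == group_id for pair in perm.get("UserIdGroupPairs", [])):
            return True
    return False
```

In practice the group IDs could come from the Glue connection's `PhysicalConnectionRequirements.SecurityGroupIdList`, and the groups themselves from boto3's `ec2.describe_security_groups(GroupIds=...)`.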
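The row sampler's internals are not described. A minimal sketch, assuming rows can be streamed as dicts: reservoir sampling (Algorithm R) draws a uniform sample without knowing the table size in advance, and each sampled row's size is approximated by the UTF-8 length of its values. The functions are hypothetical, and the size estimate is rough because it ignores the encoding and metadata overhead a real target adds. The result could be compared against a target limit such as the 1 MB maximum row size in Amazon Keyspaces.

```python
import random


def reservoir_sample(rows, k, rng=None):
    """Uniformly sample up to k rows from an iterable of unknown length (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


def estimate_row_bytes(row):
    """Rough row size: UTF-8 length of each value's string form (an approximation)."""
    return sum(len(str(v).encode("utf-8")) for v in row.values())


def row_size_stats(rows, k=1000, seed=None):
    """Sample up to k rows and report the average and maximum estimated size."""
    sample = reservoir_sample(rows, k, random.Random(seed))
    if not sample:
        return {"sampled": 0, "avg_bytes": 0.0, "max_bytes": 0}
    sizes = [estimate_row_bytes(r) for r in sample]
    return {
        "sampled": len(sizes),
        "avg_bytes": sum(sizes) / len(sizes),
        "max_bytes": max(sizes),
    }
```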
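The DPU change and the cost estimator can be illustrated together. The sketch below assumes that the old default `(tiles + 1) / 2` used integer division and that a Glue run costs its DPU count multiplied by runtime in hours and by a per-DPU-hour price; the price is a parameter because it varies by region and is not stated in the PR. Function names are hypothetical.

```python
def discovery_dpus(tiles, legacy=False):
    """Default DPU count for the discovery job, following the PR description.

    legacy=True reproduces the old default (tiles + 1) / 2, assumed here to use
    integer division; the new default is tiles + 1.
    """
    if tiles < 1:
        raise ValueError("tiles must be >= 1")
    return (tiles + 1) // 2 if legacy else tiles + 1


def estimate_glue_cost(dpus, runtime_seconds, price_per_dpu_hour):
    """Estimated cost of one run: DPUs * hours * price (price is an input, not a real rate)."""
    return dpus * (runtime_seconds / 3600) * price_per_dpu_hour
```

For example, with 8 tiles the discovery default goes from 4 DPUs to 9. Whether that raises the bill depends on how much the extra DPUs shorten the run, which is the kind of trade-off a cost estimator makes visible.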

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.