aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you to replicate data from Cassandra to AWS Services
Apache License 2.0

Cost optimisation by estimating WCU required #134

Closed: mati999q closed this 1 week ago

mati999q commented 6 months ago

Is your feature request related to a problem? Please describe.

Calculating the WCU to provision on the target table for the process is difficult, and it is a much-needed cost optimisation. Similarly, when only a historical migration is required, the process should be able to detect that there is no more data to replicate and stop gracefully.

Describe the solution you'd like

When doing our calculations, we noticed that each worker is capable of pushing x WCUs. However, according to the formula in the AWS docs, row size is also a factor when calculating a closer estimate.

It would be nice if the jobs could calculate how many workers will be used (presumably this is easy to display once the discovery job completes) and the WCU needed to complete the job in x time (i.e. to complete the job in an hour, WCU needs to be x based on the number of workers and row size). This would allow better runtime and cost calculations when parameterising the jobs.
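To illustrate the kind of estimate meant here, a minimal sketch follows. It assumes the documented rule that one WCU covers one write per second for a row of up to 1 KB, with larger rows rounded up to the next 1 KB. The function name `estimate_wcu` and all of its parameters are hypothetical; this is not the formula used by CQLReplicator or its calculator spreadsheet, and it ignores per-row storage overhead, retries and throttling.

```python
import math


def estimate_wcu(total_rows, avg_row_bytes, target_seconds, workers):
    """Rough WCU estimate for writing total_rows within target_seconds.

    Assumption: 1 WCU = one write per second for a row of up to 1 KB;
    larger rows consume ceil(row_size / 1 KB) WCUs per write.
    Returns (table_wcu, wcu_per_worker).
    """
    if total_rows < 0 or avg_row_bytes <= 0 or target_seconds <= 0 or workers <= 0:
        raise ValueError("rows must be >= 0; size, time and workers must be > 0")

    # Write units consumed by a single row of the average size.
    wcu_per_row = math.ceil(avg_row_bytes / 1024)

    # Total write units for the whole migration, spread over the time window.
    total_write_units = total_rows * wcu_per_row
    table_wcu = math.ceil(total_write_units / target_seconds)

    # Share of that throughput each worker would need to sustain.
    wcu_per_worker = math.ceil(table_wcu / workers)
    return table_wcu, wcu_per_worker


if __name__ == "__main__":
    # Hypothetical numbers: 100M rows of ~1.5 KB, finished in one hour by 8 workers.
    table_wcu, per_worker = estimate_wcu(100_000_000, 1500, 3600, 8)
    print(f"table WCU: {table_wcu}, per worker: {per_worker}")
```

If the per-worker figure is higher than what a single worker can push in practice, either the worker count or the time window would have to grow.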

Describe alternatives you've considered

The more information these jobs report the better, since there is a lot going on and a lot of numbers to keep track of. Even displaying the number of workers and DPUs used in the run would be a slight improvement to cost tracking.

At the moment, optimising costs means being reactive rather than proactive: you have to look at the throughput currently being produced.

nwheeler81 commented 1 week ago

https://github.com/aws-samples/cql-replicator/blob/main/glue/resources/CQLReplicator-calculator.xlsx