datastax / cassandra-data-migrator

Cassandra Data Migrator - Migrate & Validate data between origin and target Apache Cassandra®-compatible clusters.
Apache License 2.0
28 stars 18 forks source link

License Apache2 Star on Github GitHub release (with filter) Docker Pulls

cassandra-data-migrator (also known as CDM)

Migrate and Validate Tables between Origin and Target Cassandra Clusters.

:warning: Please note this job has been tested with spark version 3.5.3

Install as a Container

Install as a JAR file

Prerequisite

:warning: If the above Spark and Scala version is not properly installed, you'll then see a similar exception like below when running the CDM jobs,

Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.Statics.releaseFence()V

Steps for Data-Migration:

:warning: Note that Version 4 of the tool is not backward-compatible with .properties files created in previous versions, and that package names have changed.

  1. cdm.properties file needs to be configured as applicable for the environment. Parameter descriptions and defaults are described in the file. The file can have any name, it does not need to be cdm.properties.
  2. Place the properties file where it can be accessed while running the job via spark-submit.
  3. Run the below job using spark-submit command as shown below:
spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt

Note:

--conf spark.cdm.filter.cassandra.partition.min=<token-range-min>
--conf spark.cdm.filter.cassandra.partition.max=<token-range-max>

Steps for Data-Validation:

spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999) 
23/04/06 08:43:06 ERROR DiffJobSession: Corrected mismatch row in target: [key3]
23/04/06 08:43:06 ERROR DiffJobSession: Missing target row found for key: [key2]
23/04/06 08:43:06 ERROR DiffJobSession: Inserted missing row in target: [key2]
spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf spark.executor.extraJavaOptions='-Dlog4j.configurationFile=log4j2.properties' \ 
--conf spark.driver.extraJavaOptions='-Dlog4j.configurationFile=log4j2.properties' \ 
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt

Rerun (previously incomplete) Migration or Validation

spark-submit --properties-file cdm.properties \
 --conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
 --conf spark.cdm.trackRun.previousRunId=<prev_run_id> \
 --master "local[*]" --driver-memory 25G --executor-memory 25G \
 --class com.datastax.cdm.job.<Migrate|DiffData> cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt

Perform large-field Guardrail violation checks

spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf spark.cdm.feature.guardrail.colSizeInKB=10000 \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt

Features

Things to know

Performance FAQ

Building Jar for local development

  1. Clone this repo
  2. Move to the repo folder cd cassandra-data-migrator
  3. Run the build mvn clean package (Needs Maven 3.9.x)
  4. The fat jar (cassandra-data-migrator-4.x.x.jar) file should now be present in the target folder

Contributors

Checkout all our wonderful contributors here.