aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you to replicate data from Cassandra to AWS Services
Apache License 2.0

restart of cqlreplicator fails with "column ... does not exist" #156

Open jlewis-spotnana opened 3 weeks ago

jlewis-spotnana commented 3 weeks ago

Describe the bug: After restarting cqlreplicator, the DISCOVERY job fails with the following error:

Exception in User Class: org.apache.spark.sql.AnalysisException : Column 'my_primary_key_column' does not exist. Did you mean one of the following? [];

The table structure has not changed, so I suspect that cqlreplicator's tracking data (in S3 or the migration.ledger table) has been corrupted.

To Reproduce: This issue is sporadic. I believe it was caused by the following steps:

  1. Run command ./cqlreplicator --state run ...
  2. Let initial DISCOVERY and backfill complete
  3. Kill all workers via the AWS Glue console
  4. Attempt to restart cqlreplicator with same command

Expected behavior: I expect cqlreplicator to restart and continue replication from where it left off.

Screenshots: n/a

Additional context: This may be related to https://github.com/aws-samples/cql-replicator/issues/112

jlewis-spotnana commented 3 weeks ago

Here's the full '--state run' command

./cqlreplicator --state run --region $AWS_REGION \
  --landing-zone s3://bucket-name \
  --tiles 2 --replication-stats-enabled \
  --src-keyspace XXX --src-table YYY \
  --trg-keyspace ZZZ --trg-table YYY \
  --json-mapping '{"keyspaces": { "readBeforeWrite": true}}'

nwheeler81 commented 3 weeks ago

@jlewis-spotnana

  1. Go to the S3 folder where CQLReplicator stores its artifacts.
  2. Find XXX/YYY/primaryKeys and list all objects in it.
  3. Each tile should have both a head and a tail, for example tile_0.head and tile_0.tail, so the number of tile_X.head objects should equal the number of tile_X.tail objects. If it does not, try a simple workaround: if tile_0.head exists but tile_0.tail is missing, copy tile_0.head to tile_0.tail, and do the same for any other tile that is missing its tail.
  4. Start CQLReplicator again.
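
The check in step 3 can be sketched in Python. This is only an illustration, not part of CQLReplicator: the function name plan_tail_copies and the exact key layout (keys containing a tile_N.head or tile_N.tail path segment under XXX/YYY/primaryKeys) are assumptions based on the names mentioned above. It only computes which copies would be needed from a list of object keys; it does not talk to S3.

```python
import re

# Assumption: each key contains "tile_<N>.head" or "tile_<N>.tail" as a path segment.
TILE_RE = re.compile(r"(?:^|/)(tile_\d+)\.(head|tail)(?=/|$)")


def plan_tail_copies(keys):
    """Return (source_key, destination_key) pairs for tiles with a head but no tail."""
    heads = {}     # tile name -> keys that belong to its head
    tails = set()  # tile names that already have a tail
    for key in keys:
        match = TILE_RE.search(key)
        if match is None:
            continue  # not a head/tail object, ignore it
        tile, kind = match.group(1), match.group(2)
        if kind == "head":
            heads.setdefault(tile, []).append(key)
        else:
            tails.add(tile)

    copies = []
    for tile in sorted(heads):
        if tile in tails:
            continue  # head and tail both present, nothing to do
        for src in sorted(heads[tile]):
            # Swap the ".head" suffix of the tile segment for ".tail".
            dst = TILE_RE.sub(
                lambda m: m.group(0)[: -len("head")] + "tail", src, count=1
            )
            copies.append((src, dst))
    return copies


if __name__ == "__main__":
    listing = [
        "XXX/YYY/primaryKeys/tile_0.head",
        "XXX/YYY/primaryKeys/tile_0.tail",
        "XXX/YYY/primaryKeys/tile_1.head",  # tail is missing for this tile
    ]
    for src, dst in plan_tail_copies(listing):
        print(f"copy {src} -> {dst}")
```

The key listing could come from the S3 console, `aws s3 ls`, or boto3's `list_objects_v2`, and each planned copy could then be performed with `aws s3 cp` or boto3's `copy_object`. Reviewing the planned copies before running them is a cheap safeguard.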

jlewis-spotnana commented 3 weeks ago

Thanks @nwheeler81. I copied head to tail as suggested, and cqlreplicator is running again!

What is the effect of copying head to tail in this manner? Is there any risk of losing data when doing this?

nwheeler81 commented 3 weeks ago

@jlewis-spotnana there is no risk of losing data, but you will have one idle cycle from the discovery Glue job.