aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you replicate data from Cassandra to AWS services
Apache License 2.0

[CQLReplicator on Glue] Error in the log - PROCESS_TYPE: replication #106

Closed: frozensky closed this issue 9 months ago

frozensky commented 9 months ago

I am seeing random errors in the replication log:



2024-02-05 22:02:11,006 INFO [main] log.GlueLogger (GlueLogger.scala:info(8)): Historical data load.Processing locations: ParVector((tile_0.head,0,head))
2024-02-05 22:02:11,013 INFO [scala-execution-context-global-88] glue.GlueContext (GlueContext.scala:getSecretOptionsFromSecretManager(970)): Glue secret manager integration: secretId is not provided.


java.lang.NoSuchMethodError: org.json4s.CustomSerializer.<init>(Lscala/Function1;Lscala/reflect/Manifest;)V
at com.amazonaws.services.glue.util.StringToBoolean$.<init>(JsonOptions.scala:124)
at com.amazonaws.services.glue.util.StringToBoolean$.<clinit>(JsonOptions.scala)
at com.amazonaws.services.glue.util.JsonOptions$.apply(JsonOptions.scala:108)
at com.amazonaws.services.glue.GlueContext.getOptionsWithCredentialFromSecretManagerForLegacy(GlueContext.scala:948)
at com.amazonaws.services.glue.GlueContext.getSourceInternal(GlueContext.scala:980)
at com.amazonaws.services.glue.GlueContext.getSource(GlueContext.scala:897)
at com.amazonaws.services.glue.GlueContext.getSourceWithFormat(GlueContext.scala:1377)
at GlueApp$.$anonfun$main$41(CQLReplicator.scala:569)
at GlueApp$.$anonfun$main$41$adapted(CQLReplicator.scala:565)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:982)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:979)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.StringToBoolean$
at com.amazonaws.services.glue.util.JsonOptions$.apply(JsonOptions.scala:108)
at com.amazonaws.services.glue.util.JsonOptions.append(JsonOptions.scala:91)
at com.amazonaws.services.glue.GlueContext.getOptionsWithCredentialFromSecretManagerForLegacy(GlueContext.scala:948)
at com.amazonaws.services.glue.GlueContext.getSourceInternal(GlueContext.scala:980)
at com.amazonaws.services.glue.GlueContext.getSource(GlueContext.scala:897)
at com.amazonaws.services.glue.GlueContext.getSourceWithFormat(GlueContext.scala:1377)
at GlueApp$.$anonfun$main$41(CQLReplicator.scala:569)
at GlueApp$.$anonfun$main$41$adapted(CQLReplicator.scala:565)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:982)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:979)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

nwheeler81 commented 9 months ago

@frozensky I will look into it, but it doesn't affect the following line:

val sourceDf = glueContext.getSourceWithFormat(
  connectionType = "s3",
  format = "parquet",
  options = JsonOptions(s"""{"paths": ["$sourcePath"]}""")
).getDynamicFrame().toDF()
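In the reported trace, the call path runs from getSourceWithFormat (called at CQLReplicator.scala:569) into JsonOptions. There, the StringToBoolean serializer calls an org.json4s.CustomSerializer constructor that the loaded json4s class does not have. A NoSuchMethodError of this kind usually means that two incompatible versions of a library are on the classpath.

The helper below is a minimal sketch, not part of CQLReplicator or AWS Glue. It checks which of a set of locally downloaded JAR files contain a given class. The names JarScan, entryName, containsClass and jarsContaining are hypothetical.

```scala
import java.io.File
import java.util.zip.ZipFile

// Hypothetical helper (not part of CQLReplicator or AWS Glue): find which JAR
// files contain a compiled class, to spot duplicate copies of a library.
object JarScan {
  // "org.json4s.CustomSerializer" -> "org/json4s/CustomSerializer.class"
  def entryName(className: String): String =
    className.replace('.', '/') + ".class"

  // True if the JAR has an entry for the class file; the file is always closed.
  def containsClass(jar: File, className: String): Boolean = {
    val zip = new ZipFile(jar)
    try zip.getEntry(entryName(className)) != null
    finally zip.close()
  }

  // Keep only the regular files in `jars` that contain `className`.
  def jarsContaining(jars: Seq[File], className: String): Seq[File] =
    jars.filter(jar => jar.isFile && containsClass(jar, className))
}
```

For example, you could copy the JARs listed under Dependent JARs path to a local directory and call JarScan.jarsContaining(jars, "org.json4s.CustomSerializer"). The result lists any JARs that bundle an unrelocated copy of json4s. A shaded copy relocated to a different package would not match, and it would also not clash with Glue's own json4s.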

nwheeler81 commented 9 months ago

@frozensky it doesn't affect the replication process; the error appears to come from Glue internals (com.amazonaws.services.glue.util.StringToBoolean).
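A common JVM technique for confirming which JAR supplies a class at runtime is to read the class's ProtectionDomain code source. The sketch below assumes it runs inside the same JVM as the job, for example logged from the Glue script. The object ClassOrigin and its method locationOf are hypothetical. The class is loaded with initialize = false, so the check also works for a class such as StringToBoolean$, whose static initializer is the part that fails here.

```scala
// Hypothetical helper: report where the JVM loads a class from, if known.
object ClassOrigin {
  def locationOf(
      className: String,
      loader: ClassLoader = Thread.currentThread().getContextClassLoader
  ): Option[String] =
    try {
      // initialize = false: load the class without running its static initializer.
      val cls = Class.forName(className, false, loader)
      // The code source is null for JDK bootstrap classes such as java.lang.String.
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(source => Option(source.getLocation))
        .map(_.toString)
    } catch {
      // The class is not visible to this class loader at all.
      case _: ClassNotFoundException => None
    }
}
```

For instance, logging ClassOrigin.locationOf("org.json4s.CustomSerializer") from the job would show whether json4s comes from one of the dependent JARs rather than from Glue's own libraries.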

nwheeler81 commented 9 months ago

@frozensky to fix the issue:

  1. Open Job details for the CQLReplicator job.
  2. Under Advanced properties, edit Dependent JARs path and remove the S3 paths s3://your-bucket/artifacts/opensearch-spark-30_2.12-1.0.1.jar and s3://your-bucket/artifacts/jedis-4.4.6.jar.
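In the Glue job definition, the console's Dependent JARs path field corresponds to the --extra-jars job parameter, which is a comma-separated list of S3 paths. If the job is managed outside the console, the same fix amounts to filtering that list. The following is a minimal sketch that assumes entries are matched by file name, so the bucket name does not matter. ExtraJars and withoutJars are hypothetical names.

```scala
// Hypothetical helper: remove JARs, matched by file name, from a
// comma-separated --extra-jars value.
object ExtraJars {
  def withoutJars(extraJars: String, fileNamesToRemove: Set[String]): String =
    extraJars
      .split(",")
      .map(_.trim)
      .filter { path =>
        // Everything after the last '/' (the whole string if there is none).
        val fileName = path.substring(path.lastIndexOf('/') + 1)
        path.nonEmpty && !fileNamesToRemove.contains(fileName)
      }
      .mkString(",")
}
```

You can then paste the resulting string back into Dependent JARs path, or supply it as --extra-jars through whatever tooling deploys the job.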