Griesbacher closed this issue 1 year ago.
I have no idea exactly why that is happening, but it seems to have something to do with the scala-compiler minor version mismatch (2.12.7 vs 2.12.15 vs 2.12.17). Theoretically it shouldn't matter, but we don't live in an ideal world.
We use runtime compilation to overcome API discrepancies between the different Json4s and Jackson versions used in different Spark distributions. I guess I'll try to cut the dependency on the JSON serde library used in Spark, and with that get rid of both the runtime Scala compilation and the dependency on the Scala compiler.
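For readers unfamiliar with the mechanism, here is a minimal sketch of runtime Scala compilation in general. It uses the standard scala.tools.reflect.ToolBox API and is not Spline's actual code. It assumes scala-compiler and scala-reflect are on the runtime classpath. The snippet is parsed and type-checked only when it is evaluated, so it is checked against whatever library versions are present at that moment. That is how this technique can bridge API differences between library versions. It also plausibly explains why a mismatch between the embedded compiler and the running scala-library can cause trouble.

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object RuntimeCompileSketch {
  // Requires scala-compiler (and scala-reflect) at runtime; this is the
  // dependency the comment above proposes to remove.
  private val toolbox = currentMirror.mkToolBox()

  // Parse, type-check and evaluate a Scala source snippet at runtime,
  // against the classpath that is visible right now.
  def evalSnippet[T](code: String): T =
    toolbox.eval(toolbox.parse(code)).asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    val double = evalSnippet[Int => Int]("(x: Int) => x * 2")
    println(double(21)) // 42
  }
}
```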
@Griesbacher, can you please test this build and see whether it fixes the issue? https://teamcity.jetbrains.com/repository/download/OpenSourceProjects_AbsaOSS_SplineAgentSpark_AutoBuildArtifactsSparkScala212/4080044:id/bundle-3.3/target/spark-3.3-spline-agent-bundle_2.12-1.1.0-SNAPSHOT.jar
@wajda, this version looks good! The execution plan and event are printed in the logs. I just have to adapt my plugin to the new version, because it uses za.co.absa.spline.harvester.json.HarvesterJsonSerDe.impl.EntityToJson, which seems to be missing now:
java.lang.NoSuchMethodError: za.co.absa.spline.harvester.json.HarvesterJsonSerDe$.impl()Lza/co/absa/commons/json/AbstractJsonSerDe;
But I'll adapt the plugin once version 1.1.0 is available. Thank you very much!
Yes, removing it was the fix :)
If your plugin only does import HarvesterJsonSerDe.impl._, then simply recompiling your code is enough. The change was made to impl itself: it is now an object rather than a field, but the usage should remain the same.
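To illustrate why a recompile is enough, here is a sketch with hypothetical stand-ins. SerDeAsField, SerDeAsObject and their contents are illustrative only and are not Spline's real classes. Turning a val into an object keeps the source-level path, so import X.impl._ compiles either way. However, the bytecode of an already-compiled plugin calls an accessor impl() with a specific return type, and that exact method no longer exists. Hence the NoSuchMethodError until the plugin is rebuilt.

```scala
// Hypothetical stand-in for the old shape: impl is a field (val).
object SerDeAsField {
  final class Impl {
    implicit class EntityToJson(x: Any) {
      def toJson: String = s"field:$x"
    }
  }
  // Compiled callers invoke an accessor impl() returning Impl.
  val impl: Impl = new Impl
}

// Hypothetical stand-in for the new shape: impl is a nested object.
object SerDeAsObject {
  object impl {
    implicit class EntityToJson(x: Any) {
      def toJson: String = s"object:$x"
    }
  }
}

// Plugin code looks the same against both shapes; only recompilation is needed.
object PluginAgainstField {
  import SerDeAsField.impl._
  def render(x: Int): String = x.toJson
}

object PluginAgainstObject {
  import SerDeAsObject.impl._
  def render(x: Int): String = x.toJson
}

object ImplChangeDemo {
  def main(args: Array[String]): Unit = {
    println(PluginAgainstField.render(1))  // field:1
    println(PluginAgainstObject.render(1)) // object:1
  }
}
```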
@Griesbacher Hey there. Could you kindly provide me with the JAR that fixed this issue? I'm facing a similar issue and would like to build around Spline until the release with this fix is available.
It's fixed in version 1.1.0 that will be released in the next day or two.
Hi,
thanks for the great work!
I'm trying to run the latest agent (spark-3.3-spline-agent-bundle_2.12-1.0.4) with Glue 4.0 on AWS and a PySpark script, which fails with an exception. I was hoping you could give me a hint about what to look out for.
I've tested my code with the agent against "vanilla" Spark 3.3.*, and it works perfectly fine. Obviously it cannot be exactly the same code, because it is not running as an AWS managed service. Glue 4.0 should use Spark 3.3.0, as stated here: https://docs.aws.amazon.com/glue/latest/dg/release-notes.html
The same code is also working fine with Glue 3 (spark-3.1-spline-agent-bundle_2.12-1.0.4) and Glue 2 (spark-2.2-spline-agent-bundle_2.11-1.0.4).
I tried to create a minimal example for the problem:
PySpark script:
Glue job config:
Log / Error:
The Glue job itself finishes successfully; only the agent fails.
I noticed that Glue 4 is missing the SnakeYAML library (https://mvnrepository.com/artifact/org.yaml/snakeyaml/1.33), which is why I added it to the path manually.
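When a library is added to the path manually like this, it can help to confirm which jar a class is actually loaded from at runtime. Below is a small sketch using only standard JDK reflection. The class name org.yaml.snakeyaml.Yaml is SnakeYAML's main entry point. Where to call this from inside a Glue job (for example from a plugin or the driver) is left as an assumption.

```scala
object ClasspathCheck {
  // Returns where the class was loaded from (typically a jar URL), or None if it
  // is not visible to this class loader. Classes without a code source (such as
  // core JDK classes) are reported as "<no code source>".
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className, false, getClass.getClassLoader)
      val source = Option(cls.getProtectionDomain.getCodeSource)
      Some(source.map(_.getLocation.toString).getOrElse("<no code source>"))
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit =
    println(s"SnakeYAML: ${locate("org.yaml.snakeyaml.Yaml").getOrElse("not found")}")
}
```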
I thought the error might be related to https://github.com/AbsaOSS/spline-spark-agent/issues/382, so I checked which Scala version Glue 4 uses. To do so, I added a plugin as described here: https://github.com/AbsaOSS/spline-spark-agent#plugin-api, which prints
scala.util.Properties.versionNumberString
-> 2.12.7
That should be okay, shouldn't it? Just to double-check, I also recompiled the agent with 2.12.7, but the result was the same error. I also tested spark-3.3-spline-agent-bundle_2.12-0.7.10 and spark-3.3-spline-agent-bundle_2.12-0.7.13, and both resulted in the same error.
But now I'm out of ideas... Could you point me towards something I could check out?
Best regards, Philip