aws / sagemaker-spark

A Spark library for Amazon SageMaker.
https://aws.github.io/sagemaker-spark/
Apache License 2.0

Update Spark to 2.4.2 #136

Closed. e13h closed this 3 years ago

e13h commented 3 years ago

Delta Lake 0.6.1 requires Spark 2.4.2 or higher.

Issue #, if available:

Description of changes: Update Spark from 2.4.0 to 2.4.2


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 3 years ago

AWS CodeBuild CI Report


e13h commented 3 years ago

@icywang86rui you did some great work in #135 debugging the build logs so that we could get Spark to v2.4.0. I'd like to see Spark updated to v2.4.2 so that I can use Delta Lake. I picked through the logs, and it seems that most (if not all) of the tests failed while constructing S3DataPath. Could you help point me in the right direction?

I found this particular stack trace many times in the logs:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling None.com.amazonaws.services.sagemaker.sparksdk.S3DataPath.
E                   : java.lang.NoClassDefFoundError: scala/Product$class
E                       at com.amazonaws.services.sagemaker.sparksdk.S3DataPath.<init>(S3Resource.scala:50)
E                       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
E                       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
E                       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
E                       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
E                       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E                       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E                       at py4j.Gateway.invoke(Gateway.java:238)
E                       at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E                       at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E                       at py4j.GatewayConnection.run(GatewayConnection.java:238)
E                       at java.lang.Thread.run(Thread.java:748)
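
A note on reading this trace, offered as a possible lead rather than a confirmed diagnosis: `scala/Product$class` is one of the synthetic trait helper classes that Scala 2.11 generates and Scala 2.12 no longer ships, so a `NoClassDefFoundError` for it usually means a jar compiled for Scala 2.11 is running on a Scala 2.12 runtime. To my knowledge the published Spark 2.4.2 binaries were built against Scala 2.12, unlike 2.4.0, which would fit the pattern. The sketch below is a small, hypothetical log-scanning helper (the function names and the `sample` text are illustrative and not part of sagemaker-spark) that counts how many distinct `scala/...$class` names a build log reports as missing.

```python
import re

# Scala 2.11 compiles trait bodies into synthetic "<Trait>$class" helper classes.
# Scala 2.12 dropped them in favor of Java 8 default methods, so a missing
# "scala/...$class" hints at a 2.11-built jar on a 2.12 runtime (an assumption
# about this build, not something the log itself proves).
TRAIT_IMPL_ERROR = re.compile(
    r"java\.lang\.NoClassDefFoundError: (scala/[\w/]+\$class)"
)


def find_missing_trait_classes(log_text):
    """Return the sorted, de-duplicated scala/...$class names reported missing."""
    return sorted(set(TRAIT_IMPL_ERROR.findall(log_text)))


def summarize(log_text):
    """Hypothetical helper: one-line hint about a possible Scala version mismatch."""
    missing = find_missing_trait_classes(log_text)
    if not missing:
        return "no scala/...$class errors found"
    return "possible Scala 2.11 vs 2.12 mismatch; missing: " + ", ".join(missing)


# Two lines copied from the trace above, used as sample input.
sample = (
    "E   : java.lang.NoClassDefFoundError: scala/Product$class\n"
    "E       at com.amazonaws.services.sagemaker.sparksdk.S3DataPath.<init>(S3Resource.scala:50)\n"
)
print(summarize(sample))
```

If every failing test reports the same missing class, the failures most likely share one root cause in the build configuration rather than in `S3DataPath` itself.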