aws / sagemaker-spark

A Spark library for Amazon SageMaker.
https://aws.github.io/sagemaker-spark/
Apache License 2.0

Newer versions of the library on maven central #96

Open jayantshekhar opened 5 years ago

jayantshekhar commented 5 years ago


Describe the problem

EMR clusters that use Spark 2.3 and later include newer versions of the SageMaker Spark JARs.

However, those versions are not available on Maven Central: https://mvnrepository.com/artifact/com.amazonaws/sagemaker-spark

When do you plan to release versions for Spark 2.3 and later to Maven Central? Alternatively, do you have any recommendations for running on clusters with later EMR versions?


laurenyu commented 5 years ago

Unfortunately, we don't have any plans to upgrade the current Spark version, but we are always re-evaluating our roadmap based on customer feedback!

jayantshekhar commented 5 years ago

Thanks for that Lauren!

I'm trying to understand this. Is SageMaker Spark on the roadmap, and would you recommend that users continue building solutions on it?

Is there something else you would recommend we use when integrating with SageMaker, especially when running jobs on EMR?

nadiaya commented 5 years ago

Please refer to our documentation for Spark support: https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html

We are evaluating our roadmap and will add support for the latest version in the future.

jayantshekhar commented 4 years ago

Thanks a lot Nadia! Will keep an eye on it and look forward to support for Spark 2.3 and Spark 2.4.

ehameyie commented 4 years ago

I had issues running sagemaker_pyspark on EMR 5.22, per this closed issue. I was able to get it working without problems and confirmed this with AWS tech support. The changes I had to apply are listed in my comments on the closed issue linked above. I figured I'd also post here in case it benefits anyone else.

One question, though: it appears that the sagemaker_pyspark SDK is not updated as often as the SageMaker Python SDK. Should we not be concerned, because sagemaker_pyspark is a wrapper for the SageMaker Python SDK? Or is it indeed a lower priority on your roadmap and therefore receives less support?