aws / sagemaker-feature-store-spark

Apache License 2.0

[bug] Spark Connector does not work with spark cluster deploy mode #5

Open can-sun opened 1 year ago

How to reproduce?

  1. Create an EMR cluster and SSH into the host.
  2. Install the Spark connector as described in the documentation.
  3. Prepare a Python script that ingests data into a feature group using the Spark connector.
  4. Run the script with spark-submit in cluster deploy mode, e.g. spark-submit --deploy-mode cluster s3://spark-sucan-test-demo/BatchIngestionTest.py
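
For anyone scripting the reproduction, step 4 can be sketched as a small Python helper that assembles the spark-submit command line. The helper name `build_spark_submit` is hypothetical and only exists for illustration; the `--deploy-mode` flag and the script URI are the ones from the step above. The helper only builds and prints the command, it does not run it.

```python
import shlex

# Script location taken from the reproduction step above.
SCRIPT_URI = "s3://spark-sucan-test-demo/BatchIngestionTest.py"


def build_spark_submit(script_uri, deploy_mode="cluster"):
    """Return the spark-submit argv for a script and deploy mode.

    Hypothetical helper: it only assembles arguments and validates
    the deploy mode; nothing is executed here.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return ["spark-submit", "--deploy-mode", deploy_mode, script_uri]


cluster_cmd = build_spark_submit(SCRIPT_URI, "cluster")

# Print the shell-ready command; pass cluster_cmd to subprocess.run()
# on the EMR host to actually submit the job.
print(shlex.join(cluster_cmd))
```

Passing `"client"` as the second argument yields the client-mode variant, which may be useful for checking whether the failure is specific to cluster mode.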

Result

ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:507)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:271)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:902)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:901)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:901)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:111)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
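
The trace above only reports that the Python driver process exited with status 1; the Python-side error itself is not part of this excerpt. As a rough aid for reading such traces, the sketch below pulls the exception chain out of a JVM stack trace. The function name `exception_chain` and the shortened `TRACE` sample are illustrative assumptions, not part of the connector or Spark.

```python
import re

# Shortened copy of the trace above (stack frames trimmed).
TRACE = """\
ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:111)
"""

# Matches "<fully.qualified.Class>: message", optionally prefixed
# with "Caused by: ". Stack frames ("at ...") and log lines do not match.
_EXC_LINE = re.compile(r"^(?:Caused by: )?([\w.$]+): (.*)$")


def exception_chain(trace):
    """Return [(exception class, message), ...], outermost first."""
    chain = []
    for line in trace.splitlines():
        match = _EXC_LINE.match(line.strip())
        if match and "." in match.group(1):
            chain.append((match.group(1), match.group(2)))
    return chain


chain = exception_chain(TRACE)
root_class, root_message = chain[-1]

# Extract the exit status of the user application, if reported.
exit_match = re.search(r"exited with (\d+)", root_message)
exit_code = int(exit_match.group(1)) if exit_match else None

print(root_class, exit_code)
```

Here the root cause is `SparkUserAppException` with exit status 1, which indicates that the Python script itself failed rather than the YARN ApplicationMaster.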

Expected Behavior

The script should run to completion in cluster deploy mode without errors.