aws / sagemaker-spark

A Spark library for Amazon SageMaker.
https://aws.github.io/sagemaker-spark/
Apache License 2.0

`import sagemaker_pyspark` failed on sagemaker notebook instance (platform identifier `notebook-al2-v1`) #144

Open ohfloydo opened 2 years ago

ohfloydo commented 2 years ago



Describe the problem

Following the code snippet from https://github.com/aws/sagemaker-spark/tree/master/sagemaker-pyspark-sdk#local-spark-on-sagemaker-notebook-instances to run local Spark on a SageMaker notebook instance (platform identifier `notebook-al2-v1`, conda_python3 kernel), `import sagemaker_pyspark` fails. I started another SageMaker notebook instance (platform identifier `notebook-al1-v1`) and the same code works fine there.
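A possible explanation for the al1-vs-al2 difference: the traceback shows the al2 conda_python3 kernel runs Python 3.8, and pyspark 2.x bundles a cloudpickle written for the pre-3.8 `types.CodeType` constructor, so the import crashes before any SageMaker code runs. A minimal pre-flight check, assuming that diagnosis (the helper name is hypothetical, not part of any library):

```python
import sys

def pyspark_import_likely_breaks(pyspark_version, python_version=sys.version_info):
    """Hypothetical check: pyspark 2.x ships a vendored cloudpickle that
    predates Python 3.8's change to the types.CodeType constructor, so
    importing it on a Python 3.8+ kernel fails at import time."""
    pyspark_major = int(pyspark_version.split(".")[0])
    return python_version >= (3, 8) and pyspark_major < 3

# On an al2 kernel (Python 3.8) with pyspark 2.x this returns True;
# on a Python 3.6 kernel, or with pyspark 3.x, it returns False.
print(pyspark_import_likely_breaks("2.3.4", (3, 8, 5)))
```

If that holds, the obvious thing to try is upgrading the kernel's pyspark to a 3.x release together with a sagemaker_pyspark version that supports it — an assumption to verify, not a confirmed fix.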

Minimal repro / logs


Traceback

import sagemaker_pyspark
from pyspark.sql import SparkSession

classpath = ":".join(sagemaker_pyspark.classpath_jars())
spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath).getOrCreate()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_18565/3057191668.py in <cell line: 1>()
----> 1 import sagemaker_pyspark
      2 from pyspark.sql import SparkSession
      3 
      4 classpath = ":".join(sagemaker_pyspark.classpath_jars())
      5 spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath).getOrCreate()

~/anaconda3/envs/python3/lib/python3.8/site-packages/sagemaker_pyspark/__init__.py in <module>
     17 """
     18 
---> 19 from .wrapper import SageMakerJavaWrapper, Option
     20 from .IAMRoleResource import IAMRole, IAMRoleFromConfig
     21 from .SageMakerClients import SageMakerClients

~/anaconda3/envs/python3/lib/python3.8/site-packages/sagemaker_pyspark/wrapper.py in <module>
     16 from abc import ABCMeta
     17 
---> 18 from pyspark import SparkContext
     19 from pyspark.ml.common import _java2py
     20 from pyspark.ml.wrapper import JavaWrapper

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/__init__.py in <module>
     49 
     50 from pyspark.conf import SparkConf
---> 51 from pyspark.context import SparkContext
     52 from pyspark.rdd import RDD, RDDBarrier
     53 from pyspark.files import SparkFiles

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/context.py in <module>
     29 from py4j.protocol import Py4JError
     30 
---> 31 from pyspark import accumulators
     32 from pyspark.accumulators import Accumulator
     33 from pyspark.broadcast import Broadcast, BroadcastPickleRegistry

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/accumulators.py in <module>
     95     import socketserver as SocketServer
     96 import threading
---> 97 from pyspark.serializers import read_int, PickleSerializer
     98 
     99 

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/serializers.py in <module>
     69     xrange = range
     70 
---> 71 from pyspark import cloudpickle
     72 from pyspark.util import _exception_message
     73 

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/cloudpickle.py in <module>
    143 
    144 
--> 145 _cell_set_template_code = _make_cell_set_template_code()
    146 
    147 

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/cloudpickle.py in _make_cell_set_template_code()
    124         )
    125     else:
--> 126         return types.CodeType(
    127             co.co_argcount,
    128             co.co_kwonlyargcount,

TypeError: an integer is required (got type bytes)
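The crash happens inside pyspark's vendored cloudpickle: `_make_cell_set_template_code` rebuilds a code object by calling `types.CodeType` with the positional argument list that was correct before Python 3.8. Python 3.8 inserted `posonlyargcount` as the second parameter, so every later argument shifts by one slot and `co_code` (a bytes object) lands where an int is expected. A sketch that reproduces the same mismatch (illustrative only, not pyspark's actual code):

```python
import types

def rebuild_code_pre38_style(co):
    # Rebuild a code object using the pre-Python-3.8 positional argument
    # order (no posonlyargcount), mimicking old vendored cloudpickle.
    # On Python 3.8+ every argument shifts one slot, so co_code (bytes)
    # lands in the int-typed flags slot and CodeType raises TypeError
    # (on 3.8-3.10: "an integer is required (got type bytes)").
    return types.CodeType(
        co.co_argcount,
        co.co_kwonlyargcount,  # lands in posonlyargcount's slot on 3.8+
        co.co_nlocals,
        co.co_stacksize,
        co.co_flags,
        co.co_code,            # bytes, one position too early on 3.8+
        co.co_consts,
        co.co_names,
        co.co_varnames,
        co.co_filename,
        co.co_name,
        co.co_firstlineno,
        co.co_lnotab,
        co.co_freevars,
        co.co_cellvars,
    )
```

Spark later fixed this by updating the vendored cloudpickle, which is why a pyspark release new enough to know the Python 3.8 signature should avoid the import-time crash; whether a matching sagemaker_pyspark works on notebook-al2-v1 is something I have not confirmed.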