Please provide any logs and a bare minimum reproducible test case, as this will be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
Exact command to reproduce: `import sagemaker_pyspark`

```python
import sagemaker_pyspark
from pyspark.sql import SparkSession

classpath = ":".join(sagemaker_pyspark.classpath_jars())
spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath).getOrCreate()
```

Traceback:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_18565/3057191668.py in <cell line: 1>()
----> 1 import sagemaker_pyspark
      2 from pyspark.sql import SparkSession
      3
      4 classpath = ":".join(sagemaker_pyspark.classpath_jars())
      5 spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath).getOrCreate()

~/anaconda3/envs/python3/lib/python3.8/site-packages/sagemaker_pyspark/__init__.py in <module>
     17 """
     18
---> 19 from .wrapper import SageMakerJavaWrapper, Option
     20 from .IAMRoleResource import IAMRole, IAMRoleFromConfig
     21 from .SageMakerClients import SageMakerClients

~/anaconda3/envs/python3/lib/python3.8/site-packages/sagemaker_pyspark/wrapper.py in <module>
     16 from abc import ABCMeta
     17
---> 18 from pyspark import SparkContext
     19 from pyspark.ml.common import _java2py
     20 from pyspark.ml.wrapper import JavaWrapper

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/__init__.py in <module>
     49
     50 from pyspark.conf import SparkConf
---> 51 from pyspark.context import SparkContext
     52 from pyspark.rdd import RDD, RDDBarrier
     53 from pyspark.files import SparkFiles

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/context.py in <module>
     29 from py4j.protocol import Py4JError
     30
---> 31 from pyspark import accumulators
     32 from pyspark.accumulators import Accumulator
     33 from pyspark.broadcast import Broadcast, BroadcastPickleRegistry

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/accumulators.py in <module>
     95 import socketserver as SocketServer
     96 import threading
---> 97 from pyspark.serializers import read_int, PickleSerializer
     98
     99

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/serializers.py in <module>
     69 xrange = range
     70
---> 71 from pyspark import cloudpickle
     72 from pyspark.util import _exception_message
     73

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/cloudpickle.py in <module>
    143
    144
--> 145 _cell_set_template_code = _make_cell_set_template_code()
    146
    147

~/anaconda3/envs/python3/lib/python3.8/site-packages/pyspark/cloudpickle.py in _make_cell_set_template_code()
    124 )
    125 else:
--> 126 return types.CodeType(
    127     co.co_argcount,
    128     co.co_kwonlyargcount,

TypeError: an integer is required (got type bytes)
```
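The error originates in PySpark's vendored copy of cloudpickle, which predates Python 3.8: in 3.8, `types.CodeType` gained a new second positional parameter, `co_posonlyargcount`, so the old pre-3.8 positional argument order shifts every later argument by one slot and `co_code` (a `bytes` object) lands in a slot that only accepts an `int`. A minimal sketch of the mismatch (this mirrors the pre-3.8 positional order, not this package's API, and assumes Python 3.8+):

```python
import sys
import types

def old_style_codetype_copy(fn):
    """Rebuild fn's code object using the pre-3.8 CodeType positional
    argument order, the way old vendored cloudpickle does."""
    co = fn.__code__
    # On Python 3.8+ the second parameter is co_posonlyargcount, so
    # every argument after it is off by one: co_code (bytes) ends up
    # where an int is expected.
    return types.CodeType(
        co.co_argcount,
        co.co_kwonlyargcount,  # lands in the posonlyargcount slot on 3.8+
        co.co_nlocals,
        co.co_stacksize,
        co.co_flags,
        co.co_code,            # bytes where 3.8+ expects an int
        co.co_consts,
        co.co_names,
        co.co_varnames,
        co.co_filename,
        co.co_name,
        co.co_firstlineno,
        co.co_lnotab,
        co.co_freevars,
        co.co_cellvars,
    )

try:
    old_style_codetype_copy(old_style_codetype_copy)
except TypeError as exc:
    # On Python 3.8 this raises the same kind of TypeError as the
    # traceback above; on newer versions the exact message may differ.
    print(exc)
```

This is why the import only fails on environments whose interpreter is Python 3.8 or newer while the installed PySpark still bundles the old cloudpickle.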
Please fill out the form below.

System Information

- SageMaker notebook instance, platform identifier `notebook-al2-v1`, `conda_python3` kernel (Python 3.8, per the site-packages paths in the traceback)

Describe the problem

Following the code snippet from https://github.com/aws/sagemaker-spark/tree/master/sagemaker-pyspark-sdk#local-spark-on-sagemaker-notebook-instances to run local Spark on a SageMaker notebook instance (platform identifier `notebook-al2-v1`, `conda_python3` kernel), `import sagemaker_pyspark` fails with the traceback above. On another SageMaker notebook instance (platform identifier `notebook-al1-v1`), the same code works fine.

Minimal repro / logs

See the exact command and full traceback above.
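This is consistent with the platform difference being the interpreter version: the traceback paths show Python 3.8 on `notebook-al2-v1`, where the `types.CodeType` signature change that breaks old vendored cloudpickle first appeared. A quick diagnostic (a sketch for checking a kernel, not part of the sagemaker_pyspark API) is to test for the attribute that arrived with that change:

```python
import sys

# co_posonlyargcount appeared on code objects in Python 3.8, together
# with the types.CodeType constructor change that breaks the old
# cloudpickle bundled with this PySpark.
affected = hasattr((lambda: 0).__code__, "co_posonlyargcount")

print(sys.version_info[:2], "has the 3.8 CodeType change:", affected)
```

If this prints `True`, the kernel's interpreter is new enough to hit the import failure with a pre-3.8 vendored cloudpickle.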