Describe the bug
tutorials/basic_example.ipynb fails to run from SageMaker Studio deployed in a VPC that has Internet access. VPC endpoints for S3, the SageMaker API, the SageMaker Runtime, and CloudWatch Logs have been created in the subnet where the Studio ENI resides. Running the notebook produces the following output:
Please set env variable SPARK_VERSION
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-2-76077136385f> in <module>
10 .config("spark.driver.extraClassPath", classpath)
11 .config("spark.jars.packages", pydeequ.deequ_maven_coord)
---> 12 .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
13 .getOrCreate())
/opt/conda/lib/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
230 # by all sessions.
/opt/conda/lib/python3.7/site-packages/pyspark/context.py in getOrCreate(cls, conf)
382 with SparkContext._lock:
383 if SparkContext._active_spark_context is None:
--> 384 SparkContext(conf=conf or SparkConf())
385 return SparkContext._active_spark_context
386
/opt/conda/lib/python3.7/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142 " is not allowed as it is a security risk.")
143
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/opt/conda/lib/python3.7/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
329 with SparkContext._lock:
330 if not SparkContext._gateway:
--> 331 SparkContext._gateway = gateway or launch_gateway(conf)
332 SparkContext._jvm = SparkContext._gateway.jvm
333
/opt/conda/lib/python3.7/site-packages/pyspark/java_gateway.py in launch_gateway(conf, popen_kwargs)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
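For triage, a small check of the kernel environment may help show whether the two messages above have an obvious cause. This is only a sketch under assumptions not confirmed in this report: that the SPARK_VERSION message refers to the unset environment variable it names, and that the gateway error is related to the JVM not being launchable (for example, no java executable on PATH). The helper name check_spark_prereqs is hypothetical.

```python
import os
import shutil


def check_spark_prereqs(environ=None):
    """Report settings that PySpark commonly depends on (hypothetical helper).

    Assumption: a missing Java runtime or an unset SPARK_VERSION is a
    plausible cause of the errors in this report; neither is confirmed.
    """
    env = os.environ if environ is None else environ
    return {
        # Variable named in the "Please set env variable SPARK_VERSION" message.
        "SPARK_VERSION": env.get("SPARK_VERSION"),
        # PySpark launches a JVM, so a Java installation must be discoverable.
        "JAVA_HOME": env.get("JAVA_HOME"),
        # Full path of the java executable, or None if it is not on PATH.
        "java_on_path": shutil.which("java", path=env.get("PATH", "")),
    }


if __name__ == "__main__":
    for name, value in check_spark_prereqs().items():
        print(f"{name}: {value}")
```

A value of None for any entry would be worth including in the report, although it would not by itself prove the cause of the failure.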
To Reproduce
Open tutorials/basic_example.ipynb in SageMaker Studio.
Run all.
Expected behavior
Run without errors.
Screenshots
NA
Desktop (please complete the following information):
SageMaker Studio in the us-east-2 region. Python 3 Data Science kernel.
Question
Please specify the system requirements for running the tutorial notebooks.
Do they work in SageMaker Studio deployed in a VPC?
Is an EMR or Spark cluster required? If so, what configuration does it need?
Are any other configurations or environment variable settings required?