How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.5.1

Steps to Reproduce

Hello,

I've encountered an issue when using SparkIntegration with my PySpark application. I was following the guide in the Spark Driver Integration documentation and ran into the following AttributeError:
sc._jsc.sc().addSparkListener(listener)
E AttributeError: 'SparkContext' object has no attribute '_jsc'
Upon investigating, the issue appears to stem from the code at sentry-python/spark_driver.py#L50: the sc._jsc attribute is only set after the SparkContext has been initialized, as seen in apache/spark/pyspark/context.py#L296. Consequently, _start_sentry_listener and _set_app_properties, referenced at spark_driver.py#L62-L63, should be invoked after spark_context_init has executed rather than before it.

I have tested this modification with both local and yarn Spark masters, and the fixed version in my repo (sketched below) appears to work correctly.
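For reference, this is roughly the reordering applied in my fork. It is a minimal sketch only; the integration-enabled check and the scope setup that the real patch performs are omitted.

from pyspark import SparkContext

from sentry_sdk.integrations.spark.spark_driver import (
    _set_app_properties,
    _start_sentry_listener,
)


def patch_spark_context_init():
    spark_context_init = SparkContext._do_init

    def _sentry_patched_spark_context_init(self, *args, **kwargs):
        # Let the real initializer run first, so that self._jsc is set...
        rv = spark_context_init(self, *args, **kwargs)
        # ...and only then register the Sentry listener and app properties.
        _start_sentry_listener(self)
        _set_app_properties()
        return rv

    SparkContext._do_init = _sentry_patched_spark_context_init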
This is my test code:

from pyspark import SparkContext

from sentry_sdk.integrations.spark import SparkIntegration


def test_initialize_spark_integration(sentry_init):
    # fails with the code at: https://github.com/getsentry/sentry-python/blob/2.5.1/sentry_sdk/integrations/spark/spark_driver.py#L53
    # succeeds with the code at: https://github.com/seyoon-lim/sentry-python/blob/fix-spark-driver-integration/sentry_sdk/integrations/spark/spark_driver.py#L53
    sentry_init(integrations=[SparkIntegration()])
    SparkContext.getOrCreate()
Looking forward to your feedback and suggestions for addressing this issue.
Thank you!
Expected Result
Initializing the SDK with SparkIntegration before creating the SparkSession should succeed without errors:

from pyspark.sql import SparkSession

import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

if __name__ == "__main__":
    sentry_sdk.init(
        dsn=matrix_dsn,
        integrations=[SparkIntegration()],
    )
    spark = SparkSession.builder.getOrCreate()
    ...
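As a quick sanity check after the fix, the JVM-side context should be populated once getOrCreate() returns; this assertion pokes at PySpark's private _jsc attribute, which is exactly what the listener registration relies on:

# _jsc is the py4j handle to the JVM SparkContext; the Sentry listener
# registration (sc._jsc.sc().addSparkListener(...)) depends on it.
assert spark.sparkContext._jsc is not None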
Actual Result
Traceback (most recent call last):
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/entrypoint.py", line 17, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 514, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 201, in __init__
    self._do_init(
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/utils.py", line 1710, in runner
    return sentry_patched_function(*args, **kwargs)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 69, in _sentry_patched_spark_context_init
    _start_sentry_listener(self)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 55, in _start_sentry_listener
    sc._jsc.sc().addSparkListener(listener)
AttributeError: 'SparkContext' object has no attribute '_jsc'