getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/

Pyspark Driver Integration errors out with py4j.Py4JException: Method attemptId([]) does not exist #1099

Open amCap1712 opened 3 years ago

amCap1712 commented 3 years ago

Environment

How do you use Sentry? Self-hosted - 9.1.2

Which SDK and version? sentry-sdk[pyspark] == 0.20.3

Steps to Reproduce

I set up the PySpark integration as described in the official docs. I have only added the driver integration so far; since I have not added the worker integration, I am also not adding the daemon configuration to the spark-submit script.
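For reference, the driver-side setup follows the documented pattern; roughly this (the DSN is a placeholder):

```python
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

# Driver-side integration only; no worker integration or daemon configured.
sentry_sdk.init(
    dsn="https://<key>@<org-id>.ingest.sentry.io/<project-id>",  # placeholder
    integrations=[SparkIntegration()],
)
```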

Expected Result

Sentry correctly captures and reports the errors.

Actual Result

The log is filled with errors. The crux of the problem seems to be py4j.Py4JException: Method attemptId([]) does not exist. I have attached two logs here: https://gist.github.com/amCap1712/6000892a940b7c004dad28060ddfd90d. One is from running on Spark 2.4.5 and the other on Spark 3.1.1. Sentry also captures this error, which seems to occur while it is connecting the integration, and reports it.

I'll be happy to assist as much as I can to debug and solve this issue.

pvanderlinden commented 3 years ago

Related to #1102

dinesh-712 commented 2 years ago

@amCap1712 / @pvanderlinden Is there a workaround for this issue apart from filtering out this exception?

pvanderlinden commented 2 years ago

@dinesh-712 I ended up not using the pyspark-specific integration, only the normal Python integration.

dinesh-712 commented 2 years ago

@pvanderlinden Thanks for the reply. Does using the normal Python integration guarantee the capture of errors in all worker nodes (slaves) created in the SparkContext?

pvanderlinden commented 2 years ago

It will only capture exceptions which reach the driver script. But I don't think the integration is functional at the moment.
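To illustrate (a minimal sketch, assuming a plain driver script with no Spark-specific integration; the DSN and function names are placeholders):

```python
import sentry_sdk
from pyspark import SparkConf, SparkContext

# Plain SDK init in the driver; no SparkIntegration.
sentry_sdk.init(dsn="https://<key>@example.ingest.sentry.io/1")  # placeholder

sc = SparkContext(conf=SparkConf().setMaster("local[2]").setAppName("demo"))

def risky(x):
    # A worker-side failure: the ZeroDivisionError is raised on an executor,
    # but Spark re-raises it on the driver when the action runs.
    return 1 // (x - 3)

# The exception reaches the driver via the failed .collect() action and is
# captured by the SDK's default excepthook when it propagates out of the
# script. A worker error that never propagates back is not captured.
results = sc.parallelize(range(10)).map(risky).collect()
```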

antonpirker commented 1 year ago

Currently not a priority. I will close this. If there is demand for this, please reopen.

serglom21 commented 2 weeks ago

Hey @antonpirker! I'm wondering if priorities have shifted and whether there is a place for this bug to be addressed?

It seems that the cause of this error is the integration's attemptId() call on the py4j-backed StageInfo object (the method named in the exception), and that it could be fixed by modifying the code to call stage_info.attempt instead.
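For what it's worth, a minimal sketch of a defensive fix (the helper name is hypothetical; it assumes the accessor was renamed across Spark versions, so it tries each known candidate in turn):

```python
from py4j.protocol import Py4JError

def _get_stage_attempt(stage_info):
    # Hypothetical helper: the attempt accessor on the Java StageInfo differs
    # across Spark versions, so try each known name. py4j only raises
    # "Method X([]) does not exist" at call time, hence the try/except.
    for accessor in ("attemptNumber", "attemptId", "attempt"):
        try:
            return getattr(stage_info, accessor)()
        except Py4JError:
            continue
    return None
```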

sentrivana commented 2 weeks ago

No changes in priority, unfortunately. I'll keep this open though -- someone might find some time during our maintenance windows.

It might be that this really just needs a one-method-call fix, but we'll need to find some time to verify that. (PRs are always welcome.)

What I'm wondering about is why the pyspark test suite that we reactivated some time ago passes -- since this seems like a pretty fundamental issue, I'd expect it to surface there. Ideally we'd also add a test case to capture this.
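A rough sketch of such a regression test (assuming pyspark is installed and runs in local mode; nothing here is an existing fixture in this repo):

```python
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration
from pyspark import SparkConf, SparkContext

def test_spark_listener_survives_stage_callbacks():
    # Hypothetical test: with the driver integration active, running any job
    # fires the SparkListener stage callbacks that previously crashed with
    # py4j.Py4JException on attemptId().
    sentry_sdk.init(integrations=[SparkIntegration()])
    sc = SparkContext(conf=SparkConf().setMaster("local[1]").setAppName("t"))
    try:
        assert sc.parallelize(range(10)).sum() == 45
    finally:
        sc.stop()
```

One caveat: if the listener failure is only logged rather than raised into the job, this alone may pass anyway (which might also explain why the reactivated suite is green); an assertion on the events the SDK captures would likely be needed on top.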