Open mobuchowski opened 2 years ago
I ran some more experiments, this time with a fake host and on OpenLineage 0.9.0, and was not able to reproduce the issue with regards to the port; instead, the new experiments show that Spark 3.2 looks to be involved.
On Spark 3.2.1 / Databricks 10.4 LTS: using the (fake) host `http://ac7aca38330144df9.amazonaws.com:5000` crashes the cluster when the first notebook cell is evaluated, with `The spark context has stopped and the driver is restarting`.
The same occurs when the port is removed.
On Spark 3.1.2 / Databricks 9.1 LTS: using the (fake) host `http://ac7aca38330144df9.amazonaws.com:5000` does not impede the cluster but, reasonably, produces for each lineage event: `ERROR EventEmitter: Could not emit lineage w/ exception io.openlineage.client.OpenLineageClientException: java.net.UnknownHostException`.
The same occurs when the port is removed.
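The `UnknownHostException` on 3.1.2 is the expected failure mode for an unresolvable host. A quick sanity check, independent of Spark and OpenLineage, that the configured host really is unresolvable (using the fake host from the experiments above):

```python
import socket

# Fake host from the experiments above; resolution is expected to fail.
host = "ac7aca38330144df9.amazonaws.com"
try:
    socket.gethostbyname(host)
    resolvable = True
except OSError:  # socket.gaierror is raised on a failed DNS lookup
    resolvable = False
print(f"{host} resolvable: {resolvable}")
```

If this prints `resolvable: False`, any per-event `UnknownHostException` from the client is expected behavior; the open question is why Spark 3.2 turns it into a driver crash instead.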
The "Azure Databricks - OpenLineage - Microsoft Purview" integration suffers from the same issue; theirs is tracked here: https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/issues/14
Issue reported on Slack: https://openlineage.slack.com/archives/C01CK9T7HKR/p1653624210694359?thread_ts=1651498343.959749&cid=C01CK9T7HKR
Following up on this, as I encounter the same issue with the OpenLineage Databricks integration. This issue seems quite severe, as it crashes the Spark context and requires a restart. I have Marquez running on AWS EKS; I'm using OpenLineage 0.8.2 on Databricks 10.4 (Spark 3.2.1), and my Spark config looks like this:
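The original config is not reproduced above. For reference, an OpenLineage Spark configuration for Databricks in this version range typically looks like the following; the host and namespace values here are placeholders, not the reporter's actual settings:

```
spark.jars.packages io.openlineage:openlineage-spark:0.8.2
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.host http://<marquez-host>:5000
spark.openlineage.namespace <my-namespace>
```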
I can run some simple read and write commands and successfully find the log4j events highlighted in the docs:
After doing this a few times, I get `The spark context has stopped and the driver is restarting. Your notebook will be automatically reattached.` stderr shows a bunch of things, and log4j shows the same as for Kostikey: `ERROR EventEmitter: [...] Unable to serialize logical plan due to: Infinite recursion (StackOverflowError)`. I have one more piece of information which I can't make much sense of, but hopefully someone else can: if I include the port in the host, I can very reliably crash the Spark Context on the first attempt. So:
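The `Infinite recursion (StackOverflowError)` suggests the listener's plan serialization walks a logical plan containing a cycle (or a self-referential node) without a visited check. A minimal sketch of that failure mode and the usual guard, in Python rather than the integration's actual Java code, with hypothetical node names:

```python
# Sketch (not OpenLineage code): naive recursive serialization of a
# self-referential object graph overflows the stack, which mirrors the
# "Unable to serialize logical plan due to: Infinite recursion" error.

def naive_serialize(node, out):
    out.append(node["name"])
    for child in node["children"]:
        naive_serialize(child, out)  # no cycle check -> RecursionError

def safe_serialize(node, out, seen=None):
    # Track visited node identities so a cycle is traversed at most once.
    seen = set() if seen is None else seen
    if id(node) in seen:
        return
    seen.add(id(node))
    out.append(node["name"])
    for child in node["children"]:
        safe_serialize(child, out, seen)

# Build a two-node cycle: scan <-> project (hypothetical plan nodes).
a = {"name": "scan", "children": []}
b = {"name": "project", "children": [a]}
a["children"].append(b)

overflowed = False
try:
    naive_serialize(a, [])
except RecursionError:
    overflowed = True
print(f"naive serializer overflowed: {overflowed}")

out = []
safe_serialize(a, out)
print(f"safe serializer emitted: {out}")
```

This does not explain why Spark 3.2 specifically triggers the cycle, only why an unguarded recursive walk turns one into a stack overflow on the driver.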
Any insights would be greatly appreciated!