AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Can't connect to Spark using 2.0.0 on Databricks 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) #744

Closed. ghost closed this issue 1 year ago.

ghost commented 1 year ago

I use the Spline Spark agent to capture lineage on Databricks. I updated to v2.0.0 and am encountering an error on Databricks runtime 12.2; it works fine on 11.3. During cluster initialization, the driver fails to bind its port and cannot connect to the Spark master. Trace:

23/09/19 23:33:09 ERROR DriverDaemon$: XXX Fatal uncaught exception. Terminating driver.
java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:6062
    at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:349)
    at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:310)
    at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
    at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:234)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at org.eclipse.jetty.server.Server.doStart(Server.java:401)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at com.databricks.backend.daemon.driver.CommChannelServer.start(CommChannelServer.scala:15)
    at com.databricks.backend.daemon.driver.DriverDaemon.$anonfun$start$3(DriverDaemon.scala:160)
    at com.databricks.backend.daemon.driver.DriverDaemon.$anonfun$start$3$adapted(DriverDaemon.scala:159)
    at scala.Option.foreach(Option.scala:407)
    at com.databricks.backend.daemon.driver.DriverDaemon.start(DriverDaemon.scala:159)
    at com.databricks.backend.daemon.driver.DriverDaemon$.wrappedMain(DriverDaemon.scala:986)
    at com.databricks.DatabricksMain.$anonfun$main$1(DatabricksMain.scala:147)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.DatabricksMain.$anonfun$withStartupProfilingData$1(DatabricksMain.scala:450)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:555)
    at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:650)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:671)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
    at com.databricks.DatabricksMain.withAttributionContext(DatabricksMain.scala:89)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
    at com.databricks.DatabricksMain.withAttributionTags(DatabricksMain.scala:89)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:645)
    at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:564)
    at com.databricks.DatabricksMain.recordOperationWithResultTags(DatabricksMain.scala:89)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:555)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:525)
    at com.databricks.DatabricksMain.recordOperation(DatabricksMain.scala:89)
    at com.databricks.DatabricksMain.withStartupProfilingData(DatabricksMain.scala:450)
    at com.databricks.DatabricksMain.main(DatabricksMain.scala:146)
    at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:461)
    at sun.nio.ch.Net.bind(Net.java:453)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
    at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
    ... 36 more
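The `Caused by` line is the real failure: by the time DBR's CommChannelServer starts, something else on the driver already holds port 6062, so Jetty's bind fails. A minimal Python sketch of the same failure mode (the port here is OS-assigned rather than 6062; this only illustrates the OS-level behavior, not the Databricks internals):

```python
import errno
import socket

# Reproduce "java.net.BindException: Address already in use":
# bind a first listener on an OS-assigned free port, then try to
# bind a second socket to that same port. The second bind fails
# with EADDRINUSE, exactly what Jetty's ServerConnector hits.
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("0.0.0.0", 0))   # port 0: let the OS pick a free port
s1.listen()
port = s1.getsockname()[1]

s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
bind_failed = False
try:
    s2.bind(("0.0.0.0", port))
except OSError as e:
    bind_failed = e.errno == errno.EADDRINUSE
finally:
    s2.close()
    s1.close()

print("second bind raised EADDRINUSE:", bind_failed)
```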
afgaron commented 1 year ago

We think this is related to the error we received when upgrading to 1.0.0 and DBR 11 (issue 587). We're looking to support DBR 13 for our use case once the Spline bundle for Spark 3.4 is ready, and are testing upgrading the rest of the process to 2.0.0 in preparation.

ghost commented 1 year ago

It was an internal DBR issue, solved by setting this Spark property: spark.databricks.python.defaultPythonRepl pythonshell
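For reference, this kind of property is typically set in the cluster's Spark config (Advanced Options → Spark in the Databricks cluster UI), one key/value pair per line. A sketch of that config fragment, assuming no other custom properties are set:

```
spark.databricks.python.defaultPythonRepl pythonshell
```

The cluster must be restarted for the property to take effect, since it changes how the driver starts the Python REPL.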