databricks / sbt-databricks

An sbt plugin for deploying code to Databricks Cloud
http://go.databricks.com/register-for-dbc

dbcUpload/Deploy fails with NoSuchElementException: cannot find node with id <id> #42

Open justinmills opened 7 years ago

justinmills commented 7 years ago

I think this is the same issue reported in #35, but I may have narrowed down when it happens.

If I attach a jar to a cluster and then delete that jar, the cluster still has the jar attached, but in a "deleted pending restart" state. If you then attempt to upload or deploy the jar, you get the following error:

org.apache.http.client.HttpResponseException: NoSuchElementException: cannot find node with id 377539161864868
    at sbtdatabricks.DatabricksHttp.handleResponse(DatabricksHttp.scala:80)
    at sbtdatabricks.DatabricksHttp.fetchLibraries(DatabricksHttp.scala:132)
    at sbtdatabricks.DatabricksPlugin$$anonfun$dbcFetchLibraries$1.apply(DatabricksPlugin.scala:74)
    at sbtdatabricks.DatabricksPlugin$$anonfun$dbcFetchLibraries$1.apply(DatabricksPlugin.scala:73)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

It can be fixed by restarting the cluster. I suspect what's going on is that the plugin looks for an existing version of the jar and finds one, but that copy is marked as deleted and only exists because a cluster still has it loaded. Once the last cluster using the jar is restarted, the jar is actually removed and the plugin no longer finds an existing copy.
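
If that diagnosis is right, one conceivable plugin-side hardening would be to ignore stale entries when scanning for existing copies of a jar. A minimal sketch; LibraryInfo and the status string are illustrative assumptions taken from the UI wording above, not the plugin's actual data model:

    // Hypothetical sketch -- not the plugin's real types.
    case class LibraryInfo(id: Long, name: String, status: String)

    object StaleLibraryFilter {
      // Treat "deleted pending restart" entries as already gone: they only
      // exist because a running cluster still holds the jar, and resolving
      // their node id is what raises the NoSuchElementException above.
      def liveCopies(libraries: Seq[LibraryInfo], jarName: String): Seq[LibraryInfo] =
        libraries.filter(lib =>
          lib.name == jarName &&
            !lib.status.equalsIgnoreCase("deleted pending restart"))
    }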

arushijain commented 7 years ago

When we restart a cluster and then deploy again, we get the following error:

sbt dbcDeploy
...
[info] Cluster found. Starting deploy process...
Deleting older version of test_2.11-1.0.5-SNAPSHOT.jar
Uploading test_2.11-1.0.5-SNAPSHOT.jar
org.apache.http.client.HttpResponseException: Exception: The directory already contains an element called 'test_2.11-1.0.5-SNAPSHOT.jar'
    at sbtdatabricks.DatabricksHttp.handleResponse(DatabricksHttp.scala:80)
    at sbtdatabricks.DatabricksHttp.uploadJar(DatabricksHttp.scala:108)
    at sbtdatabricks.DatabricksPlugin$$anonfun$8.apply(DatabricksPlugin.scala:151)
    at sbtdatabricks.DatabricksPlugin$$anonfun$8.apply(DatabricksPlugin.scala:150)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:93)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at sbtdatabricks.DatabricksPlugin$.sbtdatabricks$DatabricksPlugin$$uploadImpl1(DatabricksPlugin.scala:150)
    at sbtdatabricks.DatabricksPlugin$$anonfun$deployImpl$1$$anonfun$apply$6$$anonfun$apply$7.apply(DatabricksPlugin.scala:194)
    at sbtdatabricks.DatabricksPlugin$$anonfun$deployImpl$1$$anonfun$apply$6$$anonfun$apply$7.apply(DatabricksPlugin.scala:192)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:237)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[error] (*:dbcDeploy) org.apache.http.client.HttpResponseException: Exception: The directory already contains an element called 'test_2.11-1.0.5-SNAPSHOT.jar'
[error] Total time: 15 s, completed May 1, 2017 4:50:02 PM
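
This second failure looks like the inverse race: the delete call returns, but the jar's name still occupies the workspace directory, so the immediate re-upload collides with the half-deleted entry. A minimal sketch of the kind of delete-then-poll guard that would avoid the collision; WorkspaceClient and its methods are stand-ins, not real sbt-databricks or Databricks API names:

    import scala.annotation.tailrec

    // Hypothetical client interface -- an assumption for illustration only.
    trait WorkspaceClient {
      def listLibraryNames(folder: String): Set[String]
      def deleteLibrary(folder: String, name: String): Unit
    }

    object DeployGuard {
      // Delete a jar, then poll until its name actually disappears from the
      // folder, so the follow-up upload cannot hit "directory already
      // contains an element". Returns false if the entry never clears
      // (e.g. it is stuck in "deleted pending restart"), letting the caller
      // report the stale state instead of failing mid-upload.
      def deleteAndAwait(client: WorkspaceClient, folder: String, name: String,
                         retries: Int = 10, delayMs: Long = 1000L): Boolean = {
        client.deleteLibrary(folder, name)
        @tailrec
        def poll(left: Int): Boolean =
          if (!client.listLibraryNames(folder).contains(name)) true
          else if (left == 0) false
          else { Thread.sleep(delayMs); poll(left - 1) }
        poll(retries)
      }
    }

In the "deleted pending restart" case this would time out rather than hang, which at least surfaces the stale entry as a clear failure.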

brkyvz commented 7 years ago

@arushijain Does your dbcLibraryPath end with a / by any chance? If so, could you try without it?
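
For context, dbcLibraryPath is set as a plain string in the build definition; a minimal example of the setting brkyvz is asking about, with an illustrative value:

    // build.sbt -- illustrative path; the question is whether it ends in "/"
    dbcLibraryPath := "/Shared/Libraries"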

arushijain commented 7 years ago

@brkyvz No, it doesn't. It is just /Shared/Libraries

justinmills commented 7 years ago

@arushijain I have not seen that error, even after restarting the cluster. Did you get that error after getting the error I mentioned above?

Also, our dbcLibraryPath does not have a trailing /.

arushijain commented 7 years ago

@justinmills So when I deploy to a cluster that already has a jar attached, it fails while deleting the previous jar. When I then check which libraries exist in dbc, I see the following.

[Screenshot from 2017-05-02 showing the library state in the Databricks UI]

Then I restart the cluster, which finally deletes the jar, at which point I can deploy again. Highly unsustainable.
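
Until the stale state is handled by the plugin itself, the restart step can at least be scripted. Assuming the plugin's dbcRestartClusters task restarts the clusters listed in dbcClusters (as its name suggests), an sbt alias could chain it ahead of the deploy; note this only works around the stale-jar failure mode, not the upload collision above:

    // build.sbt sketch: restart clusters before deploying, as a stopgap for
    // jars stuck in the "deleted pending restart" state.
    addCommandAlias("dbcForceDeploy", ";dbcRestartClusters;dbcDeploy")

Running `sbt dbcForceDeploy` would then execute the two tasks in order.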