EclairJS / eclairjs

Main EclairJS Repository
Apache License 2.0
130 stars 26 forks source link

Timeout waiting for kernel_info reply #20

Open QuantumRich opened 7 years ago

QuantumRich commented 7 years ago

I seem to be getting this issue from Jupyter when connecting now. It was not a problem on Friday, but now I consistently I get this error. Perhaps it may have to do with 'Sessions API changed, see updated swagger docs'?

[I 14:02:43.668 NotebookApp] 0 active kernels 
[I 14:02:43.668 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 14:02:43.668 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 14:02:47.222 NotebookApp] Sessions API changed, see updated swagger docs
[I 14:02:47.244 NotebookApp] Kernel started: eb48b34a-824e-42d7-96b6-931b23e438a7
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/01/17 14:02:48 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.2.0.dev1-incubating-SNAPSHOT
17/01/17 14:02:48 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.11.8)
17/01/17 14:02:48 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.5
17/01/17 14:02:48 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
17/01/17 14:02:48 [INFO] o.a.t.Main$$anon$1 - Connection Profile: {
  "stdin_port" : 50690,
  "control_port" : 50691,
  "hb_port" : 50692,
  "shell_port" : 50688,
  "iopub_port" : 50689,
  "ip" : "127.0.0.1",
  "transport" : "tcp",
  "signature_scheme" : "hmac-sha256",
  "key" : "8ff89d12-6b53-4bb3-a6fc-0c24c31e5b3f"
}
(SparkR,org.apache.toree.kernel.interpreter.sparkr.SparkRInterpreter@5a8816cc)
(Scala,org.apache.toree.kernel.interpreter.scala.ScalaInterpreter@68565bc7)
(PySpark,org.apache.toree.kernel.interpreter.pyspark.PySparkInterpreter@37baddde)
(eclair,org.eclairjs.nashorn.JavascriptInterpreter@5115f590)
(SQL,org.apache.toree.kernel.interpreter.sql.SqlInterpreter@4b31a708)
17/01/17 14:02:50 [WARN] o.a.t.Main$$anon$1 - No external magics provided to PluginManager!
17/01/17 14:02:52 [INFO] o.a.t.Main$$anon$1 - 13 internal plugins loaded
17/01/17 14:02:52 [INFO] o.a.t.Main$$anon$1 - 0 external plugins loaded
17/01/17 14:02:54 [WARN] o.a.t.k.i.s.ScalaInterpreter - kernel variable: org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@4560f8dc
17/01/17 14:02:54 [WARN] o.a.t.k.i.s.ScalaInterpreter - Binding List(@transient implicit) kernel org.apache.toree.kernel.api.Kernel org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@4560f8dc
17/01/17 14:02:55 [INFO] o.a.t.k.i.s.ScalaInterpreter - Binding SparkContext into interpreter as sc
Start kernel init
END kernel init
17/01/17 14:02:55 [INFO] o.a.t.Main$$anon$1 - Marking relay as ready for receiving messages
17/01/17 14:02:55 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Unstashing all messages received!
17/01/17 14:02:55 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Relay is now fully ready to receive messages!
[W 14:02:57.267 NotebookApp] Timeout waiting for kernel_info reply from eb48b34a-824e-42d7-96b6-931b23e438a7
QuantumRich commented 7 years ago

It seems my thread does not hang if I go off Wi-Fi. Perhaps the ports I am opening are being used? (e.g. Connecting to: tcp://127.0.0.1:50627) Is there a way of switching these?

Brian-Burns-Bose commented 7 years ago

I have been seeing this as well lately. It's intermittent. I started to notice it yet after upgraded the notebook server to 4.3.x. In our javascript client I am going to try and update the jupyterlab/services module. We are at 0.34.1, it looks like the latest is 0.35.1.

On Tue, Jan 17, 2017 at 3:25 PM, Richard Quan notifications@github.com wrote:

It seems my thread does not hang if I go off Wi-Fi. Perhaps the ports I am opening are being used? (e.g. Connecting to: tcp://127.0.0.1:50627) Is there a way of switching these?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/EclairJS/eclairjs/issues/20#issuecomment-273288993, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiWKP5xz34NcFYEpaiAQ0OYkLVaoGzhks5rTSOzgaJpZM4LmCb6 .

QuantumRich commented 7 years ago

Oh I'm glad I'm not alone in it. I get it more often when I am running eclair behind a webapp than if I were running eclair by itself as a script.

doronrosenberg commented 7 years ago

We can remove the code that is causing this as we never use the kernel info, but its somewhat disconcerting that its happening.

doronrosenberg commented 7 years ago

@bpburns we can take the commit that removes the code as well as the new jupyter security stuff fix and maybe backport em to the 0.10 branch and release a 0.10.1 to npm, unless there is anything else we want to wait for

QuantumRich commented 7 years ago

Thank you very much for the updates. It was extra strange to me because all of my components were local and I couldn't use the Internet while running this.

QuantumRich commented 7 years ago

UPDATE: It seems that I still have the same (more often than intermittent) problems even with the dev version of the repo. It may be my company's proxy then, which doesn't make sense given this project should not be connecting to an external source. Perhaps my websockets are blocked, but it doesn't make sense that it is intermittent.

doronrosenberg commented 7 years ago

The timeout error seems to be on the Jupyter side, I see it running regular Scala notebooks as well. Other than the message, are things working fine?

QuantumRich commented 7 years ago

@doronrosenberg Yes it does seem like it's on the Jupyter side. It's not the message itself so much as the fact that the thread hangs after that. I get the message even if it works. But if it doesn't, all operations freeze and the Spark session is never initializes. Here is what I mean, shown below is the output from debug mode when it stops on me:

[I 19:18:30.700 NotebookApp] 0 active kernels 
[I 19:18:30.700 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 19:18:30.700 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:18:34.573 NotebookApp] Sessions API changed, see updated swagger docs
[D 19:18:34.589 NotebookApp] Found kernel eclair in /Users/quantumrich/Library/Jupyter/kernels
[D 19:18:34.590 NotebookApp] Found kernel apache_toree_scala in /usr/local/share/jupyter/kernels
[D 19:18:34.591 NotebookApp] Native kernel (python2) available from /usr/local/lib/python2.7/site-packages/ipykernel/resources
[D 19:18:34.593 NotebookApp] Starting kernel: [u'/Users/quantumrich/Desktop/incubator-toree/dist/toree/bin/run.sh', u'--interpreter-plugin', u'eclair:org.eclairjs.nashorn.JavascriptInterpreter', u'--default-interpreter', u'eclair', u'--nosparkcontext', u'--profile', u'/Users/quantumrich/Library/Jupyter/runtime/kernel-d53113fa-f6c1-41f1-81cd-27264ed42655.json']
[D 19:18:34.598 NotebookApp] Connecting to: tcp://127.0.0.1:49606
[I 19:18:34.598 NotebookApp] Kernel started: d53113fa-f6c1-41f1-81cd-27264ed42655
[D 19:18:34.598 NotebookApp] Kernel args: {'kernel_name': u'eclair', 'cwd': u'/Users/quantumrich'}
[D 19:18:34.599 NotebookApp] 201 POST /api/sessions?1484785114561 (127.0.0.1) 27.14ms
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
[D 19:18:34.610 NotebookApp] 200 GET /api/kernels/d53113fa-f6c1-41f1-81cd-27264ed42655?1484785114605 (127.0.0.1) 2.13ms
[D 19:18:34.619 NotebookApp] Initializing websocket connection /api/kernels/d53113fa-f6c1-41f1-81cd-27264ed42655/channels
[D 19:18:34.620 NotebookApp] Requesting kernel info from d53113fa-f6c1-41f1-81cd-27264ed42655
[D 19:18:34.621 NotebookApp] Connecting to: tcp://127.0.0.1:49603
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/01/18 19:18:35 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.2.0.dev1-incubating-SNAPSHOT
17/01/18 19:18:35 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.11.8)
17/01/18 19:18:35 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.5
17/01/18 19:18:35 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
17/01/18 19:18:36 [INFO] o.a.t.Main$$anon$1 - Connection Profile: {
  "stdin_port" : 49605,
  "control_port" : 49606,
  "hb_port" : 49607,
  "shell_port" : 49603,
  "iopub_port" : 49604,
  "ip" : "127.0.0.1",
  "transport" : "tcp",
  "signature_scheme" : "hmac-sha256",
  "key" : "dcb58b14-1f04-41a4-a925-bd5fd1e7382c"
}
(SparkR,org.apache.toree.kernel.interpreter.sparkr.SparkRInterpreter@3b705be7)
(Scala,org.apache.toree.kernel.interpreter.scala.ScalaInterpreter@3a790e40)
(PySpark,org.apache.toree.kernel.interpreter.pyspark.PySparkInterpreter@43acd79e)
(eclair,org.eclairjs.nashorn.JavascriptInterpreter@1e1b061)
(SQL,org.apache.toree.kernel.interpreter.sql.SqlInterpreter@38f77cd9)
17/01/18 19:18:37 [WARN] o.a.t.Main$$anon$1 - No external magics provided to PluginManager!
17/01/18 19:18:39 [INFO] o.a.t.Main$$anon$1 - 13 internal plugins loaded
17/01/18 19:18:39 [INFO] o.a.t.Main$$anon$1 - 0 external plugins loaded
17/01/18 19:18:42 [WARN] o.a.t.k.i.s.ScalaInterpreter - kernel variable: org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@5777e443
17/01/18 19:18:42 [WARN] o.a.t.k.i.s.ScalaInterpreter - Binding List(@transient implicit) kernel org.apache.toree.kernel.api.Kernel org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@5777e443
17/01/18 19:18:43 [INFO] o.a.t.k.i.s.ScalaInterpreter - Binding SparkContext into interpreter as sc
Start kernel init
END kernel init
17/01/18 19:18:43 [INFO] o.a.t.Main$$anon$1 - Marking relay as ready for receiving messages
17/01/18 19:18:43 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Unstashing all messages received!
17/01/18 19:18:43 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Relay is now fully ready to receive messages!
[W 19:18:44.626 NotebookApp] Timeout waiting for kernel_info reply from d53113fa-f6c1-41f1-81cd-27264ed42655
[D 19:18:44.626 NotebookApp] Opening websocket /api/kernels/d53113fa-f6c1-41f1-81cd-27264ed42655/channels
[D 19:18:44.627 NotebookApp] Connecting to: tcp://127.0.0.1:49603
[D 19:18:44.627 NotebookApp] Connecting to: tcp://127.0.0.1:49604
[D 19:18:44.627 NotebookApp] Connecting to: tcp://127.0.0.1:49605

Meanwhile shown below is what it looks like when it does connect on rare occasions. It seems that if I get the 'Relay is ready' message BEFORE I connect to the 3 tcp ports it freezes.

[I 19:21:30.520 NotebookApp] 0 active kernels 
[I 19:21:30.520 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 19:21:30.520 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 19:21:37.158 NotebookApp] Sessions API changed, see updated swagger docs
[D 19:21:37.178 NotebookApp] Found kernel eclair in /Users/quantumrich/Library/Jupyter/kernels
[D 19:21:37.179 NotebookApp] Found kernel apache_toree_scala in /usr/local/share/jupyter/kernels
[D 19:21:37.179 NotebookApp] Native kernel (python2) available from /usr/local/lib/python2.7/site-packages/ipykernel/resources
[D 19:21:37.180 NotebookApp] Starting kernel: [u'/Users/quantumrich/Desktop/incubator-toree/dist/toree/bin/run.sh', u'--interpreter-plugin', u'eclair:org.eclairjs.nashorn.JavascriptInterpreter', u'--default-interpreter', u'eclair', u'--nosparkcontext', u'--profile', u'/Users/quantumrich/Library/Jupyter/runtime/kernel-be95f4d2-3cd4-467c-858a-bf2c41c5add1.json']
[D 19:21:37.185 NotebookApp] Connecting to: tcp://127.0.0.1:49804
[I 19:21:37.186 NotebookApp] Kernel started: be95f4d2-3cd4-467c-858a-bf2c41c5add1
[D 19:21:37.186 NotebookApp] Kernel args: {'kernel_name': u'eclair', 'cwd': u'/Users/quantumrich'}
[D 19:21:37.187 NotebookApp] 201 POST /api/sessions?1484785297144 (127.0.0.1) 30.34ms
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
[D 19:21:37.200 NotebookApp] 200 GET /api/kernels/be95f4d2-3cd4-467c-858a-bf2c41c5add1?1484785297192 (127.0.0.1) 1.41ms
[D 19:21:37.209 NotebookApp] Initializing websocket connection /api/kernels/be95f4d2-3cd4-467c-858a-bf2c41c5add1/channels
[D 19:21:37.212 NotebookApp] Requesting kernel info from be95f4d2-3cd4-467c-858a-bf2c41c5add1
[D 19:21:37.212 NotebookApp] Connecting to: tcp://127.0.0.1:49801
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/01/18 19:21:38 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.2.0.dev1-incubating-SNAPSHOT
17/01/18 19:21:38 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.11.8)
17/01/18 19:21:38 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.5
17/01/18 19:21:38 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
17/01/18 19:21:39 [INFO] o.a.t.Main$$anon$1 - Connection Profile: {
  "stdin_port" : 49803,
  "control_port" : 49804,
  "hb_port" : 49805,
  "shell_port" : 49801,
  "iopub_port" : 49802,
  "ip" : "127.0.0.1",
  "transport" : "tcp",
  "signature_scheme" : "hmac-sha256",
  "key" : "a8a1a4a6-dc53-4680-8cf0-eed9b66e23cd"
}
17/01/18 19:21:39 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Not ready for messages! Stashing until ready!
(SparkR,org.apache.toree.kernel.interpreter.sparkr.SparkRInterpreter@63a28987)
(Scala,org.apache.toree.kernel.interpreter.scala.ScalaInterpreter@120d3fd)
(PySpark,org.apache.toree.kernel.interpreter.pyspark.PySparkInterpreter@38c9e0d6)
(eclair,org.eclairjs.nashorn.JavascriptInterpreter@4693a9ef)
(SQL,org.apache.toree.kernel.interpreter.sql.SqlInterpreter@310aee0b)
17/01/18 19:21:41 [WARN] o.a.t.Main$$anon$1 - No external magics provided to PluginManager!
17/01/18 19:21:42 [INFO] o.a.t.Main$$anon$1 - 13 internal plugins loaded
17/01/18 19:21:42 [INFO] o.a.t.Main$$anon$1 - 0 external plugins loaded
17/01/18 19:21:46 [WARN] o.a.t.k.i.s.ScalaInterpreter - kernel variable: org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@188467fe
17/01/18 19:21:46 [WARN] o.a.t.k.i.s.ScalaInterpreter - Binding List(@transient implicit) kernel org.apache.toree.kernel.api.Kernel org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1@188467fe
17/01/18 19:21:47 [INFO] o.a.t.k.i.s.ScalaInterpreter - Binding SparkContext into interpreter as sc
[W 19:21:47.217 NotebookApp] Timeout waiting for kernel_info reply from be95f4d2-3cd4-467c-858a-bf2c41c5add1
[D 19:21:47.218 NotebookApp] Opening websocket /api/kernels/be95f4d2-3cd4-467c-858a-bf2c41c5add1/channels
[D 19:21:47.218 NotebookApp] Connecting to: tcp://127.0.0.1:49801
[D 19:21:47.219 NotebookApp] Connecting to: tcp://127.0.0.1:49802
[D 19:21:47.219 NotebookApp] Connecting to: tcp://127.0.0.1:49803
17/01/18 19:21:47 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Not ready for messages! Stashing until ready!
Start kernel init
END kernel init
17/01/18 19:21:47 [INFO] o.a.t.Main$$anon$1 - Marking relay as ready for receiving messages
17/01/18 19:21:47 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Unstashing all messages received!
17/01/18 19:21:47 [INFO] o.a.t.k.p.v.r.KernelMessageRelay - Relay is now fully ready to receive messages!
EXEC=var SparkSession = require(EclairJS_Globals.NAMESPACE + '/sql/SparkSession');
var builder1 = SparkSession.builder();
RESULT=null
17/01/18 19:21:47 [WARN] o.a.t.k.p.v.s.KernelOutputStream - Suppressing empty output: ''
EXEC=var builder2 = builder1.appName("Spark SQL Example");
RESULT=null
17/01/18 19:21:47 [WARN] o.a.t.k.p.v.s.KernelOutputStream - Suppressing empty output: ''
EXEC=var sparkSession1 = builder2.getOrCreate();
17/01/18 19:21:47 [WARN] o.a.h.u.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/18 19:21:49 [WARN] o.a.s.SparkContext - Use an existing SparkContext, some configuration may not take effect.
RESULT=null
17/01/18 19:21:49 [WARN] o.a.t.k.p.v.s.KernelOutputStream - Suppressing empty output: ''

It is very annoying for me to develop in an enterprise setting since I've relegated to turning off my computer's WiFi before I connect the websocket, and then doing all my spark operations after I get onto the network and the websocket has already connected. That is my workaround.

doronrosenberg commented 7 years ago

@bpburns knows jupyter and toree the best, any idea if oyu can tell it what ports to use? Might be the reason it sometimes works.

Brian-Burns-Bose commented 7 years ago

@doronrosenberg @QuantumRich you can pass those into Toree's run.sh script. Take a look at run.sh.

Brian-Burns-Bose commented 7 years ago

Having read through this https://github.com/jupyter/notebook/issues/297 . It could be certain libraries conflicting. My guess would be zero mq. For my notebook environment, I'm running Anaconda https://www.continuum.io/anaconda-overview. When I upgraded my notebook environment it seemed to help. It upgraded several libraries. I ran conda upgrade notebook. I've since successfully started Toree 6 times in a row so far.

QuantumRich commented 7 years ago

UPDATE:

It turns out this is the way to set the ports manually:

export TOREE_OPTS="--shell-port=61642 --iopub-port=61643"

Also, the connection hanging problem seems to be based on these two ports specifically while I am on the proxy. I am still trying to figure out how to change these two ports to allow the socket connection past the corporate firewall.