Kitware / HPCCloud

A Cloud/Web-Based Simulation Environment
https://kitware.github.io/HPCCloud/
Apache License 2.0

EventStream disconnect over long periods? #552

Open TristanWright opened 8 years ago

TristanWright commented 7 years ago

So I played with this some more. It definitely disconnects, and it does not respond to manually triggered SSEs.

We get a console full of these:

[Screenshot, 2016-11-15 10:07:51]

net::ERR_EMPTY_RESPONSE

TristanWright commented 7 years ago

Could it have to do with the connection error pattern we have here?

https://github.com/Kitware/HPCCloud/blob/master/src/network/remote/GirderClient.js#L83-L103

TristanWright commented 7 years ago

It's a lot of repeated canceled and failed events, each roughly 2 minutes and 7 seconds apart.

[Screenshot, 2016-11-15 10:16:29]

cjh1 commented 7 years ago

On Nov 15, 2016 10:06 AM, "TristanWright" notifications@github.com wrote:

So I played with this some more. It definitely disconnects, and it does not respond to manually triggered SSEs.

We get a console full of these:

net::ERR_EMPTY_RESPONSE

There is server-side logic that times out the connection if no events are sent (https://github.com/girder/girder/blob/master/girder/api/v1/notification.py). However, I thought we had client-side logic to reconnect (maybe that was in the previous incarnation of the client).

TristanWright commented 7 years ago

I thought we had client-side logic to reconnect (maybe that was in the previous incarnation of the client)

That's the onerror lines here: https://github.com/Kitware/HPCCloud/blob/master/src/network/remote/GirderClient.js#L83-L103

I'm watching some logging. Maybe the events we're missing are arriving within the 10-second setTimeout.

[Screenshot, 2016-11-15 10:36:16]

TristanWright commented 7 years ago

Confirmed it doesn't have to do with the setTimeout, although I wonder if it should be a setImmediate regardless.
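For reference, the reconnect-on-error pattern under discussion can be sketched roughly as below. This is not the code in GirderClient.js: `connectWithRetry`, `EventSourceImpl`, `retryDelayMs`, and `schedule` are hypothetical names chosen for illustration, and the 10-second default only mirrors the delay mentioned in this thread. The constructor and scheduler are injectable so the retry delay can be swapped out or exercised without a browser.

```javascript
// Sketch only: names and defaults here are assumptions, not the HPCCloud API.
function connectWithRetry(url, onMessage, options = {}) {
  const {
    EventSourceImpl = globalThis.EventSource, // the browser's EventSource by default
    retryDelayMs = 10000, // mirrors the 10-second setTimeout mentioned in the thread
    schedule = setTimeout, // injectable so a different delay strategy can be tried
  } = options;

  let source = null;
  let stopped = false;

  function open() {
    if (stopped) return; // a retry scheduled before stop() becomes a no-op
    source = new EventSourceImpl(url);
    source.onmessage = onMessage;
    source.onerror = () => {
      // Close explicitly so the built-in retry does not race with ours,
      // then schedule a fresh connection.
      source.close();
      schedule(open, retryDelayMs);
    };
  }

  open();

  return {
    stop() {
      stopped = true;
      if (source) source.close();
    },
  };
}
```

Passing `retryDelayMs: 0` would approximate the "retry immediately" variant without depending on setImmediate, which most browsers do not provide.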

TristanWright commented 7 years ago

I don't see this happening on http://localhost:8888/. It takes the full 5-minute timeout instead of 2 minutes.

In development, I tried watching with EventSource(url, { withCredentials: true }); no difference.

TristanWright commented 7 years ago

The response on the girder site is:

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2016 00:09:18 GMT
Cache-Control: no-cache
Content-Type: text/event-stream;charset=utf-8
Allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
Server: CherryPy/8.1.2
Transfer-Encoding: chunked

hpcc gets a slightly different response, and it doesn't get it until there is at least one event:

HTTP/1.1 200 OK
X-Powered-By: Express    <--- webpack-dev-server?
connection: close     <--------------- what's this?
date: Thu, 24 Nov 2016 00:31:35 GMT
cache-control: no-cache
content-type: text/event-stream;charset=utf-8
allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
server: CherryPy/8.1.2
transfer-encoding: chunked

connection: close looks suspect. The requests look the same, though.
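A small helper makes the comparison between the two responses mechanical. The sketch below is illustrative only: `parseHeaders` and `diffHeaders` are hypothetical names, header names are compared case-insensitively, and `date` is ignored by default because it always differs. The inline annotations from the dump above are left out of the input.

```javascript
// Parse a raw response dump ("Name: value" per line) into a lowercase-keyed Map.
// Lines without a colon, such as the status line, are skipped.
function parseHeaders(raw) {
  const headers = new Map();
  for (const line of raw.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    headers.set(line.slice(0, idx).trim().toLowerCase(), line.slice(idx + 1).trim());
  }
  return headers;
}

// List headers whose values differ, or that exist on only one side.
function diffHeaders(direct, proxied, ignore = ['date']) {
  const skip = new Set(ignore);
  const names = new Set([...direct.keys(), ...proxied.keys()]);
  const diff = [];
  for (const name of [...names].sort()) {
    if (skip.has(name)) continue;
    if (direct.get(name) !== proxied.get(name)) {
      diff.push({ name, direct: direct.get(name), proxied: proxied.get(name) });
    }
  }
  return diff;
}

const girderResponse = parseHeaders(`HTTP/1.1 200 OK
Date: Thu, 24 Nov 2016 00:09:18 GMT
Cache-Control: no-cache
Content-Type: text/event-stream;charset=utf-8
Allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
Server: CherryPy/8.1.2
Transfer-Encoding: chunked`);

const hpccResponse = parseHeaders(`HTTP/1.1 200 OK
X-Powered-By: Express
connection: close
date: Thu, 24 Nov 2016 00:31:35 GMT
cache-control: no-cache
content-type: text/event-stream;charset=utf-8
allow: DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT
server: CherryPy/8.1.2
transfer-encoding: chunked`);

// Headers that differ between the two dumps: connection, x-powered-by
console.log(diffHeaders(girderResponse, hpccResponse).map((d) => d.name));
```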

cjh1 commented 7 years ago

@TristanWright Great detective work! Yes, that is a big difference between Girder and our setup in dev mode. Have you tried connecting directly to port 8888 (Apache)?

TristanWright commented 7 years ago

On port 8888 it errors every 5 minutes, instead of the 2 minutes we see on 9999. The five minutes match the DEFAULT_STREAM_TIMEOUT we see in the Girder notifications endpoint.
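To attribute a disconnect to one timeout or the other from logged error times, something like the following could be used. It is a rough sketch: `gapsInSeconds` and `nearestTimeout` are hypothetical helpers, the 300-second figure is the five-minute stream timeout as described in this thread, and the 120-second entry is only the interval observed on port 9999, whose source has not been identified here.

```javascript
// Convert a list of error timestamps (milliseconds) into gaps in seconds.
function gapsInSeconds(timestampsMs) {
  const gaps = [];
  for (let i = 1; i < timestampsMs.length; i += 1) {
    gaps.push((timestampsMs[i] - timestampsMs[i - 1]) / 1000);
  }
  return gaps;
}

// Pick the candidate timeout closest to an observed gap.
function nearestTimeout(gapSeconds, candidates) {
  let best = null;
  for (const [label, seconds] of Object.entries(candidates)) {
    const delta = Math.abs(gapSeconds - seconds);
    if (best === null || delta < best.delta) {
      best = { label, seconds, delta };
    }
  }
  return best;
}

// Candidate values are assumptions taken from the discussion above.
const candidateTimeouts = {
  'girder stream timeout (5 min)': 300,
  'unidentified cutoff on port 9999 (2 min)': 120,
};

// Example: errors logged 2 minutes 7 seconds apart.
const observedGaps = gapsInSeconds([0, 127000, 254000]);
console.log(observedGaps.map((gap) => nearestTimeout(gap, candidateTimeouts).label));
```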