OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License
293 stars 107 forks source link

Shell: Reconnect - Decouple terminal sessions from connections #142

Open ericfranz opened 5 years ago

ericfranz commented 5 years ago

Goal: If you temporarily lose internet connection, close your laptop to walk to a meeting, or switch wifi network, the shell app browser client will attempt to reconnect to the server, instead of your work being lost. https://trello.com/c/wUXyxqFb/60-shell-reconnect

There are several possible issues that can result in the ssh session dying after 1 a minute or two disconnected from the network.

The problem with the app as it is written is that the ssh session https://github.com/OSC/ood-shell/blob/2a1af635dbcfb64221a013f4500c0d1d6cd9301e/app.js#L67-L73 is coupled with the ws connection object https://github.com/OSC/ood-shell/blob/2a1af635dbcfb64221a013f4500c0d1d6cd9301e/app.js#L47 or "client". So when a connection is closed due to temporary loss of network access (close laptop, switch networks, etc.) the terminal session is also closed https://github.com/OSC/ood-shell/blob/2a1af635dbcfb64221a013f4500c0d1d6cd9301e/app.js#L97-L100

Instead, terminal sessions should be stored in a hash like object with a key being a unique identifier that is shared with the client so that the client can use that identifier when it tries to initiate a new ws connection. Then the new ws connection would have three possiblities:

  1. new connection - no unique identifier sent
  2. trying to reconnect - unique identifier sent that matches an existing ssh session, so associate that connection with the ssh session
  3. trying to reconnect to terminated ssh session - unique identifier sent that does not match an existing ssh session, return an error that can be displayed to the user!

With this work we still have the problem of the web server itself being killed by Passenger after the passenger_pool_idle_time is reached (5 minutes). Enabling sites to increase this would address the shell problem but introduce other problems as it is an option that affects all apps.

In the PUN configs, it seems that we are using default values for these:

┆Issue is synchronized with this Asana task by Unito

ericfranz commented 5 years ago

This is also a problem client side: https://github.com/OSC/ood-shell/blob/2a1af635dbcfb64221a013f4500c0d1d6cd9301e/public/javascripts/ood_shell.1.js#L15

Instead, when the websocket is closed, a warning message should appear above the shell app - not printed to the terminal - that claims "the websocket connection failed, trying to reconnect"

ericfranz commented 5 years ago

And for the shell app only we could consider setting passenger_min_instances to 1:

Please note that this option does not pre-start application processes during Nginx startup. It just makes sure that when the application is first accessed:

  1. at least the given number of processes will be spawned.
  2. the given number of processes will be kept around even when processes are being idle cleaned.

Would this ensure that the for any user running a shell app that PUNs and the shell app stick around, even after users go away from the app? Forever?

We might want to consider when we would want to ensure shell apps actually get shut down.

johrstrom commented 3 years ago

479 is also related to this.

matt257 commented 4 months ago

reviewed, is a good idea but needs looked at again and updated