TurboVNC / turbovnc

Main TurboVNC repository
https://TurboVNC.org
GNU General Public License v2.0
761 stars 138 forks source link

Allow listening and connecting to unix domain sockets #304

Closed steffen-kiess closed 1 year ago

steffen-kiess commented 2 years ago

This is https://github.com/TurboVNC/turbovnc/pull/294 rebased to main.

dcommander commented 1 year ago

Apologies for taking so long to pop the stack on this. I am integrating it into the dev/3.1 evolving branch now. One modification I am making that I wanted to run by you:

I have introduced a new boolean vncserver command-line argument and config file variable (-uds/$useUDS) that, if enabled, will cause vncserver to choose a default Unix domain socket path (~/.vnc/$host:$displayNumber.uds). This will allow users or SysAdmins to set a boolean variable in turbovncserver.conf and thus enable the use of Unix domain sockets on a per-user or per-system basis, then the vncserver script will automatically choose and manage the UDS path unless -rfbunixpath is explicitly specified. That mirrors the behavior of -rfbport, in that a default port based on the display number is chosen unless the user explicitly overrides it. The behavior of Xvnc is unchanged from your patch. All of this is handled by vncserver. This also opens up the possibility of using Unix domain sockets with the TurboVNC Session Manager at some point in the future. What do you think?

steffen-kiess commented 1 year ago

I think that would make sense.

Would -uds also disable listening on TCP? (This might make sense if -uds is combined with -securitytypes None to allow authentication by access to the unix domain socket.)

dcommander commented 1 year ago

I think that would make sense.

Would -uds also disable listening on TCP? (This might make sense if -uds is combined with -securitytypes None to allow authentication by access to the unix domain socket.)

-uds would just be a shortcut for -rfbunixpath $vncUserDir/$host:$display.uds, so yes, it would disable TCP connections. I don’t want -uds to imply specific security types, because eventually I will need -securitytypes otp in order to support the session manager. Also, if a user wants to allow collaboration, they will have to relax the permissions on the UDS, in which case they would want the collaborator to authenticate using an OTP or Unix credentials.

steffen-kiess commented 1 year ago

-uds would just be a shortcut for -rfbunixpath $vncUserDir/$host:$display.uds, so yes, it would disable TCP connections. I don’t want -uds to imply specific security types, because eventually I will need -securitytypes otp in order to support the session manager. Also, if a user wants to allow collaboration, they will have to relax the permissions on the UDS, in which case they would want the collaborator to authenticate using an OTP or Unix credentials.

That sounds good to me.

dcommander commented 1 year ago

I am observing a couple of issues when using the feature with -via rather than -tunnel. When I do something like this:

/opt/TurboVNC/bin/vncviewer -via {host} localhost:/{uds_path_on_host} -extssh

I see two SSH connections being made (because the host's SSH banner appears twice, and if public key authentication is disabled, I have to enter my SSH password on the host twice.) I assume that is intended behavior but wanted to double check. The problem is that I get prompted to store the SSH key for localhost. This causes the host's SSH key to be stored as localhost in the client's ~/.ssh/known_hosts file, which is incorrect and causes problems if you actually want to connect directly to localhost using SSH.

If I do the same thing without -extssh, I similarly get prompted to store the SSH key for localhost, which gets added to the client's ~/.ssh/known_hosts file. Also, the SSH agent doesn't seem to work properly, and I get "Too many authentication failures" when the viewer attempts to make the second (inner) SSH connection. If I don't use public key authentication, then I get prompted for my SSH password on the host when the first (outer) SSH connection is made, then I get prompted for my SSH password on the client when the second (inner) SSH connection is made. Then the viewer fails with "bash: line 0: exec: socat: not found".

Also, I wanted to get a better understanding of the purpose of the threads in StreamDescriptor.java and Tunnel.createTunnelJSchUDS().

Since I'm trying to integrate this into the dev branch, which has changed considerably relative to main, it's entirely possible that one or more of the issues above were caused by integration errors. I have pushed a temporary branch called "dev.uds.wip" for you to review.

steffen-kiess commented 1 year ago

I am observing a couple of issues when using the feature with -via rather than -tunnel. When I do something like this:

/opt/TurboVNC/bin/vncviewer -via {host} localhost:/{uds_path_on_host} -extssh

I see two SSH connections being made (because the host's SSH banner appears twice, and if public key authentication is disabled, I have to enter my SSH password on the host twice.) I assume that is intended behavior but wanted to double check.

Yes, this was the intended behavior. (A connection to localhost is being made, and this is tunneled over {host}, similar to ssh -J {host} localhost.)

The problem is that I get prompted to store the SSH key for localhost. This causes the host's SSH key to be stored as localhost in the client's ~/.ssh/known_hosts file, which is incorrect and causes problems if you actually want to connect directly to localhost using SSH.

I assumed people would be using

/opt/TurboVNC/bin/vncviewer {host}:/{uds_path_on_host} -extssh

in this case. But it might make sense to create a special case for localhost, so that connecting to localhost with -via {host} is treated the same as connecting to {host} directly.

Also, I wanted to get a better understanding of the purpose of the threads in StreamDescriptor.java and Tunnel.createTunnelJSchUDS().

The first thread in StreamDescriptor.java reads from inputStream (which would be the InputStream provided by channelExec.getInputStream(), i.e. the stdout of the remote command) and writes the data into inBuffer. This thread is needed because the only way (I could find, anyway) to find out when data is available from the InputStream is to call read() which will block, and therefore cannot be called on the main thread.

The second thread does the same thing for stdin of the remote command (and is needed because write() might block if the remote command is consuming the provided data too slowly).

The thread in Tunnel.createTunnelJSchUDS() copies the stderr of the remote command to the stderr of turbovnc (which also has to be in a separate thread because it blocks when calling read()).

Since I'm trying to integrate this into the dev branch, which has changed considerably relative to main, it's entirely possible that one or more of the issues above were caused by integration errors. I have pushed a temporary branch called "dev.uds.wip" for you to review.

Ok, I'll take a look at the branch.

dcommander commented 1 year ago

Yes, this was the intended behavior. (A connection to localhost is being made, and this is tunneled over {host}, similar to ssh -J {host} localhost.)

The problem is that I get prompted to store the SSH key for localhost. This causes the host's SSH key to be stored as localhost in the client's ~/.ssh/known_hosts file, which is incorrect and causes problems if you actually want to connect directly to localhost using SSH.

I assumed people would be using

/opt/TurboVNC/bin/vncviewer {host}:/{uds_path_on_host} -extssh

in this case. But it might make sense to create a special case for localhost, so that connecting to localhost with -via {host} is treated the same as connecting to {host} directly.

OK, I can confirm that ssh -J does the same thing, so that's fair. It still needs to work properly without -extssh, though. The issue with the JSch case seems to be that the inner SSH connection is made from the point of view of the client rather than from the point of view of the gateway ("jump host".)

Also, I wanted to get a better understanding of the purpose of the threads in StreamDescriptor.java and Tunnel.createTunnelJSchUDS().

The first thread in StreamDescriptor.java reads from inputStream (which would be the InputStream provided by channelExec.getInputStream(), i.e. the stdout of the remote command) and writes the data into inBuffer. This thread is needed because the only way (I could find, anyway) to find out when data is available from the InputStream is to call read() which will block, and therefore cannot be called on the main thread.

But blocking I/O isn't a problem when using TCP ports. Is that because the SSH connection is forked, using ssh -f?

The second thread does the same thing for stdin of the remote command (and is needed because write() might block if the remote command is consuming the provided data too slowly).

The thread in Tunnel.createTunnelJSchUDS() copies the stderr of the remote command to the stderr of turbovnc (which also has to be in a separate thread because it blocks when calling read()).

Again, though, why is this only necessary with Unix domain sockets?

steffen-kiess commented 1 year ago

One problem I found with the current code is if the ssh connection is closed, turbovnc does not seem to notice that (this was working on the 3.0 branch). I have to check why that's the case.

OK, I can confirm that ssh -J does the same thing, so that's fair. It still needs to work properly without -extssh, though. The issue with the JSch case seems to be that the inner SSH connection is made from the point of view of the client rather than from the point of view of the gateway ("jump host".)

Ok, I'll have to look into that problem.

But blocking I/O isn't a problem when using TCP ports. Is that because the SSH connection is forked, using ssh -f?

With TCP ports the SocketChannel API (see https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/nio/channels/SocketChannel.html) can be used to check whether data is available (or can be sent) without blocking (which is done in SocketDescriptor.java in turbovnc). When using the streams provided by JSch (from channelExec.getInputStream() etc.) to access stdin / stdout, this is not possible.

The thread in Tunnel.createTunnelJSchUDS() copies the stderr of the remote command to the stderr of turbovnc (which also has to be in a separate thread because it blocks when calling read()).

Again, though, why is this only necessary with Unix domain sockets?

The forwarding over TCP uses setPortForwardingL() (which internaly uses ChannelDirectTCPIP in PortWatcher.java) so there is no stderr channel.

The code in the SessionManager class (which, like the Unix domain socket code, uses ChannelExec) also has to deal with stderr (e.g. in SessionManager.getSessions(), see e.g.: https://github.com/TurboVNC/turbovnc/blob/7cbab9560585907319d5bea3f123e331081a9600/java/com/turbovnc/vncviewer/SessionManager.java#L105 From what I can see the way the code currently works is that it reads stderr after reading stdout. I think there is a risk of a deadlock here, if this is not done in a separate thread: If the server sends a lot of data over stderr, flow control might kick in and the server might wait until the client starts reading the data from stderr. But if the client is waiting until stdout has been closed before reading from stderr, there is a deadlock.

dcommander commented 1 year ago

With TCP ports the SocketChannel API (see https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/nio/channels/SocketChannel.html) can be used to check whether data is available (or can be sent) without blocking (which is done in SocketDescriptor.java in turbovnc). When using the streams provided by JSch (from channelExec.getInputStream() etc.) to access stdin / stdout, this is not possible.

Would any of that become more straightforward if we took advantage of Unix domain socket support in Java 16? Because the current plan is to embed Java 17 with TurboVNC 3.1, so using the built-in Unix domain socket API in Java should be possible. I remember that we discussed that, but I forgot what we concluded.

From what I can see the way the code currently works is that it reads stderr after reading stdout. I think there is a risk of a deadlock here, if this is not done in a separate thread: If the server sends a lot of data over stderr, flow control might kick in and the server might wait until the client starts reading the data from stderr. But if the client is waiting until stdout has been closed before reading from stderr, there is a deadlock.

I'll look into that.

steffen-kiess commented 1 year ago

Would any of that become more straightforward if we took advantage of Unix domain socket support in Java 16? Because the current plan is to embed Java 17 with TurboVNC 3.1, so using the built-in Unix domain socket API in Java should be possible. I remember that we discussed that, but I forgot what we concluded.

For connecting to a remote host, this would not help, as the problem is a limitation in the JSch API (when using -extssh=0) or the Process API (when using -extssh=1). When connecting to a UDS on the local host, I think it would help, because then UnixDomainSocketAddress + SocketChannel can be used, and SocketChannel allows select() to be used.

steffen-kiess commented 1 year ago

One problem I found with the current code is if the ssh connection is closed, turbovnc does not seem to notice that (this was working on the 3.0 branch). I have to check why that's the case.

The problem is that StreamDescriptor.read() return -1 on EOF, but FdInStream.readWithTimeoutOrCallback considers 0 to be EOF:

https://github.com/TurboVNC/turbovnc/blob/7cbab9560585907319d5bea3f123e331081a9600/java/com/turbovnc/network/StreamDescriptor.java#L118

https://github.com/TurboVNC/turbovnc/blob/7cbab9560585907319d5bea3f123e331081a9600/java/com/turbovnc/rdr/FdInStream.java#L202

SocketDescriptor.read() I think returns 0 on EOF: https://github.com/TurboVNC/turbovnc/blob/7cbab9560585907319d5bea3f123e331081a9600/java/com/turbovnc/network/SocketDescriptor.java#L86 (It turns the -1 which SocketChannel.read() returns on EOF into 0).

So probably StreamDescriptor.read() should also return 0 on EOF.

steffen-kiess commented 1 year ago

Another problem I saw was here: https://github.com/TurboVNC/turbovnc/blob/7cbab9560585907319d5bea3f123e331081a9600/java/com/turbovnc/vncviewer/Tunnel.java#L375 connectUDSDirect() should not wait for the socat process to finish.

In https://github.com/steffen-kiess/turbovnc/commit/d3e55fe8074c066322ebabf1eb2981be08c8f78a I fixed these problems. This commit also makes commands like vncviewer :/some/socket -via=some.host.example.com (or equivalently vncviewer localhost:/some/socket -via=some.host.example.com) work. It will now open an ssh connection to some.host.example.com and will not attempt to open a second (inner) connection to localhost anymore. This works both with -extssh=1 and -extssh=0.

dcommander commented 1 year ago

OK, thanks. I will look into all of that later this week. I think we should assume Java 16+ for local UDS connections, in order to simplify the code. I’ll look into that as well, but not until the current implementation is working properly and has been fully merged.

dcommander commented 1 year ago

From what I can see the way the code currently works is that it reads stderr after reading stdout. I think there is a risk of a deadlock here, if this is not done in a separate thread: If the server sends a lot of data over stderr, flow control might kick in and the server might wait until the client starts reading the data from stderr. But if the client is waiting until stdout has been closed before reading from stderr, there is a deadlock.

Realistically, this shouldn't occur, because only a small amount of data is sent over stdout and stderr. Specifically, stdout is used to send the information about active TurboVNC sessions, and stderr is used to send the console output of /opt/TurboVNC/bin/vncserver. That console output will not be more than a few lines. Unless I miss my guess, I do not think that such a deadlock could occur unless the user did something really stupid, such as installing some other script as /opt/TurboVNC/bin/vncserver.

In https://github.com/steffen-kiess/turbovnc/commit/d3e55fe8074c066322ebabf1eb2981be08c8f78a I fixed these problems. This commit also makes commands like vncviewer :/some/socket -via=some.host.example.com (or equivalently vncviewer localhost:/some/socket -via=some.host.example.com) work. It will now open an ssh connection to some.host.example.com and will not attempt to open a second (inner) connection to localhost anymore. This works both with -extssh=1 and -extssh=0.

That fixes the redundant double SSH connection, but it doesn't fix the issue whereby, when using JSch, the inner SSH connection is made from the point of view of the client rather than the jump host. Let's say that I'm connecting to a machine called turbovnc_host that is only visible from a machine called gateway_host. Attempting to do something like this:

/opt/TurboVNC/bin/vncviewer -via gateway_host turbovnc_host:/{uds_path}

results in an UnknownHostException. It works properly with -extssh.

dcommander commented 1 year ago

If it proves too difficult to make JSch work properly with this feature, then I am OK with deferring that functionality to a future release. With TurboVNC 3.1, I think it would be OK to require ExtSSH in order to use Unix domain sockets remotely and Java 16+ in order to use Unix domain sockets locally. It appears that proper Unix domain socket support is at least on the JSch development radar.

Also, regarding the % variables in the UDS path, I think it would make more sense to parse ~ as the home directory instead of %h and then process any VNC display beginning with either / or ~ as a UDS path instead of a TCP port. I am OK with leaving %u and %i as undocumented variables. Also, since the TurboVNC Server stores the UDS path in a known location for each VNC display, I think it would make sense to have special display designators in the viewer-- uds1, uds2, etc.-- that automatically map to the UDS path stored by the server for a particular VNC display. That way, a user could simply run /opt/TurboVNC/bin/vncserver -uds on the host, which would report the display as {host}:uds{n}, and they could then enter {host}:uds{n} into the viewer. The full UDS path would be reported in the server log, as the TCP port currently is. It probably also makes sense for the viewer to parse the full UDS path only with a double colon rather than a single colon, to mirror the TCP functionality. As is the case with TCP, a user could allow the server to automatically determine a UDS path by passing -uds, in which case they could enter a simplified display designator ({host}:uds{n}), which maps to that UDS path, into the viewer. They could also pass -rfbunixpath to the server, which would require them to enter the exact UDS path ({host}::{uds_path}) into the viewer, much as they would have to enter {host}::{tcp_port} into the viewer if they started the server with a non-default TCP port (-rfbport.) I can make all of those changes myself but wanted to give you a chance to comment on them first.

dcommander commented 1 year ago

The "dev.uds.wip" branch now contains the following changes:

Note also that I have observed a performance issue when using the Cygwin SSH client. This seems to be related to StreamDescriptor, because if I forward a local TCP port to the remote Unix domain socket, the performance is as expected. This can be reproduced with Cygwin SSH as follows:

This performs very slowly:

vncviewer.bat {host}::{uds_path} -securitytypes vncauth

This performs acceptably:

VNC_TUNNEL_CMD='ssh.exe -axf -L %L:%R %H sleep 20' vncviewer.bat {host}::{uds_path} -securitytypes vncauth

I don't observe the same performance issue with Mac or Linux clients, nor with the Microsoft-provided SSH client. Thus, it isn't a show-stopping issue, but I am mentioning it in case you might have any ideas regarding its cause.

dcommander commented 1 year ago

Unless I hear back from you, the feature as implemented in "dev.uds.wip" will be pushed to dev later this week. (I still need to finish the documentation of it.)

steffen-kiess commented 1 year ago

Realistically, this shouldn't occur, because only a small amount of data is sent over stdout and stderr. Specifically, stdout is used to send the information about active TurboVNC sessions, and stderr is used to send the console output of /opt/TurboVNC/bin/vncserver. That console output will not be more than a few lines. Unless I miss my guess, I do not think that such a deadlock could occur unless the user did something really stupid, such as installing some other script as /opt/TurboVNC/bin/vncserver.

I think there might be some cases where e.g. the remote ssh server writes error messages to stderr, but I agree that probably in all realistic cases these will be too small to trigger a deadlock.

Also, regarding the % variables in the UDS path, I think it would make more sense to parse ~ as the home directory instead of %h and then process any VNC display beginning with either / or ~ as a UDS path instead of a TCP port.

Note that the home directory was %d, not %h (the % codes match those documented in ssh_config(5)). I agree that ~ is more readable than %d. However, it might be a good idea to only allow ~ at the beginning of the path. (A ~ not at the beginning of the path would not work anyway, and this way it is possible specify paths which actually contain a ~, if needed. This also would match the behavior of unix shells, which only expand ~ at the beginning.) This can be achieved by using

        if (i == 0 && udsPath.charAt(i) == '~') {

instead of

        if (udsPath.charAt(i) == '~') {

in java/com/turbovnc/vncviewer/Tunnel.java.

Also, currently a path connection to hostname::~dir/socket will work, but attempt to connect to /home/usernamedir/socket which probably is not what the user intended. May the check in Hostname.java should be changed from substring.startsWith("::/") || substring.startsWith("::~") to substring.startsWith("::/") || substring.startsWith("::~/").

I am OK with leaving %u and %i as undocumented variables.

Ok. I like having %i as it allows setting the path to something like /run/user/%i/some_socket without knowning the numeric user ID on the remote host.

  • Note that I chose not to integrate the change to CConn.java since, upon further reflection, that change made the viewer's behavior more confusing.

Ok.

  • As discussed above, Unix domain socket paths are now specified with a double colon rather than a single colon.

  • As discussed above, ~ in Unix domain socket paths, rather than %h, now expands to the user's home directory on the VNC host, and a ~ or / character following a double colon is assumed to be the start of a Unix domain socket path.

  • I have shied away from using the :uds{n} syntax described above, since I really need to understand how best to integrate Unix domain sockets with the TurboVNC Session Manager before I choose a more user-friendly syntax. Instead, I have repurposed the %h variable, which now expands to the remote host name, so users can specify {host}::~/.vnc/%h_{display_number}.uds in order to connect to the Unix domain socket that the TurboVNC Server creates by default when passed -uds.

Ok. I agree that %h is useful.

Note also that I have observed a performance issue when using the Cygwin SSH client. [...] I don't observe the same performance issue with Mac or Linux clients, nor with the Microsoft-provided SSH client. Thus, it isn't a show-stopping issue, but I am mentioning it in case you might have any ideas regarding its cause.

I'm afraid I don't, and since Windows now has a proper ssh client (and WSL exists) I'm not really using Cygwin anymore anyway.

Unless I hear back from you, the feature as implemented in "dev.uds.wip" will be pushed to dev later this week. (I still need to finish the documentation of it.)

From my side everything in the branch seems to be ok, except that (as mentioned above) I would restrict ~ expansion to the beginning of the string.

dcommander commented 1 year ago

I will make the suggested change for parsing ~. I didn’t realize that the % variables were specified in the ssh_config man page. That makes things easier, because it means I can just refer users to that man page. I do notice, however, that %i and %u are supposed to expand to the “local” user ID and username. Should we use %r instead? I’m not sure what “local” and “remote” means in the context of that man page.

steffen-kiess commented 1 year ago

I’m not sure what “local” and “remote” means in the context of that man page.

I suppose "local" means the host where the ssh command is running and "remote" is the host where sshd is running.

I do notice, however, that %i and %u are supposed to expand to the “local” user ID and username. Should we use %r instead?

Unfortunately ssh does not have a remote equivalent for %i (it cannot get this value without logging in to the remote host, and these values are expanded before connecting, at least in some contexts). Note that vncviewer anyway does not match the ssh semantic exactly (e.g. %h in vncviewer is now the remote host name as seen by the remote host, while in ssh it refers to the remote host name as seen by the local host (which might differ, e.g. it might by fully qualified) so the semantics anyway would not be exactly the same.

dcommander commented 1 year ago

Pushed to dev. Remaining things to do:

  1. Use the built-in UDS functionality in Java 16 in order to simplify the implementation (this will probably go into TurboVNC 3.1)
  2. Fix issues that prevent multi-level SSH/UDS connections from working properly with JSch (no clue how to do that-- may have to wait for official UDS support in JSch unless you or someone else can figure it out)
  3. Add UDS support to the TurboVNC Session Manager (requires Item 2 above)
dcommander commented 1 year ago

The idea behind Item 3 is that the Session Manager protocol, which is basically just an efficient way of communicating TurboVNC session information through stdout/SSH, could be extended so that it lists Unix domain socket paths for running sessions, and the Session Manager could seamlessly set up the SSH tunnel to a remote UDS rather than to a remote TCP port. However, since the Session Manager is designed to be self-contained, it really needs JSch. Technically speaking, the Session Manager doesn't need multi-level SSH tunnels, so it should be possible to support UDS connections with it now, but if we say that UDS connections work with JSch, then that needs to be true for all operational modes.

dcommander commented 1 year ago

Would any of that become more straightforward if we took advantage of Unix domain socket support in Java 16? Because the current plan is to embed Java 17 with TurboVNC 3.1, so using the built-in Unix domain socket API in Java should be possible. I remember that we discussed that, but I forgot what we concluded.

For connecting to a remote host, this would not help, as the problem is a limitation in the JSch API (when using -extssh=0) or the Process API (when using -extssh=1). When connecting to a UDS on the local host, I think it would help, because then UnixDomainSocketAddress + SocketChannel can be used, and SocketChannel allows select() to be used.

In both cases, StreamSocket is used. So if built-in UDS support in Java 16 wouldn't help the SSH case, then I don't see how it could reduce code complexity at all. Am I missing something?

steffen-kiess commented 1 year ago

In both cases, StreamSocket is used. So if built-in UDS support in Java 16 wouldn't help the SSH case, then I don't see how it could reduce code complexity at all. Am I missing something?

No, I don't think so. StreamSocket will be needed for the ssh case, only the local case might avoid it, so this won't simplify the code.

However, using the built-in UDS support will avoid the dependency on socat. With JSch (if it has Unix Domain support) it also might be possible to avoid the remote dependency on socat. (But then, implementing the ~ and % expansion will be more difficult, but still should be possible by executing commands like id remotely before connecting to the Unix Domain socket).