golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.16k stars 17.69k forks source link

x/crypto/ssh: ssh.Channel should provide idle timeouts #21420

Closed glycerine closed 7 years ago

glycerine commented 7 years ago

It would be incredibly useful for securing connections everywhere if ssh.Channel implemented the full net.Conn interace. The only missing methods are for time-outs based on setting read and write deadlines.

In addition, presently when using ssh.Channel, it is common to leak Goroutines. Without deadlines for read and write, those blocking Read/Write calls on ssh.Channel will never let the goroutine return. I have found that ssh.Channel.Close() from a separate goroutine will unblock Reads, but this is quite cumbersome and would still require every piece of code currently using net.Conn to need much additional logic and elaborate coordination just to have simple read and write deadlines.

A related but distinct api request: https://github.com/golang/go/issues/20288

I inquired of Han-wen and over email and he suggesting filing a bug, hence this ticket.

Emerson suggested the context.Context support might be nice as well, but that's not in the net.Conn interface. However once implemented, the net.Conn interface would make adding context support easier.

@hanwen @Gh0u1L5

glycerine commented 7 years ago

As an example of deadlock prone code, the channel receive <-ch.msg at line 322 of mux.go will deadlock when the remote side dies. If you ignore the deadlock, the reading goroutine leaks.

https://github.com/golang/crypto/blob/master/ssh/mux.go#L322

    switch msg := (<-ch.msg).(type) {  // deadlock happens here
    case *channelOpenConfirmMsg:
        return ch, nil
    case *channelOpenFailureMsg:
        return nil, &OpenChannelError{msg.Reason, msg.Message}
    default:
        return nil, fmt.Errorf("ssh: unexpected packet in response to channel open: %T", m\
sg)
    }

update: Here unfortunately there is no recourse to Close()-ing the OpenChannel call, since there is no channel yet. Even the top level ssh.Client is no help, for even if one Close()-es the ssh.Client on which the call was made, the deadlock at line 322 remains.

glycerine commented 7 years ago

Upon prototyping the net.Conn approach, I discovered that the deadline is too low level to be useful with io.Copy(), in paritcular for sending big files that need some time to complete. Instead I've implemented a high level idle-timeout facility in my branch https://github.com/glycerine/xcryptossh See that README for discussion. I'll change the title here to reflect this discovery.

nhooyr commented 7 years ago

What you point out as a deadlock isn't really a deadlock if the remote side dies. That channel will be closed if the remote site dies, see https://github.com/golang/crypto/blob/81e90905daefcd6fd217b62423c0908922eadb30/ssh/mux.go#L194 .

However, I still think there should be idle timeouts, we shouldn't wait forever for the remote side to answer.

nhooyr commented 7 years ago

Just realized this issue doesn't really address my problem, I'll open a new issue.

edit: never mind, realized you already opened a very similar one so I added a comment there

hanwen commented 7 years ago

"It would be incredibly useful for securing connections everywhere if ssh.Channel implemented the full net.Conn interace. The only missing methods are for time-outs based on setting read and write deadlines."

what does 'securing' connections mean?

OpenChannel() is a global request, so if we elect to ignore the answer (if we think it takes too long), it will upset the state machine, meaning that you have close the whole ssh.Conn.

glycerine commented 7 years ago

what does 'securing' connections mean?

I meant that all the Go programs everywhere that use net.Conn to do TCP... could transparently change to use ssh. And those would be much more secure connections.

OpenChannel() is a global request, so if we elect to ignore the answer (if we think it takes too long), it will upset the state machine, meaning that you have close the whole ssh.Conn.

I don't see why the upset would be intrinsic. Is it just an artifact of the implementation status quo?

hanwen commented 7 years ago

let me rephrase. Suppose the timeout triggers, how does your program proceed?

It cannot try to open another channel, since the global request lock has been taken. (the global request lock is implied by the RFC)

glycerine commented 7 years ago

btw I love this library, and I'm so glad that you work on it. I don't mean to be critical at all. Ok, back to the conversation:

Suppose the timeout triggers, how does your program proceed?

Interesting. I had been assuming I as an application programmer would only care about timeouts on channels that had been successfully established. I assumed the channel open would succeed or fail fairly quickly one way or the other as it stands. Is that not a correct assumption? i.e. can OpenChannel() hang?

hanwen commented 7 years ago

my question is specifically about the hang in l.322 . Your assumption in the last comment is correct, but I was under the impression you wanted to have a way to set a timeout on that (ie. on the OpenChannel method). My response is that this mostly useless, if OpenChannel is stuck, the protocol is organized such that you have to throw away the whole connection rather than just the (pending) channel.

We could try to add timeouts to SSH to mimick TCP, but the timeouts would be on the ssh.Conn, not ssh.Channel, as there is usually no way for the channel to fail without the connection itself being broken too.

glycerine commented 7 years ago

usually no way for the channel to fail without the connection itself being broken too

So my use case actually sees this alot, basically because it is forwarding a stream from a third party. In a diagram:

 P3 -> P2 -> P1
        ^
        |
       P4

So the connection from P2 to P1 is fine, but when the connection from P3 to P2 dies, the channel that is forwarding over P2 to P1 ends up needing an idle timeout, because P3 is gone. And in the meantime P4 to P1 traffic can be fine, so the whole connection from P2 to P1 should be preserved.

I appreciate the discussion, but the long and short is that I've got this all working on my branch now; i.e. per Channel timeouts work fine (for reads, never got writes of course). So you're welcome to pull from it as you like to add features. And we can probably close these tickets.

hanwen commented 7 years ago

@glycerine - to me, your diagram is not an argument for adding timeouts to the Channel but rather that forwarding code should detect TCP timeouts and result in closing the Channel that provides the forwarded stream.

glycerine commented 7 years ago

forwarding code should detect TCP timeouts

a) nobody wants to rely upon TCP timeouts; they aren't always available or reliable, moreover a unix domain socket won't have them. And logically it doesn't work because the timeout isn't going to be channel specific. If you did get a timeout, which of many channels should it apply to? No way to tell. You could be forwarding a foward and the (inductively) the drop was before you, possibly many steps back.

b) The suggestion breaks the io.Writer abstraction. Imagine you are inside io.Copy: that would require that each Write() know the Read() that its source material came from. Ugh.

Currently my Channels implement net.Conn, which I am happy about.

hanwen commented 7 years ago

I still don't get it. If P3 -> P2 is gone, then the copy goroutine that forwards over p2 -> p1 will error out, presumably leading to the forwarding channel being closed. Why does that not work as expected?

glycerine commented 7 years ago

the copy goroutine that forwards over p2 -> p1 will error out

Hmm... how does that error out happen without a timeout on the Channel?

hanwen commented 7 years ago

Sorry, misphrased.

P2 is the remote server that runs some flavor of sshd. If the connection P3 -> P2 dies, then the server at P2 will notice this at some point. When that happens, it should close down the channel.

What sshd is on the other side?

glycerine commented 7 years ago

I thought we were discussing is the converse... Can one specific Channel need a timeout while the underlying Connection is okay? I provided one specific example, but the case is more general, and its not necessarily that an SSH server is involved at all upstream.

Channel timeouts handle more general "lack of response" as well as full connection drop somewhere. i.e. someone's goroutine is hung due to logic error, waiting on user input, deadlock, livelock, a long DNS resolve, etc.

hanwen commented 7 years ago

I guess we have an impedance mismatch. I'm trying to understand problems that you have that require changes to the library to be solved.

Is there a reason you cannot use a wrapper around ssh.Channel to mimick a net.Conn?

glycerine commented 7 years ago

Is there a reason you cannot use a wrapper around ssh.Channel to mimick a net.Conn?

Several. I wouldn't have forked otherwise, of course. Notice the changes in xcryptossh for details, but in summary: one needs notification or a callback at the start of every Read, every Write, every packet receive, and during continuous operation. And more importantly, one must be able to interrupt the sync.Cond based blocks without closing the connection, e.g. buffer.go (https://github.com/glycerine/xcryptossh/blob/master/buffer.go#L69) and https://github.com/glycerine/xcryptossh/blob/master/channel.go#L504 ).

hanwen commented 7 years ago

I think you're making it too complicated. See https://play.golang.org/p/o_d3T-3irc for an idea.

glycerine commented 7 years ago

That's a deadline, not an idle timeout. Surely by now you've read the discussion above and at xcyptossh for why that's insufficient. Also refer to the title of this ticket.

hanwen commented 7 years ago

Do you want functionality that is like

// Read until buf is full or timeout has passed, whichever comes first ReadWithTimeout(buf []byte, timeout time.Duration)

sorry for being dense, but I still haven't fully grasped what you need.

hanwen commented 7 years ago

(with the function implying that the channel stays alive if the timeout is hit.)

glycerine commented 7 years ago

So I'm going to close this out. It's been a useful and informative discussion. I'm happy with my https://github.com/glycerine/xcryptossh fork and continue to experiment with it, and it makes sense to me that x/crypto/ssh should stay simpler and with a fully backwards compatible API.

Do you want functionality that is like

The current API looks like this https://github.com/glycerine/xcryptossh/blob/master/channel.go#L94 (lines 94-164). I'm still working on the application layer that uses it, so it may evolve further. Status() is rather new, and I've currently exposed the IdleTimers as a part of the API because I was thinking that was needed. However I'm refactoring at the moment to split the application into two layers, a lower layer that simply maintains ssh.Connections, reconnecting whenever needed, and an upper layer that only deals in ssh.Channel.