Closed TerraNibble closed 5 months ago
I should update that this is observed against an OpenSSH server with a default client side SSH configuration.
From the large logging you can observe the exit timeout delay here:
2023-10-31 14:44:25 DEBUG LocalWindow:85 - Consume LocalWindow[client](ChannelExec[id=0, recipient=0]-ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) by 1 down to 2097148
2023-10-31 14:44:25 DEBUG ChannelExec:829 - handleData(ChannelExec[id=0, recipient=0]-ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) SSH_MSG_CHANNEL_DATA len=1
2023-10-31 14:44:30 DEBUG DefaultScpClient:177 - handleCommandExitStatus(ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) cmd='scp -p -t -- /Users/ryanlivingston/tmp/testfile.txt', waited=5005150209 nanos, events=[TIMEOUT, OPENED]
2023-10-31 14:44:30 DEBUG DefaultScpClient:207 - handleCommandExitStatus(ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) cmd='scp -p -t -- /Users/ryanlivingston/tmp/testfile.txt', exit-status=null
2023-10-31 14:44:30 DEBUG ChannelExec:1004 - sendEof(ChannelExec[id=0, recipient=0]-ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) SSH_MSG_CHANNEL_EOF (state=Opened)
2023-10-31 14:44:30 DEBUG ClientSessionImpl:1402 - encode(ClientSessionImpl[ryanlivingston@localhost/127.0.0.1:22]) packet #12 sending command=96[SSH_MSG_CHANNEL_EOF] len=5
This is also easily reproducible using the including unit test ScpTest.testStreamsUploadAndDownload
. Running it you can observe the 5 second exit timeout being hit.
Moving the handleCommandExitStatus
to after the I/O stream try-with-resources resolves the timeout but introduces a new race-condition with the channels not shutting down gracefully.
I am still not sure why only upload
is susceptible to this and not runUpload
based on their slightly different use of the ScpHelper
.
You can simulate the issue by introducing a 200ms delay between the IO streams and the channel closing. Hopefully this information is helpful.
ClientSession session = getClientSession();
ChannelExec channel = openCommandChannel(session, cmd);
// Wrapped with an additional try/finally.
try {
try (InputStream invOut = channel.getInvertedOut();
OutputStream invIn = channel.getInvertedIn()) {
// NOTE: we use a mock file system since we expect no invocations for it
ScpHelper helper = new ScpHelper(session, invOut, invIn, new MockFileSystem(remote), opener, listener);
Path mockPath = new MockPath(remote);
helper.sendStream(new DefaultScpStreamResolver(name, mockPath, perms, time, size, local, cmd),
options.contains(Option.PreserveAttributes), ScpHelper.DEFAULT_SEND_BUFFER_SIZE);
// handleCommandExitStatus(cmd, channel);
}
Thread.sleep(200);
// handleCommandExitStatus(cmd, channel);
} catch (InterruptedException e) {
throw new RuntimeException(e);
} finally {
channel.close(false);
}
16:45:28.569 | DEBUG | main | o.a.s.s.c.ScpHelper | org.apache.sshd.scp.common.ScpHelper 603 | sendStream(ScpHelper[ClientSessionImpl[proxy@localhost/127.0.0.1:2222]])[scp -t -- upload.txt] command='C0644 63 upload.txt' ACK=0
16:45:28.569 | DEBUG | main | o.a.s.c.c.ChannelExec | org.apache.sshd.common.channel.AbstractChannel 1005 | sendEof(ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) SSH_MSG_CHANNEL_EOF (state=Opened)
16:45:28.569 | DEBUG | main | o.a.s.c.s.ClientSessionImpl | org.apache.sshd.common.session.helpers.AbstractSession 1441 | encode(ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) packet #8 sending command=96[SSH_MSG_CHANNEL_EOF] len=5
16:45:28.570 | DEBUG | main | o.a.s.c.i.n.Nio2Session | org.apache.sshd.common.io.nio2.Nio2Session 179 | writeBuffer(Nio2Session[local=/127.0.0.1:51986, remote=localhost/127.0.0.1:2222]) writing 36 bytes
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.s.ClientSessionImpl | org.apache.sshd.common.session.helpers.AbstractSession 566 | doHandleMessage(ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) process #10 SSH_MSG_CHANNEL_DATA
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.c.LocalWindow | org.apache.sshd.common.channel.LocalWindow 85 | Consume LocalWindow[client](ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) by 1 down to 2097149
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.c.ChannelExec | org.apache.sshd.common.channel.AbstractChannel 829 | handleData(ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) SSH_MSG_CHANNEL_DATA len=1
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.i.n.Nio2Session | org.apache.sshd.common.util.logging.LoggingUtils 568 | handleReadCycleFailure(Nio2Session[local=/127.0.0.1:51986, remote=localhost/127.0.0.1:2222]) IOException after 1161791 nanos at read cycle=13: Pipe closed
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.i.n.Nio2Session | org.apache.sshd.common.io.nio2.Nio2Session 205 | exceptionCaught(Nio2Session[local=/127.0.0.1:51986, remote=localhost/127.0.0.1:2222]) caught IOException[Pipe closed] - calling handler
16:45:28.570 | DEBUG | b]-nio2-thread-6 | o.a.s.c.s.ClientSessionImpl | org.apache.sshd.client.session.ClientSessionImpl 200 | signalAuthFailure(ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) type=IOException, signalled=false, first=false: Pipe closed
16:45:28.572 | WARN | b]-nio2-thread-6 | o.a.s.c.s.ClientSessionImpl | org.apache.sshd.common.util.logging.LoggingUtils 616 | exceptionCaught(ClientSessionImpl[proxy@localhost/127.0.0.1:2222])[state=Opened] IOException: Pipe closed
java.io.IOException: Pipe closed
at org.apache.sshd.common.channel.ChannelPipedInputStream.receive(ChannelPipedInputStream.java:196)
at org.apache.sshd.common.channel.ChannelPipedOutputStream.write(ChannelPipedOutputStream.java:53)
at org.apache.sshd.client.channel.AbstractClientChannel.doWriteData(AbstractClientChannel.java:442)
at org.apache.sshd.common.channel.AbstractChannel.handleData(AbstractChannel.java:839)
With some further testing this appears to happen if we trigger the sending of an EOF too quickly after sending a stream. Placing a 100ms delay between these actions alleviates the issue and further demonstrates the race condition.
try {
try (InputStream invOut = channel.getInvertedOut();
OutputStream invIn = channel.getInvertedIn()) {
// NOTE: we use a mock file system since we expect no invocations for it
ScpHelper helper = new ScpHelper(session, invOut, invIn, new MockFileSystem(remote), opener, listener);
Path mockPath = new MockPath(remote);
helper.sendStream(new DefaultScpStreamResolver(name, mockPath, perms, time, size, local, cmd),
options.contains(Option.PreserveAttributes), ScpHelper.DEFAULT_SEND_BUFFER_SIZE);
Thread.sleep(100);
}
handleCommandExitStatus(cmd, channel);
} catch (InterruptedException e) {
throw new RuntimeException(e);
} finally {
channel.close(false);
}
It appears that the server is responding with an unexpected additional ACK after sending the payload stream and this message is racing with the channel closure.
09:42:37.896 | DEBUG | b]-nio2-thread-4 | o.a.s.c.c.LocalWindow | org.apache.sshd.common.channel.LocalWindow 85 | Consume LocalWindow[client](ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) by 1 down to 2097150
09:42:37.896 | DEBUG | b]-nio2-thread-4 | o.a.s.c.c.ChannelExec | org.apache.sshd.common.channel.AbstractChannel 829 | handleData(ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) SSH_MSG_CHANNEL_DATA len=1
09:42:37.896 | DEBUG | main | o.a.s.s.c.ScpHelper | org.apache.sshd.scp.common.ScpHelper 603 | sendStream(ScpHelper[ClientSessionImpl[proxy@localhost/127.0.0.1:2222]])[scp -t -- upload.txt] command='C0644 63 upload.txt' ACK=0
09:42:37.896 | DEBUG | b]-nio2-thread-5 | o.a.s.c.s.ClientSessionImpl | org.apache.sshd.common.session.helpers.AbstractSession 566 | doHandleMessage(ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) process #10 SSH_MSG_CHANNEL_DATA
09:42:37.896 | DEBUG | b]-nio2-thread-5 | o.a.s.c.c.LocalWindow | org.apache.sshd.common.channel.LocalWindow 85 | Consume LocalWindow[client](ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) by 1 down to 2097149
09:42:37.896 | DEBUG | b]-nio2-thread-5 | o.a.s.c.c.ChannelExec | org.apache.sshd.common.channel.AbstractChannel 829 | handleData(ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) SSH_MSG_CHANNEL_DATA len=1
09:42:37.896 | DEBUG | main | o.a.s.c.c.ChannelExec | org.apache.sshd.common.channel.AbstractChannel 1005 | sendEof(ChannelExec[id=0, recipient=0]-ClientSessionImpl[proxy@localhost/127.0.0.1:2222]) SSH_MSG_CHANNEL_EOF (state=Opened)
It appears that this is caused by an initial ACK not being processed when the channel is opened, then the following sequence of ACKs are not processed in the correct order, which leaves a trailing ACK at the end to race with channel closure.
runUpload
is different here because it passes ScpHelper.send
as an executor method which reads this first initial ACK immediately at line ScpHelper:414 before calling ScpHelper.sendStream
.
I will submit a PR, though I am not sure of the protocol here so please ensure this fix would not be server dependent.
Version
2.11.0
Bug description
When attempting to upload/scp a byte stream using the below method, the main thread will always block until the exit status timeout is reached.
DefaultScpClient::upload(InputStream local, String remote, long size,Collection<PosixFilePermission> perms, ScpTimestampCommandDetails time)
It appears this is the case because the handleCommandExitStatus(cmd, channel) is called before the In/Out channels are auto-closed by the try-with-resources block, and thus the EOF is not sent to the remote prior to handling the exit status. The current implementation always times out first and then sends an EOF while closing the channels.
Using the API to send based on file paths will invoke DefaultScpClient::runUpload() which first closes the channel resources and then waits for the exit status. This API works well without the exit status blocking the thread.
I have overridden the default client to ignore the exit status result, and there is a dedicated property to adjust this timeout. Though, this is not really a solution to this issue.
This issue is persistent on remote SSH servers running with CentOS and MacOS. This example is being run on MacOS 13.3.1 with a remote SSH server that has no explicit SSH configuration. Testing on sshd-scp 2.10.0 and 2.11.0.
Overriding the client to do the following seems to resolve the issue:
Actual behavior
The upload method blocked the main thread after byte transfer for 5 seconds.
Expected behavior
The upload SCP routine should receive an exit status immediately after byte array upload.
Relevant log output
Other information
No response