apache / mina-sshd

Apache MINA sshd is a comprehensive Java library for client- and server-side SSH.
https://mina.apache.org/sshd-project/
Apache License 2.0
847 stars 353 forks

Performance file transfer #485

Open Holger-Benz opened 2 months ago

Holger-Benz commented 2 months ago

Version

2.12.1

Bug description

Dear apache support team,

we are switching our communication software from the JSCHED sftp library to the apache-mina library.

We realized that the apache mina library does not reach the performance of the JSCHED library.

I have written a small program to send a 700 MB file using the SFTP protocol.

This file transfer is about 6 times slower than a file transfer with the JSCHED library.

How can we increase the transfer speed?

Are we not using the apache-mina library correctly?

public static void sendFile() throws IOException {
    SshClient client = SshClient.setUpDefaultClient();
    client.start();
    try (ClientSession session = client.connect("user", "host", 1022).verify().getClientSession()) {
        session.addPasswordIdentity("password");
        session.auth().verify();
        SftpClient sftpClient = SftpClientFactory.instance().createSftpClient(session);
        String largeFile = "c:/temp/largeFile";
        long length = new File(largeFile).length();
        try (FileChannel writeableChannel = sftpClient.openRemoteFileChannel("largeFile",
                     SftpClient.OpenMode.Create, SftpClient.OpenMode.Truncate, SftpClient.OpenMode.Write);
             FileChannel readableChannel = FileChannel.open(new File(largeFile).toPath(), StandardOpenOption.READ)) {
            readableChannel.transferTo(0, length, writeableChannel);
        }
    }
}

Actual behavior

The apache mina library does not reach the performance of other sftp-libraries

Expected behavior

Is it possible to increase the performance?

Relevant log output

No response

Other information

No response

tomaswolf commented 2 months ago

Thank you for this test case. It appears that there is indeed something wrong with the FileChannels. In my tests, the following is much faster (and on par with OpenSSH or Jsch):

SftpClient sftpClient = SftpClientFactory.instance().createSftpClient(session);
try (OutputStream out = sftpClient.write("largeFile")) {
    Files.copy(new File(largeFile).toPath(), out);
}

or also

try (SftpFileSystem fs = SftpClientFactory.instance().createSftpFileSystem(session)) {
  Path remoteFile = fs.getPath("largeFile");
  Files.copy(new File(largeFile).toPath(), remoteFile, StandardCopyOption.REPLACE_EXISTING);
}

With the channels and transferTo I see uploads (to localhost, so no network latency) about 4 times (400%) slower, and downloads about 25% slower. We'll have to investigate what's going on there...

What is the JSCHED library?

tomaswolf commented 2 months ago

Interesting: if you change in your code

readableChannel.transferTo(0, length, writeableChannel);

to

writeableChannel.transferFrom(readableChannel, 0, length);

it will also run much faster (but still 25% slower than the two versions with Files.copy() I posted).

Off-topic note: you should probably also check the return value of transferTo/transferFrom and execute them in a loop until everything is transferred.
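A minimal sketch of such a loop, using only java.nio. The class and method names (TransferLoop, transferFully) are illustrative and not part of Apache MINA SSHD; the sketch assumes the target is a FileChannel (such as the one returned by openRemoteFileChannel) and that writing starts at position 0 of the target:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not part of any library.
final class TransferLoop {
    private TransferLoop() {
    }

    /**
     * Repeatedly calls target.transferFrom() until 'count' bytes have been
     * copied or the source has no more data. Returns the number of bytes
     * actually copied, which may be less than 'count'.
     */
    static long transferFully(ReadableByteChannel source, FileChannel target, long count)
            throws IOException {
        long done = 0;
        while (done < count) {
            // A single call may transfer fewer bytes than requested.
            long n = target.transferFrom(source, done, count - done);
            if (n <= 0) {
                break; // nothing more could be read from the source
            }
            done += n;
        }
        return done;
    }
}
```

In the code from the report, the single transferTo call would then become something like TransferLoop.transferFully(readableChannel, writeableChannel, length), and the caller can compare the returned value with length to detect a short transfer.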

tomaswolf commented 2 months ago

After some analysis, here's what's going on:

transferTo/transferFrom, as well as the FileChannel.write() operations, are positional operations. readableChannel.transferTo(0, length, writeableChannel) will essentially read 8kB ByteBuffers from the file and then call writeableChannel.write() for each buffer.

However, SftpRemotePathChannel.write() doesn't know that it is being called essentially for a sequential copy operation, and so it doesn't employ a number of optimizations. The result is the slow transfer.

If you change the logic and use writeableChannel.transferFrom(), then the SftpRemotePathChannel drives the operation, and it knows that it is going to sequentially read buffers. Hence it can employ these optimizations.

When you use OutputStream/InputStream as in my Files.copy() examples, then it is known that a sequential data transfer occurs, and the SFTP implementation can employ its optimizations unconditionally.

Finally, transferTo/transferFrom by default copy data in 8kB chunks. With streams, the chunks are about 32kB. This difference causes the 25% slowdown.
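To illustrate only the chunk-size effect, here is a small stand-alone sketch of a sequential stream copy with a configurable buffer size. It uses plain java.io and no SSHD classes; ChunkedCopy and its copy method are hypothetical names. It counts write calls, not the protocol overhead of real SFTP requests:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration, not part of any library.
final class ChunkedCopy {
    private ChunkedCopy() {
    }

    /**
     * Copies everything from 'in' to 'out' using a buffer of 'bufferSize'
     * bytes and returns how many write calls were needed.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long writes = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // one write per chunk read
            writes++;
        }
        return writes;
    }
}
```

For an in-memory source of 1,000,000 bytes, this needs 123 writes with an 8 kB buffer (8 * 1024) and 31 writes with a 32 kB buffer (32 * 1024), so the smaller chunk size means roughly four times as many write calls for the same data.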

Hence:

It might be possible to improve our implementation to handle the case you stumbled upon better, but I'm not sure yet.

kvlnkarthik commented 2 months ago

We see the same issue in our tests as well. We are using version 2.12.1.

We ran a file transfer test using the Files.copy() approach to transfer a file of about 167 MB to a remote server, and it took around 30 seconds.

If we transfer the same file from the same system to the same remote server, but through an sftp session created with the commands below, it takes around 6 minutes to complete the transfer. Performance is severely degraded in this scenario.

" sftp -P 2022 @localhost" put file /tmp/

We run the SSHD server with the sftp subsystem and a custom FileSystemFactory that creates a remote sftp filesystem. The remote sftp filesystem is created using the code below.

URI sftpUri = SftpFileSystemProvider.createFileSystemURI(sshConnectionDetails.getHostname(), sshConnectionDetails.getSshPort(), sshConnectionDetails.getUsername(), sshConnectionDetails.getPassword());

The Apache Mina code runs on localhost, port 2022, but creates a remote filesystem. So when we execute put file /tmp/, the file is transferred from our local system to the remote server, i.e., client -> apache mina server -> remote server. We acknowledge that there is an additional hop here (the file needs to be transferred to the server and then to the remote server), but the transfer rate is far too slow.

In the thread dump we see SftpRemotePathChannel.write method invocations during this mode of transfer. Based on our tests and your explanation in the previous comments, this mode of transfer seems to be very slow.

Stack trace:

"sshd-SftpSubsystem-47114-thread-1" #35 daemon prio=5 os_prio=0 tid=0x00007f7cf4070800 nid=0x18e88 in Object.wait() [0x00007f7d30ffc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at org.apache.sshd.sftp.client.impl.DefaultSftpClient.receive(DefaultSftpClient.java:351)

Is there any way to force or override the file upload/download triggered by put/get commands in sftp sessions so that it uses the Files.copy() approach for better performance, or to tune the buffers in SftpRemotePathChannel? Could you please let us know?

We also observed that if we simply use the OpenSSH sftp client against the remote server directly, without going through our Apache Mina SFTP server code, it takes only 3 seconds to transfer the same file.

kvlnkarthik commented 2 months ago

@tomaswolf, any thoughts on my comment above, especially on this question: "Is there any way to force or override the file upload/download triggered by put/get commands in sftp sessions so that it uses the Files.copy() approach for better performance, or to tune the buffers in SftpRemotePathChannel?" Thanks, Karthik

Holger-Benz commented 2 months ago

The SSHD-server is also integrated in our communication software.

After updating the server from version 2.4.0 to 2.12.1, the server's communication has become significantly slower (by a factor of 2).

Is there any way to improve the performance?

tomaswolf commented 2 months ago

> The SSHD-server is also integrated in our communication software.

The original report was about the client side. Whatever this may be, it would be a new separate issue. But unless you have more information we can't do anything anyway. Best bet to track it down might be to run with debug logging, once against the old version and once against the new version. Maybe that gives some hints. Also monitor resource consumption (memory etc) on the server side in both cases, and look for differences.

tomaswolf commented 2 months ago

> Any thoughts on my above comment especially on "Is there any way to force/override the file upload/download in sftp sessions through put/get commands to use Files.copy() way in order to see better performance or tune buffers in SftpRemotePathChannel. Could you please let me know."

I don't think so. If I understand it right, your problem is in a server acting as a kind of SFTP proxy. That intermediary server does not see put/get commands, it only sees positional write/read requests.

Holger-Benz commented 2 months ago

I'm sorry, you're right. We will open a new issue when we have the relevant debug data.

benz-ppi commented 1 month ago

Even with the changes you have suggested, transferring files with the SFTP Apache client software is significantly slower (> factor 3) than transferring files with jsched or winscp.

Do you intend to improve the performance of the SFTP Apache client software?

tomaswolf commented 1 month ago

So far I don't have enough information to do anything. I have run my own speed tests, and I see no performance problem. Before I can do anything, I need to be able to reproduce the problem that you observe.

I would need detailed information about your setup: your client-side code, your test setup, what authentication mechanisms and ciphers are used, what's the size of the files, what Java version do you use, what hardware is your client running on, what server are you testing against and on what hardware or virtual machine or container is it running, what is the network latency, what buffer sizes are used, which of the I/O back-ends in Apache MINA SSHD are you using (NIO2, MINA, Netty?), and what is that "jsched" client that you keep mentioning? I have never heard of that.