DICE-UNC / jargon

Jargon core libraries

Slow upload speed (most likely my fault) #439

Closed: jjkoehorst closed this issue 10 months ago

jjkoehorst commented 10 months ago

I am working on a buffered upload stream to iRODS. Because the data comes from a web upload field, I cannot use the standard file upload functionality in Jargon.


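```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class UploadHelper { // enclosing class name is illustrative; the original post did not show it

    // Copies input to output in 32 KB blocks and reports progress roughly every 5 MB.
    // ProgressListener is the application's own callback interface (not part of Jargon).
    // fileName is not used in the body; it is kept from the original signature.
    public static void copyWithProgress(File fileName, InputStream input, OutputStream output, ProgressListener listener) throws IOException {
        byte[] buffer = new byte[32768]; // Buffer size (32 KB)
        int nRead;
        long totalBytesRead = 0;
        long lastTotalBytesReadForProgress = 0;

        while ((nRead = input.read(buffer)) != -1) {
            output.write(buffer, 0, nRead);
            totalBytesRead += nRead;

            // Check if at least 5 MB has been read since the last update
            if (totalBytesRead - lastTotalBytesReadForProgress >= 1024 * 1024 * 5) {
                listener.update(totalBytesRead);
                lastTotalBytesReadForProgress = totalBytesRead;
            }
        }

        // Final update to ensure listener gets the complete file size at the end
        if (totalBytesRead != lastTotalBytesReadForProgress) {
            listener.update(totalBytesRead);
        }
    }
}
```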

The code above works and shows a progress message for the sake of development (likely to be removed later). When the `OutputStream` is a local stream it is, unsurprisingly, very fast, but when I switch to an

`IRODSFileOutputStream irodsFileOutputStream = credentials.getFileFactory().instanceIRODSFileOutputStream(pathBar.getValue() + "/" + fileName);`

throughput easily drops below 100 kb/s. What would be the best way to make use of the full internet connection?
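One thing worth measuring before moving to parallel transfer: the loop above hands the output stream 32 KB at a time. If each `write` on an `IRODSFileOutputStream` turns into its own request to the server (an assumption, not something stated in this thread), then round-trip latency rather than bandwidth would cap throughput. A minimal sketch, assuming that is the bottleneck, wraps the destination in the standard `java.io.BufferedOutputStream` so the underlying stream receives a few large writes instead of many small ones. The class name `BufferedUpload` and the 4 MB `WRITE_SIZE` are hypothetical choices, not Jargon settings.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedUpload {

    // Hypothetical block size handed to the underlying stream; worth tuning per server/network.
    static final int WRITE_SIZE = 4 * 1024 * 1024;

    // Copies input to rawOutput, but the underlying stream (e.g. an IRODSFileOutputStream)
    // only sees writes of up to WRITE_SIZE bytes instead of one write per 32 KB read.
    // Returns the number of bytes copied; the caller remains responsible for closing rawOutput.
    public static long copyBuffered(InputStream input, OutputStream rawOutput) throws IOException {
        OutputStream output = new BufferedOutputStream(rawOutput, WRITE_SIZE);
        byte[] buffer = new byte[32768];
        long total = 0;
        int nRead;
        while ((nRead = input.read(buffer)) != -1) {
            output.write(buffer, 0, nRead);
            total += nRead;
        }
        // Push the final partial block down to the underlying stream.
        output.flush();
        return total;
    }
}
```

In the application, `rawOutput` would be the `IRODSFileOutputStream` from the question, and the progress reporting from `copyWithProgress` can be kept unchanged around it. Whether this helps depends on the assumption above, so timing both versions against the real server is the deciding test.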
korydraughn commented 10 months ago

Jargon supports parallel transfer over port 1247. NFSRODS uses that functionality to stream chunks of data into a data object across multiple threads. See the following for an example:

The key here is the `coordinated` variable.


If Java isn't a hard requirement, you could give the iRODS HTTP API a try. It makes writing into data objects and replicas pretty simple. However, the iRODS HTTP API is still in active development, so things can change as we approach v1.0.0. The following slide demonstrates how to perform a parallel transfer using the HTTP API:


Does that answer your question? If not, please explain how your application handles the opening, writing, and closing of the data object.

korydraughn commented 10 months ago

In general, the fastest way to upload data into iRODS is to use parallel transfer.

That means using multiple threads, each with its own connection to the iRODS server. Each thread opens the data object at a different offset and writes its chunk of data to that location as fast as possible.

Each iRODS connection will result in a new agent. All of this requires coordination. Jargon makes this pretty simple compared to other solutions.
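The approach described above (several threads, each with its own connection, each writing one chunk at its own offset) can be sketched in plain Java. This is a minimal sketch under stated assumptions: a local `RandomAccessFile` stands in for each thread's iRODS connection, and the names `ParallelChunkUpload` and `upload` are hypothetical. It does not use Jargon's API and does not model the replica coordination that the NFSRODS code handles. Because it reads the source sequentially and only parallelizes the writes, it also fits a stream of unknown length such as a web upload.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class ParallelChunkUpload {

    // Reads input in chunks of chunkSize bytes and hands each chunk to one of `threads`
    // workers, which writes it at that chunk's offset. RandomAccessFile is a local
    // stand-in for a per-thread iRODS connection. Returns the total bytes written.
    public static long upload(InputStream input, File target, int chunkSize, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Caps how many chunks are held in memory while the workers catch up.
        Semaphore inFlight = new Semaphore(threads * 2);
        List<Future<?>> pending = new ArrayList<>();
        long offset = 0;
        try {
            while (true) {
                byte[] chunk = input.readNBytes(chunkSize);
                if (chunk.length == 0) {
                    break; // end of input
                }
                inFlight.acquire();
                final long chunkOffset = offset;
                pending.add(pool.submit(() -> {
                    // Each task opens its own handle, mirroring "one connection per thread".
                    try (RandomAccessFile out = new RandomAccessFile(target, "rw")) {
                        out.seek(chunkOffset);
                        out.write(chunk);
                    } finally {
                        inFlight.release();
                    }
                    return null;
                }));
                offset += chunk.length;
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any worker failure
            }
        } catch (ExecutionException e) {
            throw new IOException("chunk write failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return offset;
    }
}
```

The chunk size and thread count are left to the caller; for a real transfer the chunks would typically be far larger than the 32 KB buffer in the original loop. The semaphore bounds memory to roughly `2 × threads × chunkSize` bytes, which matters when the source is an upload arriving faster than the writers can drain it.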

jjkoehorst commented 10 months ago

Thanks for the quick reply! The application is written in Java, but I am more than happy to call an API or dive into parallel transfer. You have given me plenty of ideas; I will see how I can implement this!

korydraughn commented 10 months ago

That sounds great. I'm going to close this issue. If you run into problems, feel free to open a new issue or re-open this one.

Good luck!