IBM / aspera-connect-sdk-js

Official Aspera Connect SDK for Javascript
https://ibm.github.io/aspera-connect-sdk-js/
Apache License 2.0

Issue reading file using readChunkAsArrayBuffer #49

rgodfreyDisco closed this issue 9 months ago.

rgodfreyDisco commented 1 year ago

I'm facing an issue where readChunkAsArrayBuffer does not seem to return the expected data. I have an application that uses the Aspera drag-and-drop utility to drop files from the user's local machine. To test the issue, I used an HTML input tag and looped through the file with a normal JS File object, comparing chunks of the JS file to the Aspera file. The two blobs are equal until about halfway through the file. After that, the blobs from the File.slice method and readChunkAsArrayBuffer are no longer equal. It may be important to note that the file is around 5 GB in size.

Here's the sample code I used to compare the two files:

```typescript
import {b64toBlob} from "../mediaconch/b64toBlob";
import {asperaWeb} from "../Aspera/AsperaConnect";

// Compare a file dropped through Aspera drag and drop with the same file
// selected through <input type="file" id="file-input">, 1 MiB at a time.
async function compareFiles(asperaDragAndDropFile: { name: string; size: number }) {
    const CHUNK_SIZE = 1024 * 1024;
    const fileSize = asperaDragAndDropFile.size;

    const htmlFileInput = (document.getElementById('file-input') as HTMLInputElement).files![0];

    for (let i = 0; i < fileSize; i += CHUNK_SIZE) {
        // The last chunk may be shorter than CHUNK_SIZE.
        const readSize = Math.min(CHUNK_SIZE, fileSize - i);

        // Read the chunk through Connect; `data` is base64-encoded.
        const asperaArrayBuffer = await asperaWeb.readChunkAsArrayBuffer({
            path: asperaDragAndDropFile.name,
            offset: i,
            chunkSize: readSize,
        });

        const asperaBlob = b64toBlob(asperaArrayBuffer.data, asperaArrayBuffer.type);

        // Read the same byte range through the browser File API.
        const htmlInputBlob = htmlFileInput.slice(i, i + readSize);

        if (await blobsAreEqual(asperaBlob, htmlInputBlob)) {
            console.log('The loop blobs are equal');
        } else {
            console.log('The loop blobs are not equal');
            console.log('i = ' + i);
            console.log('readSize = ' + readSize);
        }
    }

    async function blobsAreEqual(blob1: Blob, blob2: Blob): Promise<boolean> {
        if (blob1.size !== blob2.size) {
            return false; // Different sizes, so the blobs cannot be equal.
        }

        // Load both blobs into memory and compare them byte by byte.
        const buffer1 = new Uint8Array(await blob1.arrayBuffer());
        const buffer2 = new Uint8Array(await blob2.arrayBuffer());

        for (let j = 0; j < buffer1.length; j++) {
            if (buffer1[j] !== buffer2[j]) {
                return false;
            }
        }
        return true;
    }
}
```
rgodfreyDisco commented 1 year ago

Just wanted to bump this to see if anyone has started looking into this issue? Is there any way we can get this escalated & prioritized?

dwosk commented 1 year ago

@rgodfreyDisco sorry for the delay - we are looking into it. So that we are looking at the same things, can you confirm which OS, browser, and Connect version you are using for your tests?

rgodfreyDisco commented 1 year ago

@dwosk We're using "@ibm-aspera/connect-sdk-js": "5.0.0" in our application. I'm on Windows 10 with Chrome 115.0.5790.173. My coworker is on macOS Ventura 13.5.1 running Chrome 116.0.5845.96.

dwosk commented 1 year ago

With macOS Ventura, Aspera Connect v4.2.6 (client version), and the latest Chrome, I'm seeing the same issue. When comparing blobs of a 5 GB file, the blobs are identical up to the same 4 GB offset:

```
The loop blobs are not equal
i = 4294967296
readSize = 1048576
```

If I re-run the test multiple times it consistently fails at the same offset. Even if I start the test at that offset, the blobs are immediately different.

4294967296 is 2^32, which is above the maximum value of the integer type we use to parse the offset parameter. Reading a chunk at an offset of 4 GB or more therefore overflows that type, which explains why the blobs are suddenly different.
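The thread does not name the backend's integer type. As a rough illustration only, assuming the offset is stored in an unsigned 32-bit integer (consistent with the first mismatch landing exactly on 2^32 rather than on 2^31), the wrap-around can be simulated in TypeScript with `>>> 0`, which reduces a number modulo 2^32. The helper `effectiveOffsetIfUint32` is hypothetical, not part of Connect.

```typescript
// Hypothetical helper: the offset a backend would actually use if it stored
// the requested offset in an unsigned 32-bit integer. `>>> 0` converts a
// JavaScript number to uint32, i.e. reduces it modulo 2^32.
function effectiveOffsetIfUint32(requestedOffset: number): number {
  return requestedOffset >>> 0;
}

const GiB = 1024 ** 3;
const CHUNK_SIZE = 1024 * 1024;

// The last chunk before 2^32 is unaffected...
console.log(effectiveOffsetIfUint32(4 * GiB - CHUNK_SIZE)); // 4293918720
// ...but an offset of exactly 2^32 wraps around to the start of the file,
console.log(effectiveOffsetIfUint32(4 * GiB)); // 0
// and the following chunk would be read from offset 1 MiB instead.
console.log(effectiveOffsetIfUint32(4 * GiB + CHUNK_SIZE)); // 1048576
```

Under this assumption, every chunk at or beyond 4 GiB returns bytes from the first 1 GiB of a 5 GiB file, which matches the "immediately different" result even when the test starts at that offset.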

We'll have to parse the offset parameter into a wider integer type in the backend in order to support larger offsets.

However, iteratively comparing blobs across an entire 5 GB file takes a considerable amount of time. Is this the intended behavior for your integration, or is this simply test code? Other than this issue, the blobs do seem to be correct. What feature are you trying to implement?

rgodfreyDisco commented 1 year ago

Ah, that makes sense. Thanks for the feedback, David. The code I attached was just sample code I used to debug the issue; we're not reading and comparing the entire file. We are using another library that jumps around the file and reads it at different points. It ends up reading a couple of chunks toward the end of the file, which is causing us issues.

What would the timeline look like to get this fixed?

dwosk commented 1 year ago

It will be fixed in the next release of Aspera Connect, which is currently slated to be v4.2.7. Unfortunately, I cannot give a timeline as to when it will be released though.

> The code I attached was just sample code I used to debug the issue, we're not reading and comparing the entire file. We are using another library that jumps around the file and reads it in at different points. It ends up reading a couple chunks toward the end of the file which is causing us issues.

OK, that is what I thought. As a workaround, can you instruct the library not to read chunks past that offset?
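On Connect releases without the fix, that workaround could be enforced in the calling code with a bounds check before each readChunkAsArrayBuffer call. A minimal sketch: `isChunkSafeBeforeFix` and `OFFSET_LIMIT_BEFORE_FIX` are hypothetical names, and the 2^32 limit comes from the failing offset reported above. The check is deliberately conservative, because the thread does not show whether a chunk that starts below 2^32 but ends above it is affected.

```typescript
// Offsets at or above 2^32 were reported to return the wrong bytes on
// Connect versions without the fix.
const OFFSET_LIMIT_BEFORE_FIX = 2 ** 32;

// Hypothetical guard: true only if the whole chunk lies below 2^32, so it can
// be read with readChunkAsArrayBuffer on an unfixed Connect.
function isChunkSafeBeforeFix(offset: number, chunkSize: number): boolean {
  return offset >= 0 && chunkSize >= 0 && offset + chunkSize <= OFFSET_LIMIT_BEFORE_FIX;
}

// Example: a library reading the last 1 MiB of a 5 GiB file.
const fileSize = 5 * 1024 ** 3;
const tailSize = 1024 * 1024;
console.log(isChunkSafeBeforeFix(fileSize - tailSize, tailSize)); // false
console.log(isChunkSafeBeforeFix(0, tailSize)); // true
```

A check like this only helps when the analysis can do without the end of the file.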

rgodfreyDisco commented 1 year ago

Great, glad to hear it will be fixed in the next version.

Unfortunately we need to read chunks toward the end to analyze the files correctly.

The workaround I have implemented is to let the drop event bubble up to a wrapper div and then read the files with the JavaScript File.slice() method. But this workaround still has limitations, since we also let users pick files with a file selector. In that scenario, there is no event to bubble up.
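For reference, reading an arbitrary byte range with the standard File API looks roughly like this. Blob.slice() takes its offsets as JavaScript numbers, so ranges past 4 GB are addressed correctly. A minimal sketch; `readFileChunk` is a hypothetical name, and wiring it to the wrapper div's drop listener is left to the application.

```typescript
// Hypothetical helper: read `chunkSize` bytes starting at `offset` from a
// File (or any Blob) with the browser File API instead of Connect.
// Blob.slice clamps the end to the blob's size, so the last chunk may be
// shorter than requested.
async function readFileChunk(file: Blob, offset: number, chunkSize: number): Promise<Uint8Array> {
  const chunk = file.slice(offset, offset + chunkSize);
  return new Uint8Array(await chunk.arrayBuffer());
}
```

In a drop listener, the File would typically come from event.dataTransfer.files; this only covers native drops, which is exactly the limitation described for the file-selector case.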

If there were a way to create a normal JavaScript File from the Aspera File object (and vice versa), I could implement a workaround that covers all scenarios.

dwosk commented 1 year ago

Yeah, that makes sense. In the drag and drop use case, you have access to the native drop event and can use the File API to read the chunks, which is what it sounds like you are doing.

> But there are still restrictions to this workaround since we also provide a way to use a file selector to select files. In that scenario there is no event to bubble up.
>
> If there was a way to create a normal js File from the Aspera File object (and visa versa) then I could implement a workaround that covers all scenarios

Unfortunately, for security reasons, I do not think the browser will allow something like this, though I agree it would be useful.

dwosk commented 11 months ago

@rgodfreyDisco FYI, Connect 4.2.7 has been released, and it contains the fix for this issue.

rgodfreyDisco commented 11 months ago

Great news! Thanks for the heads up @dwosk

rgodfreyDisco commented 10 months ago

@dwosk Is there a way to make sure users have a specific version of Connect?

dwosk commented 10 months ago

It is possible to enforce a minimum version of Connect with the minVersion option documented here.

I believe this will result in an OUTDATED Connect status event if the user's Connect is older than the specified minimum version. If your Aspera integration uses the default Connect installer UI included with the SDK, this should open a modal with download links to the latest Connect installers. If your integration has its own Connect installer UI, it is up to your application to handle the OUTDATED event appropriately and do something similar.
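For an integration with its own installer UI, the handling might look roughly like the sketch below. It assumes the SDK's minVersion constructor option and its addEventListener, EVENT.STATUS, and STATUS.OUTDATED members as used in the SDK's own examples; verify these against the SDK documentation. The structural interfaces and the `watchConnectStatus` helper are hypothetical and exist only so the sketch compiles without the SDK.

```typescript
// Structural stand-ins for the Connect objects used below. In a real page
// these would be the SDK instance (e.g. `new AW4.Connect({ minVersion: "4.2.7" })`)
// and `AW4.Connect`, which carries the EVENT and STATUS constants.
type StatusListener = (eventType: string, data: string) => void;

interface ConnectLike {
  addEventListener(type: string, listener: StatusListener): void;
}

interface ConnectConstantsLike {
  EVENT: { STATUS: string };
  STATUS: { OUTDATED: string; RUNNING: string };
}

// Hypothetical helper: route Connect status events to the application's own
// installer UI. OUTDATED is expected when the installed Connect is older
// than the minVersion passed to the constructor.
function watchConnectStatus(
  connect: ConnectLike,
  constants: ConnectConstantsLike,
  onOutdated: () => void,
  onRunning: () => void,
): void {
  connect.addEventListener(constants.EVENT.STATUS, (eventType, data) => {
    if (eventType !== constants.EVENT.STATUS) {
      return;
    }
    if (data === constants.STATUS.OUTDATED) {
      onOutdated(); // e.g. show download links for the latest Connect
    } else if (data === constants.STATUS.RUNNING) {
      onRunning(); // Connect is new enough; enable transfers and file reads
    }
  });
}
```

With the SDK loaded, the call would register the listener before starting the session, for example watchConnectStatus(asperaWeb, AW4.Connect, showUpdatePrompt, enableUploads) followed by asperaWeb.initSession(), where showUpdatePrompt and enableUploads are application functions.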

rgodfreyDisco commented 10 months ago

Is that the correct link? I don't see a minVersion option there

dwosk commented 10 months ago

@rgodfreyDisco sorry I've updated the link.

rgodfreyDisco commented 10 months ago

Perfect! Thanks again for your help!