aws / aws-sdk-java

The official AWS SDK for Java 1.x (In Maintenance Mode, End-of-Life on 12/31/2025). The AWS SDK for Java 2.x is available here: https://github.com/aws/aws-sdk-java-v2/
https://aws.amazon.com/sdkforjava
Apache License 2.0

waitForCompletion stops waiting on first error #2997

Closed MrFlick closed 1 year ago

MrFlick commented 1 year ago

Describe the bug

When using transferManager.uploadFileList with waitForCompletion, a failure causes an exception to be thrown while the remaining sub-transfers are still listed as "InProgress", so not all transfers have finished at that point. From what I'm observing, those files continue to upload. I'm trying to figure out which files succeeded and which failed.

Expected Behavior

I would expect all transfers to be "Completed", "Failed", or "Canceled". I don't understand why they are still "InProgress". Otherwise, how do you know when all transfers are done?

Current Behavior

In my current case, I'm uploading ~10,000 files and getting an SdkClientException with the message "The Content-MD5 you specified did not match what we received". Nearly all the files appear to be uploaded, so one of them must be triggering the error, but it seems impossible to tell which files transferred successfully and which failed. I thought looking at the sub-transfers would tell me, but in the catch block for the exception, nearly all of them are still listed as "InProgress" when I iterate over them.

Reproduction Steps

The code I'm using looks like this:

    List<File> files = s3Uploads.parallelStream().map(upload -> upload.path.toFile()).collect(Collectors.toList());
    MultipleFileUpload fileUpload = transferManager.uploadFileList(
            "my_bucket",
            "new_folder/",
            rootPath.toFile(),
            files
    );
    try {
        // Upload files
        fileUpload.waitForCompletion();
        logger.info("Done");
    }
    catch (SdkClientException e) {
        // Log the state of every sub-transfer at the moment the exception surfaces
        for (Upload transfer : fileUpload.getSubTransfers()) {
            logger.error(transfer.getState().toString() + " - " + transfer.getDescription());
        }
    }

In my case, when I see an exception, the first file is listed as "Failed" but all the other files are listed as "InProgress". From what I can tell, they are still being uploaded to S3.

One of the files is throwing an exception with the message "The Content-MD5 you specified did not match what we received". I don't know how to re-create that type of error in a reproducible way.

Possible Solution

Have a way to wait till there are no files in progress and all transfers have resolved in some way.

It would also be helpful to describe in the documentation exactly what happens when one or more files in the upload file list fail. It was unclear to me what the expected behavior is supposed to be in that case and what the best way is for the application to recover.

Additional Information/Context

No response

AWS Java SDK version used

1.12.498

JDK version used

openjdk version "11.0.10" 2021-01-19

Operating System and version

Windows 11

debora-ito commented 1 year ago

@MrFlick I believe you also opened an internal case with this question, so I'll share the response here for posterity.

The Developer Guide has some code examples showing how to monitor the progress of each subtransfer - https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-transfermanager.html#transfermanager-get-status-and-progress
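As a rough illustration of waiting on each sub-transfer individually, here is a minimal sketch that uses plain java.util.concurrent futures as stand-ins for the sub-transfers. WaitForAllSketch, waitForAll, demo, and the file names are hypothetical and not part of the SDK. The point is that each wait sits in its own try/catch, so one failure does not end the loop. With the SDK, the analogous loop would presumably call waitForCompletion() on each Upload returned by getSubTransfers(); that mapping is an assumption, not something confirmed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in model: each Future plays the role of one sub-transfer.
class WaitForAllSketch {

    // Waits for every task, even after one fails, and records an outcome per name.
    static Map<String, String> waitForAll(Map<String, Future<?>> transfers) {
        Map<String, String> outcomes = new LinkedHashMap<>();
        for (Map.Entry<String, Future<?>> entry : transfers.entrySet()) {
            try {
                entry.getValue().get(); // blocks until this one task has resolved
                outcomes.put(entry.getKey(), "Completed");
            } catch (ExecutionException e) {
                outcomes.put(entry.getKey(), "Failed: " + e.getCause().getMessage());
            } catch (CancellationException e) {
                outcomes.put(entry.getKey(), "Canceled");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                outcomes.put(entry.getKey(), "Interrupted");
                break;
            }
        }
        return outcomes;
    }

    // Simulates three uploads where the second one fails.
    static Map<String, String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> ok = () -> "uploaded";
            Callable<String> bad = () -> {
                throw new IllegalStateException("simulated MD5 mismatch");
            };
            Map<String, Future<?>> transfers = new LinkedHashMap<>();
            transfers.put("a.txt", pool.submit(ok));
            transfers.put("b.txt", pool.submit(bad));
            transfers.put("c.txt", pool.submit(ok));
            return waitForAll(transfers);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        demo().forEach((name, outcome) -> System.out.println(name + " -> " + outcome));
    }
}
```

Once the loop returns, every entry has a final outcome, so the failed names can be collected and retried without guessing which uploads were still in flight.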

For the "Content-MD5 you specified did not match what we received" exception, one possible cause is that the file content changed between the time it was added to the upload list and the time the upload actually started. Searching for the error message on Stack Overflow turns up other reports of the error with different root causes, for example: https://stackoverflow.com/questions/36179310/an-exception-the-content-md5-you-specified-did-not-match-what-we-received
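One way to test the "file changed" theory is to record each file's MD5 when the upload list is built and compare it again when a failure is reported. The sketch below uses only the JDK. ContentMd5Sketch and its method names are hypothetical; the only S3 detail it relies on is that the Content-MD5 header carries the base64-encoded MD5 of the body.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for spotting files that changed after they were listed.
class ContentMd5Sketch {

    // Base64-encoded MD5 of the given bytes, the encoding used by Content-MD5.
    static String md5Base64(byte[] data) {
        try {
            return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is a required JDK algorithm
        }
    }

    // Digest of the file's current content. Reads the whole file into memory,
    // so a streaming digest would be a better fit for very large files.
    static String contentMd5(Path file) {
        try {
            return md5Base64(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the file no longer matches the digest recorded earlier.
    static boolean changedSince(Path file, String recordedMd5) {
        return !recordedMd5.equals(contentMd5(file));
    }

    // Writes a temp file, records its digest, rewrites it, and reports the change.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("upload", ".txt");
            try {
                Files.write(tmp, "v1".getBytes(StandardCharsets.UTF_8));
                String recorded = contentMd5(tmp);
                // Simulates another process modifying the file before the upload starts
                Files.write(tmp, "v2".getBytes(StandardCharsets.UTF_8));
                return changedSince(tmp, recorded);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("changed after listing: " + demo());
    }
}
```

If changedSince returns true for the file that failed, a concurrent writer is a likely cause; if it returns false, one of the other root causes from the linked reports is worth checking.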