aws / aws-sdk

Landing page for the AWS SDKs on GitHub
https://aws.amazon.com/tools/

S3 DeleteObjects API returns `MalformedXML...` error when trying to delete >1000 objects #602

Open sdarwin opened 9 months ago

sdarwin commented 9 months ago

Describe the feature

Improve the accuracy of the error messages during certain failures.

Use Case

Clear error messages assist in debugging.

Here is the problem I encountered. When running this command to delete objects in S3:

aws s3api delete-objects --profile $PROFILE --bucket $BUCKET --delete file://list.json

It supposedly has a limit of 1000 items. If you provide a million items to delete, the error messages are mysterious:

SSL validation failed for https://s3.amazonaws.com/_my_bucket_?delete EOF occurred in violation of protocol (_ssl.c:2427)

or

MalformedXML: The XML you provided was not well-formed or did not validate against our published schema

As evidence that the MalformedXML error is caused by the item count, consider this comment from a Stack Exchange thread:

"+1 for number of keys. I did not check the go code but for us in python boto3 over 1000 keys in the request caused the error. You would think they would document this is the documentation for MalformedXML on the sdk client, but you did find api documentation. We chunked the keys and it works perfect – vfrank66"
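
The chunking workaround mentioned in that comment can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual code: `batched` and `delete_in_batches` are hypothetical helper names, and `client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), whose `delete_objects` method accepts `Bucket` and `Delete` arguments.

```python
from typing import Any, Dict, Iterator, List

MAX_KEYS_PER_REQUEST = 1000  # limit stated in the DeleteObjects documentation


def batched(items: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(client: Any, bucket: str, objects: List[Dict[str, str]]) -> int:
    """Send one delete_objects request per batch; return the number of requests sent.

    `client` is assumed to behave like a boto3 S3 client (hypothetical wiring).
    """
    requests_sent = 0
    for batch in batched(objects, MAX_KEYS_PER_REQUEST):
        client.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
        requests_sent += 1
    return requests_sent
```

A real script would probably also inspect the `Errors` list in each response, since individual keys can fail even when the request as a whole succeeds.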

Proposed Solution

The problem is that neither message explains the actual cause, which is that the input contains more than 1000 items. The errors should be more specific about the real reason.

Other Information

No response

Acknowledgements

CLI version used

aws-cli/2.13.17 Python/3.11.5 Linux/6.2.0-1013-gcp exe/x86_64.ubuntu.22 prompt/off

Environment details (OS name and version, etc.)

Ubuntu 22.04

tim-finnigan commented 9 months ago

Hi @sdarwin thanks for reaching out. Here is the delete-objects documentation for reference: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3api/delete-objects.html.

The 1000 limit you referenced is mentioned there:

The request contains a list of up to 1000 keys that you want to delete

I could reproduce the MalformedXML error using a file containing 1001 keys generated by this script:

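import json

# Build a delete payload with 1001 keys (Key_1 .. Key_1001), one more than the limit.
data = {
    "Objects": [
        {"Key": f"Key_{i}"} for i in range(1, 1002)
    ],
    "Quiet": True,
}

# Write the payload to data.json for use with --delete file://data.json.
with open("data.json", "w") as outfile:
    json.dump(data, outfile, indent=4)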

Running this command:

aws s3api delete-objects --bucket $BUCKET --delete file://data.json

resulted in:

An error occurred (MalformedXML) when calling the DeleteObjects operation: The XML you provided was not well-formed or did not validate against our published schema
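
Until the service returns a more specific message, a client-side check before sending the request can surface the real cause. The sketch below is only an illustration: `check_delete_payload` and `check_delete_file` are hypothetical helpers, not part of the AWS CLI or any SDK, and the limit of 1000 is taken from the documentation quoted above.

```python
import json

MAX_KEYS_PER_REQUEST = 1000  # limit quoted from the delete-objects documentation


def check_delete_payload(payload: dict) -> None:
    """Raise ValueError with an explicit message if the payload lists too many keys."""
    # "Objects" may be missing or null, so fall back to an empty list.
    count = len(payload.get("Objects") or [])
    if count > MAX_KEYS_PER_REQUEST:
        raise ValueError(
            f"Delete payload lists {count} objects; "
            f"DeleteObjects accepts at most {MAX_KEYS_PER_REQUEST} per request."
        )


def check_delete_file(path: str) -> None:
    """Load a JSON file such as data.json and validate it before calling the CLI."""
    with open(path) as infile:
        check_delete_payload(json.load(infile))
```

Calling `check_delete_file("data.json")` on the 1001-key file above would raise a `ValueError` that names the count and the limit, instead of leaving the user with MalformedXML.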

(The SSL validation failed... error could have occurred for various reasons. Those errors are often caused by certificate issues as noted in the troubleshooting guide. But if it was something that you just encountered once then it may have been some kind of transient network/proxy issue.)


I also think that the error message could be improved. That error message is returned by the S3 DeleteObjects API, so we'll need to reach out to the S3 team to request a better error message. I'll go ahead and transfer this to our cross-SDK repository for further tracking as service APIs like this are used across SDKs. I'll also note a few related issues I found:

tim-finnigan commented 9 months ago

P99874682

sdarwin commented 9 months ago

The SSL validation failed... if it was something that you just encountered once

Just retesting now.

Ok, there is a different reason for the SSL error. It isn't the 1000-item limit, as I had thought.

SSL validation failed for https://s3.amazonaws.com/_bucket_?delete EOF occurred in violation of protocol (_ssl.c:2427)

That is caused by not specifying the region. Notice how https://s3.amazonaws.com/ seems to be a generic endpoint, without mentioning a region.

If I add --region us-east-2 to the command, it's fixed.

Well, that is another one to consider, then? The user does not see a message such as "You should set the region", only an unknown SSL error.

tim-finnigan commented 9 months ago

The SSL validation error doesn't seem directly related to S3. A few other issues have reported that error and it could be caused by various things. If you add --debug to your command then it may provide more insight, but if it's not a certificate or proxy issue then it could be an issue with whichever version of urllib3 or OpenSSL you have installed. I can't reproduce the error just by not providing a region.

sdarwin commented 9 months ago

This is going off track now. :-) I will post the info anyway, just to be complete.

Collecting a list of items, that will later be deleted:

    aws s3api list-object-versions --max-items 1002  --profile $PROFILE \
        --bucket $BUCKET \
        --output=json \
        --query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}' --prefix $PREFIX > list.json

Because that list contains more than 1000 items, passing it to delete-objects causes an error; in this case, MalformedXML.

What if the number of items is larger, 10,000 instead?

    aws s3api list-object-versions --max-items 10000  --profile $PROFILE \
        --bucket $BUCKET \
        --output=json \
        --query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}' --prefix $PREFIX > list.json

Now, it seems that a combination of all these factors gets this error:

SSL validation failed for https://s3.amazonaws.com/_bucket_?delete EOF occurred in violation of protocol (_ssl.c:2427)

It can be resolved by either reducing the number of items or adding --region. That at least gets back to "MalformedXML".
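
Since reducing the number of items avoids the problem, one way to keep using the CLI is to split list.json into files of at most 1000 entries and run delete-objects once per file. The following is a sketch under assumptions: it expects list.json to have the {"Objects": [...]} shape produced by the --query above, and `split_delete_file` and the `list_part_N.json` file names are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def split_delete_file(src: str, out_dir: str, size: int = 1000) -> List[Path]:
    """Write the objects in `src` to numbered files holding at most `size` entries each."""
    # "Objects" can be null when the listing matched nothing, so fall back to [].
    objects = json.loads(Path(src).read_text()).get("Objects") or []
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, start in enumerate(range(0, len(objects), size)):
        part = out / f"list_part_{n}.json"
        part.write_text(json.dumps({"Objects": objects[start:start + size]}))
        paths.append(part)
    return paths
```

Each resulting file can then be passed to aws s3api delete-objects with --delete file://..., one invocation per file.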