Closed necouchman closed 2 years ago
Any takers? Any more info I can provide?
Hi @necouchman I'm sorry about the delayed response.
The response of S3 listObjectVersions is sent in XML format. Somewhere in the response there's a character the SDK is having trouble parsing, probably a control character. If the character really is an invalid XML character, there's not much the SDK can do.
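For reference, XML 1.0 only permits tab, line feed, carriage return, and the code point ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. A minimal sketch of a check for that rule follows; the class and method names are made up for illustration and are not part of the SDK:

```java
final class XmlChars {
    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first character XML 1.0 forbids, or -1 if the string is clean. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Running a check like this over suspect file names (for example, names copied from the Mac clients) could help confirm whether a control character is the culprit.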
Some questions to try to narrow down the cause:
Thanks @debora-ito. I will take a look at that document and try to enable the additional logging and see what I can provide.
Regarding the inputs, the objects get into the buckets in one of two ways:
I don't know if you need any more detail than that, but it's pretty standard stuff available from AWS - I'm not using any custom code or anything like that. If URL encoding needs to be done prior to writing the objects, that would have to be done within either Storage Gateway or Datasync, and I don't have any insight into how either of those platforms works. I would assume they are already URL encoding it, as I would expect errors when creating objects in the S3 bucket otherwise?
If the character is really a non-valid XML character there's not much the SDK can do.
I guess the thing that would be nice in this case is to be able to configure how the SDK handles the errors. Right now, when an error is encountered, it completely bails out and halts execution of the program. I looked into various ways in Java to try to prevent this, but I didn't find one that would actually work. It would be nice if there were a setting within the SDK that you could use to tell it, "if you encounter an error, ignore that object and move to the next one." Obviously there are going to be errors that can't be completely ignored (login failure to S3, for example), but it seems like certain errors or certain classes of errors could be configured to be handled by logging and ignoring. If I have a bucket with 20 million objects and 5 of them are causing this error, it's frustrating to not be able to process the other 19,999,995 objects :-).
Handling the error in a catch is not an option?
@debora-ito I can do a try...catch, but there are two issues:
1) It still seems to be a fatal error for the application - that is, when the Exception occurs, the entire Java application bails out.
2) I'm essentially abstracting away all of the retrieval to the SDK's paginator:
ListObjectVersionsIterable objVersions = s3.listObjectVersionsPaginator(listObjectVersions);
The error occurs within that one line right there, so, even if execution of the overall application that I've written could continue, it would fail to retrieve anything, because the SDK itself does not allow execution to continue further when there is an error - it throws an exception that has to be handled.
Still working on getting the additional debugging information for you.
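One detail that may matter for the try...catch: as far as I understand the SDK for Java 2.x documentation, paginators are lazy, so no request is sent when listObjectVersionsPaginator is called and the pages are fetched while iterating. If that holds, the catch has to wrap the loop, not the paginator call. The sketch below uses a hypothetical stand-in iterable (not the SDK) and an IllegalStateException in place of whatever exception the SDK throws:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LazyPagesDemo {
    private static final int TOTAL_PAGES = 5;

    /** Stand-in for a paginator: a page is only "fetched" when next() is called. */
    static Iterable<List<String>> pages(int failAtPage) {
        return () -> new Iterator<List<String>>() {
            private int page = 0;

            @Override
            public boolean hasNext() {
                return page < TOTAL_PAGES;
            }

            @Override
            public List<String> next() {
                if (page >= TOTAL_PAGES) {
                    throw new NoSuchElementException();
                }
                page++;
                if (page == failAtPage) {
                    // Simulates the parse failure from the issue.
                    throw new IllegalStateException("Could not parse XML response.");
                }
                return List.of("key-" + page);
            }
        };
    }

    /** Processes pages until one fails; returns how many keys were handled. */
    static int processUntilFailure(Iterable<List<String>> pages) {
        int processed = 0;
        try {
            for (List<String> page : pages) {
                processed += page.size();
            }
        } catch (RuntimeException e) {
            // The application keeps running, but later pages are never reached.
            System.err.println("Listing stopped early: " + e.getMessage());
        }
        return processed;
    }
}
```

This keeps the application alive and preserves the pages read before the failure, but it does not solve the second issue: everything after the failing page is still unreachable.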
@necouchman picking this up again after a long time. Any luck in getting the verbose wire logs?
Well, I followed the instructions, but I'm not sure it's giving any better information. These are the last few lines:
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix, version-id-marker])
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix, version-id-marker])
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-04 13:11:51 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix, version-id-marker])
Could not parse XML response.
Is there any additional detail that I need to put in the configuration file to get more details?
Those are regular DEBUG logs. Verbose wire logs would include the data sent in the request body - https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/logging-slf4j.html#sdk-java-logging-verbose
This is my log4j2.xml file:
<Configuration status="WARN">
    <Appenders>
        <Console name="ConsoleAppender" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{YYYY-MM-dd HH:mm:ss} [%t] %-5p %c:%L - %m%n" />
        </Console>
    </Appenders>
    <Loggers>
        <Root level="WARN">
            <AppenderRef ref="ConsoleAppender" />
        </Root>
        <Logger name="software.amazon.awssdk" level="WARN" />
        <Logger name="software.amazon.awssdk.request" level="DEBUG" />
        <Logger name="org.apache.http.wire" level="DEBUG" />
    </Loggers>
</Configuration>
I believe I've correctly enabled wire logging, but I'm not seeing any output.
Hmm, for some reason calling listObjectVersionsPaginator does not generate wire logs.
Can you call listObjectVersions instead?
Sure, I'll give it a go, just have to do the pagination manually.
That didn't seem to return anything additional. Code looks like this:
ListObjectVersionsRequest listObjectVersions = ListObjectVersionsRequest
        .builder()
        .bucket(bucket)
        .prefix(prefix)
        .build();
RetainedObjectList objectList = new RetainedObjectList();
while (true) {
    // Fetch one page of object versions.
    ListObjectVersionsResponse objVersions = s3.listObjectVersions(listObjectVersions);
    ...
    if (!objVersions.isTruncated()) {
        break;
    }
    // Build the request for the next page; the next loop iteration sends it.
    listObjectVersions = ListObjectVersionsRequest
            .builder()
            .bucket(bucket)
            .prefix(prefix)
            .keyMarker(objVersions.nextKeyMarker())
            .build();
}
and I get the following output:
2022-03-10 20:12:30 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-10 20:12:30 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix])
2022-03-10 20:12:30 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-10 20:12:30 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix])
2022-03-10 20:12:31 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-10 20:12:31 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix])
2022-03-10 20:12:32 [main] DEBUG software.amazon.awssdk.request:84 - Received successful response: 200
2022-03-10 20:12:32 [main] DEBUG software.amazon.awssdk.request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=GET, protocol=https, host=bucket.s3.eu-west-2.amazonaws.com, port=443, encodedPath=, headers=[amz-sdk-invocation-id, User-Agent], queryParameters=[versions, key-marker, prefix])
Could not parse XML response.
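A side note on the manual loop: the paginator's requests earlier in the thread carry both key-marker and version-id-marker, while the manual requests carry only key-marker. S3 returns a NextVersionIdMarker alongside NextKeyMarker, and passing both back is likely needed to resume correctly when one key has many versions. Below is a minimal sketch of that loop shape using a hypothetical Page stand-in and a fetch function instead of the real SDK types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class VersionPaging {
    /** Hypothetical stand-in for one page of a ListObjectVersions response. */
    static final class Page {
        final List<String> versions;
        final String nextKeyMarker;
        final String nextVersionIdMarker;
        final boolean truncated;

        Page(List<String> versions, String nextKeyMarker,
             String nextVersionIdMarker, boolean truncated) {
            this.versions = versions;
            this.nextKeyMarker = nextKeyMarker;
            this.nextVersionIdMarker = nextVersionIdMarker;
            this.truncated = truncated;
        }
    }

    /**
     * Collects every page. The fetch function receives (keyMarker, versionIdMarker),
     * both null for the first request.
     */
    static List<String> listAll(BiFunction<String, String, Page> fetch) {
        List<String> all = new ArrayList<>();
        String keyMarker = null;
        String versionIdMarker = null;
        while (true) {
            Page page = fetch.apply(keyMarker, versionIdMarker);
            all.addAll(page.versions);
            if (!page.truncated) {
                return all;
            }
            // Carry both markers forward so the next page resumes exactly
            // where this one stopped.
            keyMarker = page.nextKeyMarker;
            versionIdMarker = page.nextVersionIdMarker;
        }
    }
}
```

With the real client, the fetch step would presumably build a ListObjectVersionsRequest with keyMarker(...) and versionIdMarker(...) set from the previous response.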
@necouchman For some reason your environment is not picking up the org.apache.http.wire DEBUG config.
Here are the wire logs of a listObjectVersions call I made locally:
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "GET /?versions HTTP/1.1[\r][\n]"
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Host: bucket-test.s3.ap-south-1.amazonaws.com[\r][\n]"
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "amz-sdk-invocation-id: 8a3ce795-bd03-eb9a-d658-0a8528cbfb68[\r][\n]"
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "amz-sdk-request: attempt=1; max=4[\r][\n]"
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Authorization: AWS4-HMAC-SHA256 Credential=xxx/20220316/ap-south-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=xxx[\r][\n]"
2022-03-16 11:56:59,804 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "User-Agent: aws-sdk-java/2.17.145 Mac_OS_X/12.2.1 OpenJDK_64-Bit_Server_VM/11.0.14+9-LTS ...[\r][\n]"
2022-03-16 11:56:59,805 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
2022-03-16 11:56:59,805 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "X-Amz-Date: 20220316T185658Z[\r][\n]"
2022-03-16 11:56:59,805 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
2022-03-16 11:56:59,805 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "x-amz-id-2: xxx[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "x-amz-request-id: Q2KP7EJ1NWZ3M9F3[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "Date: Wed, 16 Mar 2022 18:57:01 GMT[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "Content-Type: application/xml[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "Transfer-Encoding: chunked[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "Server: AmazonS3[\r][\n]"
2022-03-16 11:57:00,090 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "[\r][\n]"
2022-03-16 11:57:00,125 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "28d[\r][\n]"
2022-03-16 11:57:00,125 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
2022-03-16 11:57:00,125 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "<ListVersionsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>bucket-test</Name><Prefix></Prefix><KeyMarker></KeyMarker><VersionIdMarker></VersionIdMarker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Version><Key>test.txt</Key><VersionId>null</VersionId><IsLatest>true</IsLatest><LastModified>2020-10-21T00:18:10.000Z</LastModified><ETag>"9baf628de796b6857d7768a34d72162d"</ETag><Size>19</Size><Owner><ID>xxx</ID></Owner><StorageClass>STANDARD</StorageClass></Version></ListVersionsResult>[\r][\n]"
2022-03-16 11:57:00,148 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "0[\r][\n]"
2022-03-16 11:57:00,148 [main] DEBUG org.apache.http.wire - http-outgoing-0 << "[\r][\n]"
The XML sent in the response shows up in the wire logs, so my hope is to see the problematic XML in your response and identify why the SDK is having trouble parsing it.
@debora-ito Any idea what I'm doing wrong that it isn't picking this up? I posted my log4j2.xml file above, and I've also included log4j-core and log4j-1.2-api in my pom.xml file and rebuilt my JAR. I've also tried adding various options to my Java command - for example "-Dlog4j.loggers.org.apache.http.wire=DEBUG" - and nothing seems to produce the full wire log?
My command line looks like this: java -Dlog4j2.configurationFile=./log4j2.xml -jar target/S3Retention.jar
No idea, sorry :/ Your log4j2.xml is exactly the same as mine; the only difference is that I'm running the application in an IDE.
I'm using NetBeans to write/compile it, but then I run it from the command line. I'll keep trying things and see what I can come up with.
@necouchman any luck with the wirelogs?
Have you tried setting up the encodingType in the ListObjectVersionsRequest?
ListObjectVersionsRequest listObjectVersions = ListObjectVersionsRequest
.builder()
.bucket(bucket)
.prefix(prefix)
.encodingType(EncodingType.URL)
.build();
Thanks @debora-ito - I'm still confirming right now, but adding the encodingType(EncodingType.URL) call may have resolved it.
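A plausible explanation for why this helps: with URL encoding requested, a control character in a key travels as a percent escape such as %01, which is ordinary text to an XML parser. The sketch below only illustrates the idea with java.net.URLEncoder; S3's exact encoding rules may differ (for example, in how spaces and slashes are handled), and it is worth checking whether your SDK version already decodes the keys for you before decoding them yourself. The class name is made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class KeyEncodingDemo {
    /** Percent-encodes a key so that control characters become plain ASCII escapes. */
    static String encode(String key) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8);
    }

    /** Reverses encode(), restoring the original key. */
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "report" + (char) 1 + ".txt"; // contains U+0001, invalid in XML 1.0
        String encoded = encode(key);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(key));
    }
}
```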
Describe the issue
I have several S3 buckets where data is written to the buckets by both Storage Gateway and Datasync. A handful of these buckets have objects in them written by Mac OS X clients, which are a little more liberal in their support of non-standard characters in filenames than Windows. I've written a program that goes through and cleans up old versions of objects in these buckets, and, while it works on most of the buckets, occasionally I run into an issue where listing the object versions causes an exception:
The block of code that causes this is as follows:
I've also tried non-paginated requests, and the failure is the same. This seems to be happening within the AWS Java SDK, and before I do anything else with the results of the object version query. My questions are:
Steps to Reproduce
Current behavior
AWS Java SDK version used
2.17.42
JDK version used
openjdk 11.0.12 2021-07-20 LTS OpenJDK Runtime Environment 18.9 (build 11.0.12+7-LTS) OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7-LTS, mixed mode, sharing)
Operating System and version
CentOS Linux 8