Closed: lholthof closed this issue 2 weeks ago
Thanks for your contribution :fire: We will take a look asap :rocket:
This usually happens when someone creates an empty folder / directory in S3 and then uploads files into that same folder / directory. In that case, the `list-objects-v2` API returns two objects: one folder object with size 0 and another file object with its actual size. When `S3DataSink#transferParts()` tries to upload an object with size 0, it fails with the above exception.
```shell
aws s3api put-object --bucket dsibucket-dev-consumer-001 --key testfolder1/
aws s3api put-object --bucket dsibucket-dev-consumer-001 --key testfolder1/10mb.txt --body ./10mb.txt
aws s3api list-objects-v2 --bucket dsibucket-dev-consumer-001 --prefix testfolder1/
```
Response:

```json
{
    "Contents": [
        {
            "Key": "testfolder1/",
            "LastModified": "2024-07-25T06:14:42+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "testfolder1/10mb.txt",
            "LastModified": "2024-07-25T06:16:34+00:00",
            "ETag": "\"2d94c9f3cbfa5fbc410a7a8b72f8cee1\"",
            "Size": 10485772,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}
```
If a file is uploaded directly without creating the folder first, the folder does not appear in the list of objects.
```shell
aws s3api put-object --bucket dsibucket-dev-consumer-001 --key testfolder2/10mb.txt --body ./10mb.txt
aws s3api list-objects-v2 --bucket dsibucket-dev-consumer-001 --prefix testfolder2/
```
Response:

```json
{
    "Contents": [
        {
            "Key": "testfolder2/10mb.txt",
            "LastModified": "2024-07-25T06:17:37+00:00",
            "ETag": "\"2d94c9f3cbfa5fbc410a7a8b72f8cee1\"",
            "Size": 10485772,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}
```
In the `S3DataSource#openPartStream()` method, filter out any S3 object that matches either of the following criteria:

- its `Size` is 0;
- its `Key` is the same as the prefix, i.e. skip the empty folder object that was created before the file was uploaded into it.

https://github.com/eclipse-edc/Technology-Aws/blob/562c56859089e6522c396acf6011d6197760768d/extensions/data-plane/data-plane-aws-s3/src/main/java/org/eclipse/edc/connector/dataplane/aws/s3/S3DataSource.java#L65-L76
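A minimal sketch of that filter, using a plain `(key, size)` record in place of the AWS SDK's `S3Object` so the example is self-contained; the names here are illustrative, not the actual `S3DataSource` code:

```java
// Illustrative sketch of the proposed openPartStream() filter.
// S3ObjectSummary stands in for the AWS SDK's S3Object type.
record S3ObjectSummary(String key, long size) {}

class S3PartFilter {
    // Skip folder placeholder objects: size 0, or key identical to the prefix.
    static boolean shouldSkip(S3ObjectSummary obj, String prefix) {
        return obj.size() == 0 || obj.key().equals(prefix);
    }
}
```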
Even if the multipart upload worked for a 0-byte object, the key name for the folder part is wrongly resolved as `testFolder/testFolder/` by the `getDestinationObjectName()` function (you can see the `key` var value in the debug print).
As for the solution proposed by @hemantxpatel, I do agree that filtering out the empty folder object part is a good way to avoid it being wrongly resolved by `getDestinationObjectName()`. However, I do not agree that we should filter out objects of size 0, as that would prevent transferring empty files, which can be a real transfer scenario.
For empty files, the AWS documentation does not explain whether an empty `completedParts` list is valid to complete a multipart upload. It just says here that a `Part` cannot be invalid, but says nothing about empty lists. Nevertheless, multipart upload might not be the best approach for the empty-file case. When `bytesChunk` comes back empty and `completedParts` is also empty, the multipart upload should be aborted and a `PutObject` should be used instead. This way we can guarantee that empty files are also a valid transfer scenario.
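The decision described above could be sketched roughly as follows. This is only an illustration with hypothetical names; the actual `S3DataSink` wiring, the AWS SDK calls, and the `AbortMultipartUpload` step are omitted:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: pick between a single PutObject and a multipart upload based
// on whether the first chunk is empty. Real code would abort the already
// started multipart upload before falling back to PutObject.
class UploadStrategy {
    static String choose(InputStream part, int chunkSize) throws IOException {
        byte[] bytesChunk = part.readNBytes(chunkSize);
        // Empty first chunk and no completed parts yet: completing the
        // multipart upload would fail, so use PutObject with an empty body.
        return bytesChunk.length == 0 ? "PutObject" : "MultipartUpload";
    }
}
```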
Also, I quickly glanced at the tests and it seems no case exists for an empty file transfer. Something to be improved.
@hemantxpatel Since you have the initial solution proposal, would you like to come forward and bring in a PR for this?
hey everyone :) can I be assigned to this issue, please?
for some reason GH doesn't let me assign this to you. I used @rafaelmag110 as stand-in, so we know someone's working on it.
[edit] assigned you
Hi all, @rafaelmag110 asked me to work on it, so I had started working on it. I verified my code by doing an S3-to-S3 transfer and it works well.
@bmg13 Let me know if you haven't already started and I can open the PR, otherwise it's a small fix.
Just need to convert the `while` loop to a `do while` loop, so that a part is uploaded even if it has size zero.

https://github.com/eclipse-edc/Technology-Aws/blob/e6e78a3cb1dbcecd29ad6b7a1ea93e6ac609f9b0/extensions/data-plane/data-plane-aws-s3/src/main/java/org/eclipse/edc/connector/dataplane/aws/s3/S3DataSink.java#L63-L73
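The effect of that change can be illustrated with a minimal chunking loop. This is illustrative only; the real loop in `S3DataSink` also uploads each chunk and builds the `CompletedPart` list:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// With a do-while, at least one (possibly empty) part is produced even for a
// zero-byte object, so completedParts is never empty when the multipart
// upload is completed.
class ChunkLoop {
    static int countParts(InputStream in, int chunkSize) throws IOException {
        int parts = 0;
        byte[] bytesChunk;
        do {
            bytesChunk = in.readNBytes(chunkSize);
            parts++; // "upload" this part, even when bytesChunk is empty
        } while (bytesChunk.length == chunkSize); // a full chunk means more may follow
        return parts;
    }
}
```

With a plain `while` loop, a zero-byte input would produce zero parts and the subsequent CompleteMultipartUpload call would fail; the `do while` guarantees one part.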
Thanks @hemantxpatel I contacted you directly because in this case we wanted to move the fix a bit faster so we could have it in time for a downstream bugfix release. We indeed started working on this and should have the PR ready today.
Sorry for the confusion.
Bug Report
The folder copy feature for the AmazonS3-PUSH scenario, controlled via the `objectPrefix` property, is throwing an exception within our dataplane (running tractus-x version 0.7.2).
Describe the Bug
The upload into the consumer bucket fails for folder copies, while single files (specified by `objectName`) work fine for the same bucket.
The scenario is set up as follows and can be reproduced.
Asset:
Transferprocess:
Expected Behavior
On the consumer bucket I would expect the contents of the provider folder `testFolder` to be copied into `testFolder/testFolder`.
Observed Behavior
The upload does not happen at all. In the dataplane logs I can see the failure listed below.
Debugging the issue on the provider side's dataplane within `S3DataSink`, I see that there is an individual `part` for the provider folder `testFolder` which has `bytesChunk.length = 0`. The `S3DataSink` code generates an uploadId and, without transferring a single chunk, tries to complete the MultipartUpload. This is failing with the following error message. I guess this happens because `completedParts` does not contain a single entry, which is considered an invalid request by AWS (The XML you provided was not well-formed or did not validate against our published schema (Service: S3, Status Code: 400...)). As the folder part always comes as the initial part, the full copy process is aborted.