aws / aws-sdk-php

Official repository of the AWS SDK for PHP (@awsforphp)
http://aws.amazon.com/sdkforphp
Apache License 2.0
6.01k stars 1.22k forks source link

S3Client: `upload()` returns differently encoded `ObjectURL`, depending on the uploaded file size (e.g. `%2F` instead of `/`) #2933

Closed mfn closed 3 months ago

mfn commented 4 months ago

Describe the bug

There's a pretty interesting surprise behaviour in getting back the canonical URL of the just-uploaded data.

If the file size exceeds \Aws\S3\ObjectUploader::DEFAULT_MULTIPART_THRESHOLD, then a different upload strategy is used, which ends up returning a differently encoded ObjectURL: for such files, e.g. / is encoded as %2F, possibly breaking certain assumption on what is supposed to be returned here.

Expected Behavior

The returned ObjectURL should be consistent, i.e. uploading the 17mb file should output:

$ php aws_s3_get_object_url_test.php 17mb.file
https://….s3.….amazonaws.com/test/path.file

Current Behavior

The ObjectURL is encoded differently, based on the file size.

Reproduction Steps

Test script:

<?php declare(strict_types=1);

require_once 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3Client = new S3Client([
    'region' => '…',
    'credentials' => [
        'key' => '…',
        'secret' => '…',
    ]
]);

$result = $s3Client->upload(
    '…',
    'test/path.file',
    file_get_contents($argv[1])
);

echo $result->get('ObjectURL'), "\n";

Test files created:

$ dd if=/dev/zero of=15mb.file bs=1024 count=15000
15000+0 records in
15000+0 records out
15360000 bytes (15 MB, 15 MiB) copied, 1.1505 s, 13.4 MB/s

$ dd if=/dev/zero of=17mb.file bs=1024 count=17000
17000+0 records in
17000+0 records out
17408000 bytes (17 MB, 17 MiB) copied, 1.30513 s, 13.3 MB/s

$ ls -lh *.file
-rw-r--r-- 1 user user 15M Jun  3 13:17 15mb.file
-rw-r--r-- 1 user user 17M Jun  3 13:17 17mb.file

Uploading the 15mb file:

$ php aws_s3_get_object_url_test.php 15mb.file
https://….s3.….amazonaws.com/test/path.file

Uploading the 17mb file (notice the %2F in the path‼️):

$ php aws_s3_get_object_url_test.php 17mb.file
https://….s3.….amazonaws.com/test%2Fpath.file

Possible Solution

The returned value should be consistently encoded.

In \Aws\S3\PutObjectUrlMiddleware::__invoke, the ObjectURL is "constructed":

        return $next($command, $request)->then(
            function (ResultInterface $result) use ($command) {
                $name = $command->getName();
                switch ($name) {
                    case 'PutObject':
                    case 'CopyObject':
                        $result['ObjectURL'] = isset($result['@metadata']['effectiveUri'])
                            ? $result['@metadata']['effectiveUri']
                            : null;
                        break;
                    case 'CompleteMultipartUpload':
                        $result['ObjectURL'] = $result['Location'];
                        break;
                }
                return $result;
            }
        );

As can be seen, in case of CompleteMultipartUpload, which is used for data > 16MB, the resulting Location is used, which AFAICS, is from the header response directly.

For regular uploads, ObjectURL is based on $result['@metadata']['effectiveUri'], which itself is created in \Aws\WrappedHttpHandler::parseResponse as

        $metadata = [
            'statusCode'    => $status,
            'effectiveUri'  => (string) $request->getUri(),
            'headers'       => [],
            'transferStats' => [],
        ];

I.e. it's constructed from the internal $request object, instead of the returned server response.

Additional Information/Context

No response

SDK version used

3.308.7

Environment details (Version of PHP (php -v)? OS name and version, etc.)

PHP 8.3.7, Linux

mfn commented 4 months ago

Maybe not a "software bug", but sure an inconsistency well hidden. Maybe this is documented (or: should be)?

yenfryherrerafeliz commented 3 months ago

Hi @mfn, we have merged the following PR that url decodes the Location when the upload is done through a multipart request.

Please feel free of opening a new issues for anything else that we may help with.

Thanks!

github-actions[bot] commented 3 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.