boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

support memoryview in all places bytes, bytearray are supported #3423

Open alonbl opened 2 years ago

alonbl commented 2 years ago

Describe the bug

Currently, boto3 (through botocore) explicitly checks for specific concrete types, for example:

botocore/validate.py:

    def _validate_blob(self, param, shape, errors, name):
        if isinstance(param, (bytes, bytearray, str)):
            return
        elif hasattr(param, 'read'):
            # File like objects are also allowed for blob types.
            return
        else:
            errors.report(
                name,
                'invalid type',
                param=param,
                valid_types=[str(bytes), str(bytearray), 'file-like object'],
            )

To avoid a copy, a memoryview can be used to wrap a bytearray. A memoryview is compatible with bytearray wherever a bytes-like object is expected; however, passing one fails because of the validation check.
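To illustrate the zero-copy point, here is a minimal sketch using only the standard library. The buffer contents and the `part_size` name are made up for the example, and nothing here calls boto3.

```python
# Slicing a bytearray copies the data; slicing a memoryview does not.
buf = bytearray(b"0123456789" * 4)   # stand-in for a large upload buffer
part_size = 10                       # hypothetical part size for this example

copied = buf[:part_size]             # new bytearray, the data is duplicated
view = memoryview(buf)[:part_size]   # window onto buf, no data is copied

buf[0] = ord("X")                    # mutate the underlying buffer

print(bytes(copied))  # unaffected by the mutation: it is a separate copy
print(bytes(view))    # reflects the mutation: it shares memory with buf
```

For large multipart uploads, the difference is one extra copy of every part versus none.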

Expected Behavior

Accept memoryview whenever bytes or bytearray are accepted.

Current Behavior

Because of manual type checks, which were added with good intentions, passing a memoryview is rejected. As a result, we cannot avoid copying large buffers.

Reproduction Steps

>>> import boto3
>>> boto3.client("s3").upload_part(Bucket="xxxxx", Key="xxxxx", PartNumber=0, UploadId="", Body=memoryview(bytearray(10)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 691, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 739, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/usr/lib/python3/dist-packages/botocore/validate.py", line 360, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: <memory at 0x7f36586613c0>, type: <class 'memoryview'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

Possible Solution

Modify the following (for example) all over the code:

-if isinstance(param, (bytes, bytearray, str)):
+if isinstance(param, (bytes, bytearray, memoryview, str)):
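As a rough, standalone sketch of what the relaxed check would accept: `is_valid_blob` and `BLOB_TYPES` below are hypothetical names for illustration and are not part of botocore. The sketch covers only the type check; whether the rest of the request pipeline handles memoryview is a separate question it does not answer.

```python
import io

# Hypothetical tuple of accepted blob types, with memoryview added.
BLOB_TYPES = (bytes, bytearray, memoryview, str)


def is_valid_blob(param):
    """Return True if param passes a blob check that also allows memoryview."""
    if isinstance(param, BLOB_TYPES):
        return True
    # File-like objects are accepted as well, as in the existing check.
    return hasattr(param, "read")


print(is_valid_blob(memoryview(bytearray(10))))  # the case from this issue
print(is_valid_blob(io.BytesIO(b"abc")))         # file-like object
print(is_valid_blob(12345))                      # not a blob
```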

Additional Information/Context

No response

SDK version used

botocore-1.27.75

Environment details (OS name and version, etc.)

python-3

tim-finnigan commented 2 years ago

Thanks @alonbl for the feature request. I brought this up for discussion with the team, and it seemed like something they may be receptive to doing. However, more research is required before confirming that decision. We encourage others to 👍 this issue if they are also interested. If there are any more details you can share regarding your use case, please let us know.

alonbl commented 2 years ago

Thank you @tim-finnigan,

While this discussion is happening, please also consider receiving into caller-provided buffers, like the fh.recv_into() family of functions. In S3 especially, it is important to avoid memory copies for large blobs. Unlike the request in this issue, which is trivial, this one requires an API change.
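For readers unfamiliar with the "receive into a buffer" pattern, here is a minimal standard-library sketch. The `read_into_buffer` helper is hypothetical, and the `io.BytesIO` source merely stands in for a response body; it does not show any existing boto3 API.

```python
import io


def read_into_buffer(stream, buf):
    """Fill buf from stream via readinto(); return the number of bytes written."""
    view = memoryview(buf)
    total = 0
    while total < len(view):
        # Write directly into the unfilled tail of the caller's buffer.
        n = stream.readinto(view[total:])
        if not n:  # 0 or None means no more data is available
            break
        total += n
    return total


source = io.BytesIO(b"hello world")  # stand-in for a downloaded object body
target = bytearray(16)               # caller-owned, preallocated buffer
count = read_into_buffer(source, target)
print(count, bytes(target[:count]))
```

The caller owns and can reuse `target`, so no intermediate bytes object is allocated per read.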

ljmc-github commented 1 year ago

A common use case is uploading a pandas DataFrame to AWS S3 without needing s3fs (for instance, in a corporate environment with tight approvals). It's been asked several times on Stack Overflow (a few examples: 1, 2, 3).

It's not limited to pandas though, there are many packages whose upload to s3 could be transformed from

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(Body=f.getvalue())

or

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    f.seek(0)
    o.upload_fileobj(f)

to

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(Body=f.getbuffer())

which would save a copy or a seek.
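The difference between `getvalue()` and `getbuffer()` can be seen without pandas or S3. This is a small sketch in which a plain `write()` stands in for `df.to_csv(f)`; the CSV content is invented for the example.

```python
from io import BytesIO

with BytesIO() as f:
    f.write(b"a,b\n1,2\n")   # stand-in for df.to_csv(f)

    copy = f.getvalue()      # bytes object: an independent snapshot
    view = f.getbuffer()     # memoryview: shares the BytesIO's memory

    view[0:1] = b"A"         # writing through the view changes the BytesIO
    after = f.getvalue()

    # A live view blocks closing or resizing the BytesIO, so release it first.
    view.release()

print(copy)
print(after)
```

Note the `release()` call: closing a `BytesIO` while a buffer obtained from `getbuffer()` is still exported raises `BufferError`.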

jakkdl commented 8 months ago

This would be very useful in https://github.com/piskvorky/smart_open/issues/380, and I opened a PR where I made the suggested changes + added tests https://github.com/boto/botocore/pull/3107. It really does seem to be as easy as changing a couple isinstance checks - all tests passed when I ran them locally.

nfcampos commented 1 month ago

Hi folks, any movement here? This would be very helpful for us when uploading files without copies

jakkdl commented 1 month ago

> Hi folks, any movement here? This would be very helpful for us when uploading files without copies

There have been zero comments or reviews on my PR (which I noticed I'd incorrectly linked in my last comment, oops), so there doesn't seem to be much interest from the maintainers.