Open alonbl opened 2 years ago
Thanks @alonbl for the feature request. I brought this up for discussion with the team and it seemed like something they may be receptive to doing. However, more research is required before confirming that decision. We encourage others to 👍 this issue if they are also interested, and if there are any more details you can share regarding your use case, please let us know.
Thank you @tim-finnigan,
While this discussion is happening, please also consider supporting receiving into buffers, along the lines of the fh.recv_into() family of functions. Especially in s3, it is important to avoid memory copies for large blobs. But unlike the subject request, which is trivial, this one requires an API change.
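For readers unfamiliar with the pattern, here is a minimal sketch of receiving into a caller-owned buffer, using the standard library's socket.recv_into(). It is not a boto3 API, and the helper name recv_exactly_into is made up for illustration.

```python
import socket


def recv_exactly_into(sock, buf):
    """Fill buf from sock, writing directly into the caller's buffer.

    Hypothetical helper for illustration only; not part of boto3/botocore.
    """
    view = memoryview(buf)
    received = 0
    while received < len(view):
        # Slicing a memoryview does not copy the underlying data.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise EOFError("socket closed before the buffer was filled")
        received += n
    return received


# A connected socket pair stands in for a real network connection.
sender, receiver = socket.socketpair()
try:
    sender.sendall(b"hello blob")
    buf = bytearray(10)  # preallocated once, reused by the caller
    count = recv_exactly_into(receiver, buf)
finally:
    sender.close()
    receiver.close()
```

The point is that the caller owns the buffer, so no intermediate bytes object is allocated per read.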
A common use case is uploading a pandas df to AWS s3 without needing s3fs (for instance in a corporate env with tight approvals); it's been asked several times on StackOverflow (a few examples 1 2 3).
It's not limited to pandas though, there are many packages whose upload to s3 could be transformed from
    o = s3.Object("bucket", "key")
    with BytesIO() as f:
        df.to_csv(f)
        o.put(Body=f.getvalue())
or
    o = s3.Object("bucket", "key")
    with BytesIO() as f:
        df.to_csv(f)
        f.seek(0)
        o.upload_fileobj(f)
to
    o = s3.Object("bucket", "key")
    with BytesIO() as f:
        df.to_csv(f)
        o.put(Body=f.getbuffer())
which would save a copy or a seek.
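To make the difference concrete, here is a small standalone sketch (standard library only, no boto3 involved) contrasting getvalue(), which returns an independent bytes object, with getbuffer(), which returns a writable memoryview over the stream's contents. The sample CSV bytes are made up for illustration.

```python
from io import BytesIO

f = BytesIO()
f.write(b"a,b\n1,2\n")

# getvalue() returns a bytes object that is independent of later changes.
snapshot = f.getvalue()

# getbuffer() returns a memoryview over the stream's contents.
view = f.getbuffer()
view[0:1] = b"x"        # writing through the view changes the stream itself
after = f.getvalue()

# While a view is outstanding the BytesIO cannot be resized or closed,
# so release it before closing the stream.
view.release()
f.close()
```

Here snapshot still holds the original bytes, while after reflects the write made through the view. Note the caveat in the last comment: any code that accepts the memoryview has to be done with it before the BytesIO is closed.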
This would be very useful in https://github.com/piskvorky/smart_open/issues/380, and I opened a PR where I made the suggested changes + added tests https://github.com/boto/botocore/pull/3107. It really does seem to be as easy as changing a couple of isinstance checks - all tests passed when I ran them locally.
Hi folks, any movement here? This would be very helpful for us when uploading files without copies
There have been zero comments or reviews on my PR (which I noticed I'd incorrectly linked in my last comment, oops), so there doesn't seem to be much interest from the maintainers.
Describe the bug
Currently there are explicit checks for explicit types in boto3, for example in botocore/validate.py.
To avoid a copy, the memoryview class can be used to wrap a bytearray. Passing a memoryview is compatible with bytearray; however, due to the validation check, it fails.
Expected Behavior
Accept memoryview whenever bytes or bytearray are accepted.
Current Behavior
Due to manual checks, which were added with good intentions, passing a memoryview is rejected; as a result, we cannot avoid copying large buffers.
Reproduction Steps
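The original steps are not included here. As a minimal sketch of the behavior described above, assume a validator that uses an explicit isinstance check against bytes and bytearray. Both function names below are hypothetical stand-ins, not botocore's actual code.

```python
def strict_is_blob(value):
    # Hypothetical stand-in for an explicit type check: only the exact
    # listed types (or their subclasses) pass.
    return isinstance(value, (bytes, bytearray))


def relaxed_is_blob(value):
    # Hypothetical relaxed variant that also accepts memoryview.
    return isinstance(value, (bytes, bytearray, memoryview))


payload = bytearray(b"large blob contents")
view = memoryview(payload)  # wraps the bytearray without copying it

shares_memory = view.obj is payload      # the view refers to the same object
strict_result = strict_is_blob(view)     # rejected by the explicit check
relaxed_result = relaxed_is_blob(view)   # accepted once memoryview is listed
```

memoryview is not a subclass of bytes or bytearray, so the strict check fails even though the view exposes the same data.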
Possible Solution
Modify the following (for example) all over the code:
Additional Information/Context
No response
SDK version used
botocore-1.27.75
Environment details (OS name and version, etc.)
python-3