boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Use iterator for s3 object collection #1903

Open dazza-codes opened 5 years ago

dazza-codes commented 5 years ago
>>> objects
s3.Bucket.objectsCollection(s3.Bucket(name='my-project'), s3.ObjectSummary)

>>> print(next(objects))
TypeError: 's3.Bucket.objectsCollection' object is not an iterator

https://wiki.python.org/moin/Iterator

It does support iter(objects) wrapping, e.g.

>>> obj_iter = iter(objects)
>>> obj_iter
<generator object ResourceCollection.__iter__ at 0x7f57efbff660>

But why is this necessary?

stealthycoin commented 5 years ago

The objects collection is not meant to be consumed directly; instead, a filter of some kind is intended to be put on the end:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Bucket.objects

Examples:

>>> objects = s3.Bucket(name='test').objects.filter(Prefix='file0.zip')
>>> for o in objects:
...     print(o)
s3.ObjectSummary(bucket_name='test', key='file0.zip')

Or all of them:

>>> objects = s3.Bucket(name='test').objects.all()
>>> for o in objects:
...     print(o)
s3.ObjectSummary(bucket_name='test', key='file0.zip')
s3.ObjectSummary(bucket_name='test', key='file1.zip')
s3.ObjectSummary(bucket_name='test', key='file3.zip')
s3.ObjectSummary(bucket_name='test', key='file4.zip')

dazza-codes commented 5 years ago

Not sure if the latest release already supports this, but it seems like all() should return an iterator.

objects = s3.Bucket(name='test').objects.all()
next(objects)  # expected to return an s3.ObjectSummary

The docs, e.g. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/collections.html, explicitly state that "A collection provides an iterable interface to a group of resources."

It seems like the distinction between iterable and iterator is important here [1], as is the intention for the collections. If the intention is for a collection to be an iterable and not an iterator, it's good to go; feel free to close this issue.

[1] https://www.geeksforgeeks.org/python-difference-iterable-iterator/
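The distinction can be checked without touching AWS. The sketch below uses a hypothetical stand-in class, `FakeCollection` (not part of boto3), whose `__iter__` is a generator function. That matches what the `ResourceCollection.__iter__` generator in the traceback above suggests, but it is an assumption about boto3's internals, not a copy of them.

```python
from collections.abc import Iterable, Iterator


class FakeCollection:
    """Hypothetical stand-in for a boto3 collection; not the real class."""

    def __init__(self, keys):
        self._keys = list(keys)

    def __iter__(self):
        # A generator function: every call starts a fresh traversal.
        for key in self._keys:
            yield key


objects = FakeCollection(["file0.zip", "file1.zip"])

is_iterable = isinstance(objects, Iterable)  # only needs __iter__
is_iterator = isinstance(objects, Iterator)  # would also need __next__

try:
    next(objects)
    next_error = None
except TypeError as exc:
    # Same kind of error as reported at the top of this issue.
    next_error = str(exc)

first_pass = list(objects)
second_pass = list(objects)  # an iterable can be traversed more than once
```

One consequence of this design is that the same collection can be looped over repeatedly, whereas an iterator would be exhausted after the first pass.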

JordonPhillips commented 5 years ago

Yeah, you should definitely be able to call next. Marking as a feature request.

y2k-shubham commented 5 years ago

Until this is natively supported, I think you can do next(x for x in objects). Reference: https://stackoverflow.com/a/2364277/3679900
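One caveat with this workaround: `next()` raises `StopIteration` when there is nothing to return, which could happen if a filter matches no keys. A minimal sketch, using plain lists as hypothetical stand-ins for the collection, passes a default as the second argument instead:

```python
# Plain lists stand in for a collection here (assumption for illustration);
# like the collection, they are iterables but not iterators.
objects = ["file0.zip", "file1.zip"]
no_matches = []

# The generator expression needs its own parentheses when a default is given.
first = next((x for x in objects), None)
missing = next((x for x in no_matches), None)  # default instead of StopIteration
```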

Aeternitaas commented 5 years ago

Alternatively, you may also use something like:

...
bucket_iter = iter(objects)
next(bucket_iter)
...

This works because all generators (which boto3 uses to yield API results) are iterators, and all iterators are iterables.
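Because the result of `iter()` is an ordinary iterator, the standard `itertools` helpers apply to it as well. The sketch below uses a hypothetical generator function, `fake_objects`, in place of a real collection and takes only the first few items:

```python
from itertools import islice


def fake_objects():
    """Hypothetical stand-in for the generator a collection yields from."""
    for i in range(5):
        yield f"file{i}.zip"


gen = fake_objects()
bucket_iter = iter(gen)
same_object = bucket_iter is gen  # iter() on a generator returns the generator itself

first_two = list(islice(bucket_iter, 2))  # consume only two items
third = next(bucket_iter)                 # continues where islice stopped
```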

MikeWhittakerRyff commented 4 years ago

map() should also be able to operate over all(), shouldn't it?
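In general, `map()` accepts any iterable, not only iterators, so it should not need the change requested in this issue. The sketch below shows this with a hypothetical stand-in class (not the boto3 collection, and not verified here against a real bucket):

```python
class FakeCollection:
    """Hypothetical stand-in: iterable (has __iter__) but not an iterator."""

    def __iter__(self):
        yield from ("file0.zip", "file1.zip")


# map() obtains an iterator from its argument internally.
keys_upper = list(map(str.upper, FakeCollection()))
```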