Blosc / bcolz

A columnar data container that can be compressed.
http://bcolz.blosc.org

carray is an Iterator #74

Open mrocklin opened 10 years ago

mrocklin commented 10 years ago

This is odd behavior:

In [1]: import bcolz
In [2]: import collections

In [3]: issubclass(bcolz.carray, collections.Iterator)
Out[3]: True

I think this is because bcolz.carray implements a __next__ method, which seems inappropriate for a data container; __next__ is more appropriate for a stateful iterator.
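For illustration: the check succeeds through the `__subclasshook__` of the `Iterator` ABC, which treats any class that defines both `__iter__` and `__next__` as a subclass. The sketch below uses two hypothetical classes (`FusedContainer` and `PlainContainer`), not bcolz itself, and imports from `collections.abc`, which is where the ABCs live in current Python versions.

```python
from collections.abc import Iterable, Iterator


class FusedContainer:
    """Hypothetical container that also carries its own iteration state."""

    def __init__(self, data):
        self._data = list(data)
        self._pos = 0

    def __iter__(self):
        # Reset the cursor and hand back the container itself.
        self._pos = 0
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        value = self._data[self._pos]
        self._pos += 1
        return value


class PlainContainer:
    """Hypothetical container that only knows how to produce an iterator."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)


# Defining __next__ alongside __iter__ is what makes the ABC check pass.
fused_is_iterator = issubclass(FusedContainer, Iterator)
plain_is_iterator = issubclass(PlainContainer, Iterator)
plain_is_iterable = issubclass(PlainContainer, Iterable)
```

Under these assumptions, only the class with `__next__` registers as an `Iterator`; the plain container is merely `Iterable`.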

Is this intended? Should it be fixed?

Related to #37

esc commented 10 years ago

I agree that it seems convoluted. Perhaps this is also a great place for separation of concerns by having a CarrayIterator.

If you look at the following Stack Overflow question:

https://stackoverflow.com/questions/19151/build-a-basic-python-iterator

It seems like the right thing to do may be to have the carray implement __iter__ and the CarrayIterator implement next() and __next__.
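A minimal sketch of that split, using hypothetical stand-ins (`Container` and `ContainerIterator`) rather than the real carray, might look like this. The `next` alias is only there to mirror the Python 2 spelling mentioned above.

```python
from collections.abc import Iterator


class ContainerIterator:
    """Hypothetical stateful iterator over a container's data."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        value = self._data[self._pos]
        self._pos += 1
        return value

    next = __next__  # Python 2 name for the same method


class Container:
    """Hypothetical data container: iterable, but not itself an iterator."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        # Every call returns a fresh iterator with its own cursor.
        return ContainerIterator(self._data)


c = Container([1, 2, 3])
first, second = iter(c), iter(c)
next(first)                          # advance only the first iterator
pairs = (next(first), next(second))  # the two cursors are independent
```

With this layout the container no longer passes the `Iterator` check, and two loops over the same container do not disturb each other.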

mrocklin commented 10 years ago

That follows my intuition.

esc commented 10 years ago

Here is another 'blueprint' I found:

http://www.shutupandship.com/2012/01/understanding-python-iterables-and.html

esc commented 10 years ago

So, after pondering this some more, I think the reason the iterator is 'fused' into the carray is so that it can use the carray's I/O buffer. Some hints:

https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L2272

https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L2431

Also, it is worth noting that the Iterator isn't actually the original carray but in fact a view() of it.
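To make the "iterator is a view" idea concrete, here is a rough sketch with a hypothetical `Column` class. This is not bcolz's implementation; the class name, the `_buffer` and `_pos` attributes, and the body of `view()` are assumptions chosen only to show how a class can keep `__next__` while the iteration state lives on a separate object that shares the same storage.

```python
class Column:
    """Hypothetical container; not bcolz's actual code."""

    def __init__(self, buffer):
        self._buffer = buffer  # underlying storage, shared between views
        self._pos = 0          # cursor owned by this particular object

    def view(self):
        # New object over the same buffer, with a fresh cursor.
        return Column(self._buffer)

    def __iter__(self):
        # Iterate over a view so the original's cursor is left untouched.
        return self.view()

    def __next__(self):
        if self._pos >= len(self._buffer):
            raise StopIteration
        value = self._buffer[self._pos]
        self._pos += 1
        return value


col = Column([10, 20, 30])
it = iter(col)    # a view, not col itself
first = next(it)  # advances the view's cursor only
```

In this sketch the class still defines `__next__`, so it would still pass the `Iterator` check, but each loop gets its own cursor while reusing the shared buffer.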