biggus should implement the ndarray interface

shoyer commented 10 years ago

Implementation would be a one-liner. Just add the following method to biggus.Array:

    def __array__(self, dtype=None):
        """NumPy array interface"""
        return self.ndarray()

Unfortunately, it's not as easy to support casting to masked arrays, although it looks like that is possible by adding a _mask property.

The reason I have not submitted this as a pull request (yet) is that there is a trade-off here:

Positives: This would make it dead easy to use biggus arrays in external packages (e.g., Iris or xray) without needing to do any un-Pythonic isinstance or hasattr type inspection to figure out how to make a concrete array. NumPy functions like np.asarray or np.sin would just work, making biggus arrays immediately compatible with tons of other code. By default, all arrays would be evaluated, but numpy also has hooks for ufuncs that would let you handle them in a lazy fashion.
Negatives: Writing np.array([[small_array] * 1000] * 5) would evaluate every array unless you pass the keyword argument dtype=object, which would also need to be sprinkled through biggus' own code in a few places. Creating such arrays of arrays could become more awkward.

In my opinion, this is not much of a downside. Most users won't be making ArrayStacks from scratch, and people writing library code can use native Python data structures (which could be automatically cast to object ndarrays) or learn to write dtype=object. Thoughts?

rhattersley commented 10 years ago

Automatic conversion to NumPy arrays was deliberately excluded. In general terms this was because of a desire to avoid unnecessary API "magic" which makes code behaviour hard to predict. More specifically, conversion to np.ndarray is a dangerous operation. Making it explicit encourages appropriate planning of algorithms and when/how to do the conversion.

shoyer commented 10 years ago

I would argue that np.asarray(big_array) is just as explicit as big_array.ndarray(). The difference is that it uses a standard interface.

You can actually turn off most of the numpy API "magic" while still allowing explicit conversion by setting __array_priority__ and __array_prepare__. For example:

    import numpy as np

    class ArrayLike(object):
        # disable math with ndarrays
        __array_priority__ = 100

        # disable numpy's ufuncs
        def __array_prepare__(self, array, context=None):
            raise NotImplementedError('ufuncs not implemented')

        # explicit conversion, e.g. via np.asarray, still works
        def __array__(self, dtype=None):
            return np.array(-99999)
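
For readers on newer NumPy versions (1.13 and later), the same opt-out can be sketched with the __array_ufunc__ = None hook, which NumPy documents as a way for a class to refuse ufuncs and ndarray arithmetic. This is a swapped-in mechanism, not the one used in the example above. The class name OptOutArrayLike is hypothetical, and the copy=None parameter is an assumption added for compatibility with newer NumPy call conventions.

```python
import numpy as np


class OptOutArrayLike:
    """Hypothetical class: explicit conversion allowed, implicit math refused."""

    # Tells NumPy that ufuncs and ndarray operators must not handle this type.
    __array_ufunc__ = None

    def __array__(self, dtype=None, copy=None):
        # Explicit conversion through the standard interface still works.
        return np.array(-99999, dtype=dtype)


obj = OptOutArrayLike()

# Explicit conversion succeeds.
explicit = np.asarray(obj)

# Applying a ufunc is refused with a TypeError.
try:
    np.sin(obj)
    ufunc_blocked = False
except TypeError:
    ufunc_blocked = True

# Arithmetic with an ndarray is refused too, because this class
# defines no reflected operator such as __radd__.
try:
    np.ones(3) + obj
    arithmetic_blocked = False
except TypeError:
    arithmetic_blocked = True
```

With this in place, np.asarray(obj) remains the one explicit route to a concrete array.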

In my opinion, we don't need a new interface for making ndarrays when a standard one already exists, and biggus should play nicely with that standard interface. But feel free to close this issue (it's your project after all).

pelson commented 9 years ago

__array__ has been implemented in #129.

rhattersley commented 9 years ago

> I would argue that np.asarray(big_array) is just as explicit as big_array.ndarray(). The difference is that it uses a standard interface.

@shoyer - thank you for the original nudge ... I got there in the end. :wink:

shoyer commented 9 years ago

@rhattersley Actually, thank you! It looks like biggus has been getting some nice improvements lately...

SciTools / biggus

biggus should implement the ndarray interface #66