SciTools / biggus

:no_entry: [DEPRECATED] Virtual large arrays and lazy evaluation.
http://biggus.readthedocs.io/
GNU Lesser General Public License v3.0

Using biggus operators on 'large' varying-dtype arrays #164

Closed: DPeterK closed this issue 8 years ago

DPeterK commented 9 years ago

With thanks to @matthew-mizielinski for originally pointing this out...

Using biggus operators to combine 'large' arrays with differing dtypes raises an error when biggus runs the specified operator on the chunks to be processed.

This is demonstrated by the following code snippet and the error(s) produced when I ran it:

import biggus
import numpy as np

a_type = np.float32
b_type = np.float64
shape = (41, 192, 144)

# Two same-shaped arrays with differing dtypes.
a = np.array(np.random.random(shape), dtype=a_type)
b = np.array(np.random.random(shape), dtype=b_type)

a_bg = biggus.NumpyArrayAdapter(a)
b_bg = biggus.NumpyArrayAdapter(b)
prod = a_bg * b_bg

result = biggus.sum(prod, axis=0).ndarray()
Exception in thread <biggus.StreamsHandlerNode object at 0x7f64d325ac50>:
Traceback (most recent call last):
  File "/.../lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/.../lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/.../biggus/biggus/__init__.py", line 293, in run
    self.output(self.process_chunks(input_chunks))
  File "/.../biggus/biggus/__init__.py", line 323, in process_chunks
    return self.streams_handler.process_chunks(chunks)
  File "/.../biggus/biggus/__init__.py", line 2808, in process_chunks
    array = self.operator(*[chunk.data for chunk in chunks])
ValueError: operands could not be broadcast together with shapes (41,192,144) (37,192,144) 

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/.../biggus/biggus/__init__.py", line 2547, in ndarray
    result, = engine.ndarrays(self)
  File "/.../biggus/biggus/__init__.py", line 469, in ndarrays
    return self._evaluate(arrays, False)
  File "/.../biggus/biggus/__init__.py", line 460, in _evaluate
    ndarrays = group.evaluate(masked)
  File "/.../biggus/biggus/__init__.py", line 446, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation

If you make the arrays smaller, or unify the dtypes of the two arrays, then this error does not occur.
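For illustration, here is a minimal sketch of the dtype-unifying workaround just mentioned (casting up front with numpy's astype is my own suggestion here, not a biggus feature):

import biggus
import numpy as np

shape = (41, 192, 144)
a = np.random.random(shape).astype(np.float32)
b = np.random.random(shape)  # float64 by default

# Casting both operands to a common dtype before wrapping them means the
# chunks line up, so the evaluation succeeds.
a_bg = biggus.NumpyArrayAdapter(a.astype(np.float64))
b_bg = biggus.NumpyArrayAdapter(b)
result = biggus.sum(a_bg * b_bg, axis=0).ndarray()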

I've taken a little look into what might be causing this, but have tied my brain in knots trying to follow the flow of execution that biggus follows to get to this point. So, instead of sitting on this for ages trying to find a solution, I figured it would be beneficial to raise it as an issue and keep working on it.

pelson commented 9 years ago

Dupe of #163?

rhattersley commented 9 years ago

> have tied my brain in knots trying to follow the flow of execution that biggus follows to get to this point

The underlying problem is that during evaluation biggus chunks the source arrays, limiting each chunk to a fixed number of bytes. So when two large sources have dtypes with different item sizes (i.e. number of bytes per item), the resulting chunks have different lengths: e.g. for float32 each chunk will have at most MAX_CHUNK_SIZE / 4 elements, but for float64 each chunk will have at most MAX_CHUNK_SIZE / 8 elements.
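To make that concrete, here is a small sketch of the arithmetic (the 8 MiB byte budget is an assumption on my part, but it reproduces the (41, 192, 144) vs (37, 192, 144) chunk shapes in the traceback above):

import numpy as np

MAX_CHUNK_SIZE = 8 * 1024 * 1024  # assumed byte budget per chunk
shape = (41, 192, 144)
row_elems = shape[1] * shape[2]  # elements in one slice along axis 0

for dtype in (np.float32, np.float64):
    max_elems = MAX_CHUNK_SIZE // np.dtype(dtype).itemsize
    rows = min(shape[0], max_elems // row_elems)
    print(np.dtype(dtype).name, '->', (rows,) + shape[1:])

# float32 -> (41, 192, 144)  (the whole array fits in one chunk)
# float64 -> (37, 192, 144)  (only 37 slices fit, hence the broadcast error)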

The simplest workaround might be to switch to chunking with a fixed number of items (instead of bytes).
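A quick sketch of what that would look like, with a hypothetical element budget: both dtypes then produce identically shaped chunks, so the operands broadcast cleanly.

MAX_CHUNK_ITEMS = 1024 * 1024  # hypothetical element budget per chunk
shape = (41, 192, 144)
row_elems = shape[1] * shape[2]

rows = min(shape[0], MAX_CHUNK_ITEMS // row_elems)
print((rows,) + shape[1:])  # (37, 192, 144), regardless of dtype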

DPeterK commented 9 years ago

> Dupe of #163?

Or possibly it has the same root cause but is showing itself in a different way?