SciTools / biggus

:no_entry: [DEPRECATED] Virtual large arrays and lazy evaluation.
http://biggus.readthedocs.io/
GNU Lesser General Public License v3.0
54 stars 27 forks source link

NumPy 1.10+1.11 fixes #151

Closed QuLogic closed 8 years ago

QuLogic commented 9 years ago

Some tests do not pass with NumPy 1.10. Most of these are due to invalid dtype strings, which are fixed here. There are three more failures, but they are a bit too deep into the internals for me to understand:

======================================================================
ERROR: test_simple (biggus.tests.unit.test_BroadcastArray.Test_masked_array)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../biggus/tests/unit/test_BroadcastArray.py", line 125, in test_simple
    expected, _ = np.broadcast_arrays(orig.data, result)
  File ".../numpy/lib/stride_tricks.py", line 200, in broadcast_arrays
    for array in args]
  File ".../numpy/lib/stride_tricks.py", line 70, in _broadcast_to
    result.flags.writeable = True
ValueError: cannot set WRITEABLE flag to True of this array
======================================================================
ERROR: test_multi_int (biggus.tests.unit.test_std_var.TestNumpyArrayAdapter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../biggus/tests/unit/test_std_var.py", line 102, in test_multi_int
    self._check(self.data, shape=(3, 4), ddof=ddof)
  File ".../biggus/tests/unit/test_std_var.py", line 90, in _check
    result = bfunc(array, axis=0, ddof=ddof).ndarray()
  File ".../biggus/__init__.py", line 2471, in ndarray
    result, = engine.ndarrays(self)
  File ".../biggus/__init__.py", line 466, in ndarrays
    return self._evaluate(arrays, False)
  File ".../biggus/__init__.py", line 457, in _evaluate
    ndarrays = group.evaluate(masked)
  File ".../biggus/__init__.py", line 443, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation
======================================================================
ERROR: test_all_masked (biggus.tests.unit.test_sum.TestNumpyArrayAdapterMasked)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../biggus/tests/unit/_aggregation_test_framework.py", line 142, in test_all_masked
    self._check(data)
  File ".../biggus/tests/unit/_aggregation_test_framework.py", line 109, in _check
    result = self.biggus_operator(array, axis=0).masked_array()
  File ".../biggus/__init__.py", line 2475, in masked_array
    result, = engine.masked_arrays(self)
  File ".../biggus/__init__.py", line 463, in masked_arrays
    return self._evaluate(arrays, True)
  File ".../biggus/__init__.py", line 457, in _evaluate
    ndarrays = group.evaluate(masked)
  File ".../biggus/__init__.py", line 443, in evaluate
    raise Exception('error during evaluation')
Exception: error during evaluation

Any idea for fixing up these last three would be appreciated.

pelson commented 9 years ago

The test_simple (biggus.tests.unit.test_BroadcastArray.Test_masked_array) test looks like it could be a bug with stride tools and masked arrays, but I haven't had a chance to look if I can reproduce it without biggus. The other two have poor messages, are there any warnings coming out at the same time?

QuLogic commented 9 years ago

Ah, good point about the warnings; this might clear things up a bit:

test_getitem (biggus.tests.test_adapter.TestNumpyAdapter) ... 
.../biggus/__init__.py:1450: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
  array = self.concrete.__getitem__(keys)
.../biggus/__init__.py:1450: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 1; dimension is 4 but corresponding boolean dimension is 3
  array = self.concrete.__getitem__(keys)
ok

test_getitem (biggus.tests.test_adapter.TestOrthoAdapter) ... 
.../biggus/tests/test_adapter.py:201: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
  result = result.__getitem__(tuple(index))
.../biggus/tests/test_adapter.py:201: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 1; dimension is 4 but corresponding boolean dimension is 3
  result = result.__getitem__(tuple(index))
ok

test_flat (biggus.tests.unit.test_std_var.TestNumpyArrayAdapterMasked) ... Exception in thread <biggus.StreamsHandlerNode object at 0x7fc52d755ad0>:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
    self.__target(*self.__args, **self.__kwargs)
  File ".../biggus/__init__.py", line 290, in run
    self.output(self.process_chunks(input_chunks))
  File ".../biggus/__init__.py", line 320, in process_chunks
    return self.streams_handler.process_chunks(chunks)
  File ".../biggus/__init__.py", line 2110, in process_chunks
    self.process_data(chunk.data)
  File ".../biggus/__init__.py", line 2337, in process_data
    temp /= self.k
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
ok

test_flat (biggus.tests.unit.test_sum.TestNumpyArrayAdapterMasked) ... Exception in thread <biggus.StreamsHandlerNode object at 0x7fc52d74d690>:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
    self.__target(*self.__args, **self.__kwargs)
  File ".../biggus/__init__.py", line 290, in run
    self.output(self.process_chunks(input_chunks))
  File ".../biggus/__init__.py", line 320, in process_chunks
    return self.streams_handler.process_chunks(chunks)
  File ".../biggus/__init__.py", line 2110, in process_chunks
    self.process_data(chunk.data)
  File ".../biggus/__init__.py", line 2239, in process_data
    self.running_total += np.sum(data, axis=self.axis)
  File ".../numpy/ma/core.py", line 3966, in __iadd__
    getdata(other)))
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
ok

Looks like some size and type mismatches.

QuLogic commented 9 years ago

So I'm guessing the first 4 warnings are caused by these tests:

            # Boolean indexing (too few indices - zero pad)
            [(3, 4), [np.array([1, 1], dtype=bool)], (2, 4)],
            [(3, 4), [(slice(None), np.array([1, 1, 1], dtype=bool))], (3, 3)],

but I believe these are soon to be invalid in NumPy. Should they be removed and/or deprecated, or the biggus type changed to expand the index to fit?

The other two are of course due to the in-place operations. But I assume those are there for performance reasons. Would it make sense to always start with a float64 array if possible?

QuLogic commented 8 years ago

test_simple has been fixed at some point before 0.13.0.

pelson commented 8 years ago

Reminder (to myself as much as anyone) that a pin was put in the numpy version: https://github.com/SciTools/biggus/pull/173/files#diff-b4ef698db8ca845e5845c4618278f29aR1

QuLogic commented 8 years ago

Ping? I think NumPy 1.10 is common enough now, especially since 1.11 was already released.

DPeterK commented 8 years ago

And the tests now all pass :sunglasses:

EDIT: of course, the tests are all passing because of that pin in the NumPy version...

QuLogic commented 8 years ago

Yea, this is a ping for @pelson or others, because I don't understand the internals of biggus enough to fix the remaining problems.

DPeterK commented 8 years ago

However I also ran the tests offline against NumPy v1.10 (with the NumPy version pin removed) and the tests passed then as well. It is possible that the issues previously causing the test failures have been solved by other improvements between then and now.

I guess there can't be any harm in adding another commit to this that removes the NumPy version pin - if the tests still pass on Travis then I'd say we're in a good place :smile:

QuLogic commented 8 years ago

However I also ran the tests offline against NumPy v1.10 (with the NumPy version pin removed) and the tests passed then as well.

Are you sure? Travis is :unamused:.

DPeterK commented 8 years ago

Travis is :unamused:

Ahh. That's regrettable but not entirely a surprise...

QuLogic commented 8 years ago

I managed to fix std by changing the dtype of the temporary arrays to match the expected output.

I think what's wrong with test_all_masked is that the masked arrays are np.nan=-2^64 for ints which sum to something that doesn't fit in an int64 and is upcasted to float64. Should sum just not do += on self.running_total?

rhattersley commented 8 years ago

numpy (both 1.9 and 1.10) do something rather annoying:

>>> ma.sum(ma.masked_all((3,), dtype='i8')).dtype
dtype('float64')

Even explicitly asking for an i8 sum doesn't help:

>>> ma.sum(ma.masked_all((3,), dtype='i8'), dtype='i8').dtype
dtype('float64')

There is explicit code in np.ma.sum() to check for a 0-dimensional result and return np.ma.masked, a MaskedConstant with an f8 dtype. Looks like 1.10 is then more strict about trying to add a float to an int via the +=.

rhattersley commented 8 years ago

Looks like 1.10 is then more strict about trying to add a float to an int via the +=.

It gets worse. :unamused: When processing a chunk of data in _SumMaskedStreamsHandler.process_data, if everything along the axis is masked for a certain position, the result of np.sum() for that position will also be masked. But adding a masked value to a normal value gives another masked value. So masked values become "sticky" in the running total ... not what we want! On the other hand, if all the values along the axis are masked (not just within a single chunk) then the final result should be masked.

Looks like a genuine fix is needed even for 1.9.

pelson commented 8 years ago

Urgh. Ouch.

QuLogic commented 8 years ago

Any progress here? Fedora 24 is out with NumPy 1.11, and I have no packages for biggus or iris.

QuLogic commented 8 years ago

OK, I had an idea and this now works correctly. However, Travis is still installing NumPy 1.10.1, and is hitting a bug in it: numpy/numpy#6491.

QuLogic commented 8 years ago

OK, this is now 100% working. On to 1.11...

pelson commented 8 years ago

Thanks for taking this forward @QuLogic.

Given its proximity to numpy, we'd want to add another matrix item in the CI to check all supported numpy versions I think.

I'm not going to be able to give this PR the time it deserves in the next few weeks. I know that @bjlittle is also a little busy atm. - if nobody else gets around to it, I'll pick this up as a matter of priority when I get back on the 26th July.

QuLogic commented 8 years ago

This now tests against NumPy 1.7 and the latest release. I tried NumPy 1.6 (I couldn't find a minimum here, but this version is noted in Iris docs), but it fails to install.

The last commit is a fix for NumPy 1.11, so that's supported now too.

marqh commented 8 years ago

Ace work @QuLogic

this looks all ready to merge to me, and makes a biggus release well worthwhile

last call for objections before I merge:.... @pp-mo @bjlittle @pelson

pp-mo commented 8 years ago

Don't know enough to object, but it looks like a great leap forward. Really impressive work @QuLogic !