Closed by QuLogic 8 years ago
The test_simple (biggus.tests.unit.test_BroadcastArray.Test_masked_array) test looks like it could be a bug with stride tricks and masked arrays, but I haven't had a chance to check whether I can reproduce it without biggus. The other two have poor messages; are there any warnings coming out at the same time?
Ah, good point about the warnings; this might clear things up a bit:
test_getitem (biggus.tests.test_adapter.TestNumpyAdapter) ...
.../biggus/__init__.py:1450: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
array = self.concrete.__getitem__(keys)
.../biggus/__init__.py:1450: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 1; dimension is 4 but corresponding boolean dimension is 3
array = self.concrete.__getitem__(keys)
ok
test_getitem (biggus.tests.test_adapter.TestOrthoAdapter) ...
.../biggus/tests/test_adapter.py:201: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
result = result.__getitem__(tuple(index))
.../biggus/tests/test_adapter.py:201: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 1; dimension is 4 but corresponding boolean dimension is 3
result = result.__getitem__(tuple(index))
ok
test_flat (biggus.tests.unit.test_std_var.TestNumpyArrayAdapterMasked) ... Exception in thread <biggus.StreamsHandlerNode object at 0x7fc52d755ad0>:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
File ".../biggus/__init__.py", line 290, in run
self.output(self.process_chunks(input_chunks))
File ".../biggus/__init__.py", line 320, in process_chunks
return self.streams_handler.process_chunks(chunks)
File ".../biggus/__init__.py", line 2110, in process_chunks
self.process_data(chunk.data)
File ".../biggus/__init__.py", line 2337, in process_data
temp /= self.k
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
ok
test_flat (biggus.tests.unit.test_sum.TestNumpyArrayAdapterMasked) ... Exception in thread <biggus.StreamsHandlerNode object at 0x7fc52d74d690>:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
File ".../biggus/__init__.py", line 290, in run
self.output(self.process_chunks(input_chunks))
File ".../biggus/__init__.py", line 320, in process_chunks
return self.streams_handler.process_chunks(chunks)
File ".../biggus/__init__.py", line 2110, in process_chunks
self.process_data(chunk.data)
File ".../biggus/__init__.py", line 2239, in process_data
self.running_total += np.sum(data, axis=self.axis)
File ".../numpy/ma/core.py", line 3966, in __iadd__
getdata(other)))
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
ok
Looks like some size and type mismatches.
So I'm guessing the first 4 warnings are caused by these tests:
# Boolean indexing (too few indices - zero pad)
[(3, 4), [np.array([1, 1], dtype=bool)], (2, 4)],
[(3, 4), [(slice(None), np.array([1, 1, 1], dtype=bool))], (3, 3)],
but I believe these are soon to be invalid in NumPy. Should they be removed and/or deprecated, or the biggus type changed to expand the index to fit?
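For reference, here is a sketch (not biggus's actual code; the helper name is hypothetical) of what "expand the index to fit" could look like: pad a too-short boolean index with False so its length matches the dimension, reproducing the (3, 4) to (2, 4) case above without triggering the deprecation warning.

```python
import numpy as np

# Hypothetical helper (not biggus API): pad a too-short boolean index
# with False so its length matches the indexed dimension, instead of
# relying on NumPy's deprecated zero-padding of mismatched indices.
def pad_bool_index(index, length):
    index = np.asarray(index, dtype=bool)
    if index.size < length:
        pad = np.zeros(length - index.size, dtype=bool)
        index = np.concatenate([index, pad])
    return index

a = np.arange(12).reshape(3, 4)
key = pad_bool_index([1, 1], a.shape[0])  # [True, True] -> [True, True, False]
print(a[key].shape)  # (2, 4), matching the expected test result
```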
The other two are of course due to the in-place operations. But I assume those are there for performance reasons. Would it make sense to always start with a float64 array if possible?
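To illustrate the float64 idea, a minimal sketch (the function is hypothetical, not biggus's actual internals): promote integer accumulators to float64 up front, so the in-place operations cannot hit the 'same_kind' casting errors shown in the tracebacks above.

```python
import numpy as np

# Hypothetical sketch (not biggus's actual internals): start integer
# accumulators as float64 so in-place ops like `temp /= k` or
# `total += float_chunk` cannot violate the 'same_kind' casting rule.
def make_accumulator(shape, dtype):
    if np.issubdtype(np.dtype(dtype), np.integer):
        dtype = np.float64
    return np.zeros(shape, dtype=dtype)

acc = make_accumulator((3,), np.int64)
acc += np.array([1.5, 2.5, 3.5])  # would raise TypeError on an int64 array
print(acc.dtype)  # float64
```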
`test_simple` has been fixed at some point before 0.13.0.
Reminder (to myself as much as anyone) that a pin was put on the NumPy version: https://github.com/SciTools/biggus/pull/173/files#diff-b4ef698db8ca845e5845c4618278f29aR1
Ping? I think NumPy 1.10 is common enough now, especially since 1.11 was already released.
And the tests now all pass :sunglasses:
EDIT: of course, the tests are all passing because of that pin in the NumPy version...
Yea, this is a ping for @pelson or others, because I don't understand the internals of biggus enough to fix the remaining problems.
However I also ran the tests offline against NumPy v1.10 (with the NumPy version pin removed) and the tests passed then as well. It is possible that the issues previously causing the test failures have been solved by other improvements between then and now.
I guess there can't be any harm in adding another commit to this that removes the NumPy version pin - if the tests still pass on Travis then I'd say we're in a good place :smile:
> However I also ran the tests offline against NumPy v1.10 (with the NumPy version pin removed) and the tests passed then as well.
Are you sure? Travis is :unamused:.
> Are you sure? Travis is :unamused:
Ahh. That's regrettable but not entirely a surprise...
I managed to fix `std` by changing the dtype of the temporary arrays to match the expected output. I think what's wrong with `test_all_masked` is that the masked arrays are np.nan = -2^64 for ints, which sum to something that doesn't fit in an int64 and is upcast to float64. Should `sum` just not do `+=` on `self.running_total`?
NumPy (both 1.9 and 1.10) does something rather annoying:
>>> ma.sum(ma.masked_all((3,), dtype='i8')).dtype
dtype('float64')
Even explicitly asking for an i8 sum doesn't help:
>>> ma.sum(ma.masked_all((3,), dtype='i8'), dtype='i8').dtype
dtype('float64')
There is explicit code in `np.ma.sum()` to check for a 0-dimensional result and return `np.ma.masked`, a `MaskedConstant` with an f8 dtype. It looks like 1.10 is then more strict about trying to add a float to an int via the `+=`.
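A minimal reproduction of that strictness, independent of biggus:

```python
import numpy as np

# Standalone reproduction of the stricter NumPy >= 1.10 behaviour:
# adding a float64 result into an int64 array in place is rejected
# under the default 'same_kind' casting rule.
total = np.zeros(3, dtype='i8')
try:
    total += np.array([0.5, 1.5, 2.5])
    print('no error')
except TypeError as exc:
    print('TypeError:', exc)
```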
> Looks like 1.10 is then more strict about trying to add a float to an int via the `+=`.
It gets worse. :unamused: When processing a chunk of data in `_SumMaskedStreamsHandler.process_data`, if everything along the axis is masked for a certain position, the result of `np.sum()` for that position will also be masked. But adding a masked value to a normal value gives another masked value, so masked values become "sticky" in the running total ... not what we want! On the other hand, if all the values along the axis are masked (not just within a single chunk) then the final result should be masked.
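One possible shape for a fix (a hypothetical sketch, not what biggus actually does): accumulate the filled chunk sums and a per-position "seen any unmasked value" flag separately, and only mask positions that were fully masked in every chunk.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical sketch: combine per-chunk masked sums without letting
# masked values become "sticky" in the running total.
def combine_chunk_sums(chunks, axis=0):
    total = None       # running sum of unmasked data (plain ndarray)
    any_valid = None   # True where at least one chunk had unmasked data
    for chunk in chunks:
        s = ma.sum(chunk, axis=axis)
        data = ma.filled(s, 0)            # treat all-masked positions as 0
        valid = ~ma.getmaskarray(s)
        if total is None:
            total, any_valid = data, valid
        else:
            total = total + data          # ndarray add, never produces masked
            any_valid = any_valid | valid
    # mask only the positions that were masked in every chunk
    return ma.masked_array(total, mask=~any_valid)
```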
Looks like a genuine fix is needed even for 1.9.
Urgh. Ouch.
Any progress here? Fedora 24 is out with NumPy 1.11, and I have no packages for biggus or iris.
OK, I had an idea and this now works correctly. However, Travis is still installing NumPy 1.10.1, and is hitting a bug in it: numpy/numpy#6491.
OK, this is now 100% working. On to 1.11...
Thanks for taking this forward @QuLogic.
Given its proximity to numpy, we'd want to add another matrix item in the CI to check all supported numpy versions I think.
I'm not going to be able to give this PR the time it deserves in the next few weeks. I know that @bjlittle is also a little busy atm. - if nobody else gets around to it, I'll pick this up as a matter of priority when I get back on the 26th July.
This now tests against NumPy 1.7 and the latest release. I tried NumPy 1.6 (I couldn't find a minimum here, but this version is noted in Iris docs), but it fails to install.
The last commit is a fix for NumPy 1.11, so that's supported now too.
Ace work @QuLogic! This looks all ready to merge to me, and makes a biggus release well worthwhile. Last call for objections before I merge: @pp-mo @bjlittle @pelson
Don't know enough to object, but it looks like a great leap forward. Really impressive work @QuLogic !
Some tests do not pass with NumPy 1.10. Most of these are due to invalid dtype strings, which are fixed here. There are three more failures, but they are a bit too deep into the internals for me to understand:
Any ideas for fixing up these last three would be appreciated.