SciTools / biggus

:no_entry: [DEPRECATED] Virtual large arrays and lazy evaluation.
http://biggus.readthedocs.io/
GNU Lesser General Public License v3.0

Thread-per-node stream processing #63

Closed rhattersley closed 10 years ago

rhattersley commented 10 years ago

Builds on #62.

Removes the axis == 0 limitation.

pelson commented 10 years ago

Whoops - this test is no longer applicable:

======================================================================
FAIL: test_non_zero (biggus.tests.unit.test_std_var.TestInvalidAxis)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/SciTools/biggus/biggus/tests/unit/test_std_var.py", line 40, in test_non_zero
    func(self.array, axis=1)
AssertionError: AssertionError not raised

rhattersley commented 10 years ago

@pelson - I've added commits and/or PR comments above which tackle all your comments except splitting biggus into separate modules. That'll take some head scratching, so might be best placed in a separate PR.

pelson commented 10 years ago

I don't necessarily agree that having a threaded engine is easy to debug. I found the following issue:

import biggus
import numpy as np
b = biggus.NumpyArrayAdapter(np.arange(24).reshape(3, 4, 2))
print(biggus.ndarrays([biggus.mean(biggus.mean(b, axis=1), axis=-1)]))
Exception in thread <biggus.ProducerNode object at 0x215ee90>:
Traceback (most recent call last):
  File "lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "biggus/biggus/__init__.py", line 185, in run
    i in range(len(self.iteration_order))]
ValueError: 0 is not in list

Exception in thread <biggus.StreamsHandlerNode object at 0x215ee10>:
Traceback (most recent call last):
  File "python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "biggus/biggus/__init__.py", line 264, in run
    self.output(self.finalise())
  File "biggus/biggus/__init__.py", line 282, in finalise
    return self.streams_handler.finalise()
  File "biggus/biggus/__init__.py", line 1096, in finalise
    array = self.running_total / self.array.shape[self.axis]
AttributeError: '_MeanStreamsHandler' object has no attribute 'running_total'

Exception in thread <biggus.StreamsHandlerNode object at 0x215ed50>:
Traceback (most recent call last):
  File "/data/local/sci/r28/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/data/local/sci/r28/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/data/local/itpe/git/biggus/biggus/__init__.py", line 264, in run
    self.output(self.finalise())
  File "/data/local/itpe/git/biggus/biggus/__init__.py", line 282, in finalise
    return self.streams_handler.finalise()
  File "/data/local/itpe/git/biggus/biggus/__init__.py", line 1096, in finalise
    array = self.running_total / self.array.shape[self.axis]
AttributeError: '_MeanStreamsHandler' object has no attribute 'running_total'

Luckily the error message is pretty good, as the traceback is pretty tricky to follow.

pelson commented 10 years ago

OK. Once that is fixed (and all related StreamsHandler definitions are explicitly defined), I'm getting close to merging.

:+1:

rhattersley commented 10 years ago

> all related StreamsHandler definitions are explicitly defined

Do you mean: docstrings for the _StreamsHandler class and all its sub-classes?

pelson commented 10 years ago

> Do you mean: docstrings for the _StreamsHandler class and all its sub-classes?

No. I mean that any instance attributes which are used in finalise should be defined in __init__ - I believe that is the reason for the traceback above.
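
A minimal sketch of that convention, for illustration only. `RunningMeanHandler` and its method `process_chunk` are hypothetical names, not the actual biggus `_MeanStreamsHandler`; the point is just that `running_total` exists from construction onwards, so `finalise` can fail with a clear message instead of an `AttributeError`.

```python
import numpy as np


class RunningMeanHandler(object):
    """Hypothetical stand-in for a streams handler; not the biggus class."""

    def __init__(self, shape, axis):
        self.shape = shape
        self.axis = axis
        # Defined up front so finalise() never hits an AttributeError,
        # even if process_chunk() was never called.
        self.running_total = None

    def process_chunk(self, chunk):
        # Accumulate the sum of each chunk along the reduction axis.
        partial = np.sum(chunk, axis=self.axis)
        if self.running_total is None:
            self.running_total = partial
        else:
            self.running_total = self.running_total + partial

    def finalise(self):
        if self.running_total is None:
            raise RuntimeError('finalise() called before any data arrived')
        return self.running_total / float(self.shape[self.axis])
```

With this layout, calling `finalise()` on a handler that never received data reports that fact directly rather than pointing at a missing attribute.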

rhattersley commented 10 years ago

Ah, I see. Actually, the error message concerning running_total was a distraction. The real/original error was ValueError: 0 is not in list triggered by the use of a negative axis. I've pushed a fix for that already.
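
For context, in the failing example the inner mean over `axis=1` leaves a two-dimensional array, so `axis=-1` refers to axis 1. One common way to handle this is to normalise the axis before using it as an index. The helper below is a hypothetical sketch of that idea, not necessarily the fix that was pushed.

```python
def normalise_axis(axis, ndim):
    """Hypothetical helper: map a possibly negative axis into [0, ndim)."""
    if not -ndim <= axis < ndim:
        raise ValueError(
            'axis {} is out of range for {} dimensions'.format(axis, ndim))
    # Python's modulo maps -1 to ndim - 1, -2 to ndim - 2, and so on.
    return axis % ndim
```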

Getting rid of the "distraction" errors should be done by avoiding running the finalise code at all when an error has occurred. I'm happy to have a stab at that too....

pelson commented 10 years ago

> Getting rid of the "distraction" errors should be done by avoiding running the finalise code at all when an error has occurred. I'm happy to have a stab at that too....

I think I just demonstrated how easy it is to misread error messages under threaded execution. I know there isn't a lot we can do about it, but it is worth remembering: it will bite somebody at some point.

rhattersley commented 10 years ago

I've just pushed a couple of commits which:

rhattersley commented 10 years ago

:tada: :clap:

@pelson - thanks for helping to knock this PR into shape. Yes, there are still some issues to be addressed (e.g. splitting into multiple files and figuring out the relationship between the Array, Handler, and Engine classes) but it's a big step forwards to be able to get rid of the axis==0 limitation.