BBN-Q / Auspex

Automated system for Python-based experiments
Apache License 2.0

Multiprocessing outlook dire for platforms (i.e. Windows) using spawn instead of fork #218

Closed dieris closed 6 years ago

dieris commented 6 years ago

Initializing an X6 gives the following error:

c:\users\qlab_user\documents\github\auspex\src\auspex\exp_factory.py in init_instruments(self)
     81             self.dig_listeners[mp.Process(target=dig.receive_data, args=(chan, oc, exit))] = exit
     82         for listener in self.dig_listeners.keys():
---> 83             listener.start()
     84         if self.cw_mode:
     85             for awg in self.awgs:

C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         _children.add(self)

C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

TypeError: can't pickle h5py.h5f.FileID objects
grahamrow commented 6 years ago

I'm stymied as to why the X6 is getting an h5 file descriptor... which branch are you on, exactly?

dieris commented 6 years ago

good question... multiprocessing-with-queue. Is that the most recent?

dieris commented 6 years ago

in auspex_dummy_mode, it bails at the same point, but with this error:

Can't pickle <class 'unittest.mock.MagicMock'>: it's not the same object as unittest.mock.MagicMock
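This failure is reproducible in isolation, outside Auspex. Spawn-based multiprocessing must pickle every Process target and argument, and each MagicMock instance belongs to a dynamically generated per-instance subclass that pickle cannot match back to `unittest.mock.MagicMock`. A minimal sketch:

```python
import pickle
from unittest.mock import MagicMock

# Each MagicMock lives in its own dynamically created subclass, so
# pickle's by-reference class lookup fails. The exact exception type
# has varied across Python versions, hence the broad except clause.
try:
    pickle.dumps(MagicMock())
    mock_picklable = True
except (TypeError, pickle.PicklingError):
    mock_picklable = False
```

This is why dummy mode, which wraps instruments in mocks, dies at the same `listener.start()` call as the real hardware path.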
grahamrow commented 6 years ago

Okay, I can duplicate this on macOS if I create subprocesses with spawn rather than fork. That doesn't bode well for Windows, which can't use the latter. I'll see what can be done; I wish I'd seen this coming.
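The reproduction trick above works on any POSIX platform: request the spawn context explicitly, and every Process target and argument must then survive a pickle round trip, surfacing Windows-only failures early. A generic sketch (the worker function is hypothetical, not Auspex code):

```python
import multiprocessing as mp

def greet(name):
    # Hypothetical worker; must be importable by name for spawn to work.
    print("hello,", name)

if __name__ == "__main__":
    # "spawn" is the only start method Windows supports; forcing it on
    # macOS/Linux makes un-picklable targets or arguments fail here
    # just as they would on Windows.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=greet, args=("auspex",))
    p.start()
    p.join()
```

Running the test suite under a spawn context like this would catch regressions of this class without a Windows machine.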

grahamrow commented 6 years ago

I tried using pathos.multiprocess, a fork of multiprocessing that serializes with dill rather than pickle, but I immediately ran into problems with how it handles Event() constructors.

Another approach, which @dellard might want to weigh in on, is having two alternative sets of imports that provide either Process with process-safe Queues and Events, or Thread with thread-safe Queues and Events. That way we're mostly just shuffling imports, since the APIs are essentially the same.

dellard commented 6 years ago

I thought there was a migration away from Windows underway. Is this not true? (if it isn't true, then I guess I need a way to test things on Windows: which flavor of Windows do I need?)

From what I've read, the issue most people run into with spawn vs. fork is that with fork, modules are imported once, while with spawn each new Python process starts independently and repeats the imports. If we still have code that does work at import time, that may be where the collision happens.

Using spawn also requires pickling things that never need to be pickled in a fork-based multiprocessing world. It makes sense that a FileID would be hard to pickle, but maybe we can work around it: could we pass the filename around instead of the FileID?
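The filename-instead-of-handle idea can be sketched generically. An open handle fails to pickle, but its path is a plain string that crosses the process boundary fine, so the child opens its own handle locally (the worker name below is hypothetical):

```python
import os
import pickle
import tempfile

# A live file handle cannot be pickled, but the *path* can: pass the
# filename to the child process and open the file on the far side.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

def append_in_child(filename, text):
    # Hypothetical child-side worker: opens its own handle from the path.
    with open(filename, "a") as fh:
        fh.write(text)

handle = open(path, "w")
try:
    pickle.dumps(handle)          # handles don't survive pickling
    handle_picklable = True
except TypeError:
    handle_picklable = False
handle.close()

path_roundtrip = pickle.loads(pickle.dumps(path))   # paths travel fine
append_in_child(path_roundtrip, "ok")
```

The same principle would apply to h5py: ship the HDF5 filename, not the `h5py.File` object or its FileID.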

grahamrow commented 6 years ago

I don't think we want to completely abandon Windows since we do have some external users, and it would be a shame to completely preclude the possibility.

You're right that the FileID could be fixed, but the larger problem is that our various objects hold references to one another. The reason an X6 runs into trouble with a FileID is, I believe, that we pass a Stream to the X6 driver, which holds a reference to the InputConnector on a file writer, which references the I/O Filter, which contains the FileID. Python tries to pickle all of this because we pass the stream.

So we can acquiesce and pass the Queue directly to avoid this issue, turning the receive_data method into a @staticmethod. This becomes much harder in Filter objects, since we typically perform a lot of initialization that ends up in instance variables; we'd have to eliminate a whole mess of self. references to make it work.
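The Queue + @staticmethod pattern can be sketched with hypothetical names (a Thread stands in for the worker here so the sketch is self-contained; with spawn, the point is that only the queue and plain arguments need to pickle, not the driver instance):

```python
from queue import Queue       # stand-in for multiprocessing.Queue
from threading import Thread  # stand-in for multiprocessing.Process

class DummyDigitizer:
    """Hypothetical driver sketch: receive_data takes only the data it needs."""

    @staticmethod
    def receive_data(queue, n_points):
        # No `self` here: no stream/filter/file-handle object graph
        # gets dragged through the pickler when the worker starts.
        for i in range(n_points):
            queue.put(i)

q = Queue()
worker = Thread(target=DummyDigitizer.receive_data, args=(q, 3))
worker.start()
worker.join()
received = [q.get() for _ in range(3)]
```

The cost, as noted above, is that everything the method used to reach through `self.` must now arrive as an explicit, picklable argument.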

dellard commented 6 years ago

I pushed a fix for this to the head of feature/multiprocessing-with-queue (at c46f93f0d6cff5d29fca7a42f8ecc43e8b99d9a4)

The basic approach, as suggested by @grahamrow, is to load different modules under the same name on different platforms: on platforms that use fork() we use multiprocessing, and on other platforms (or when the user sets the NOFORKING environment variable) we use threading. The interfaces are similar enough that we can use either one, as long as we're consistent.

It seems to work, but it needs careful review.

dieris commented 6 years ago

I'm afraid I have to report a new Windows error...

Exception in thread Thread-17:
Traceback (most recent call last):
  File "C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\qlab_user\Anaconda3\envs\pyqt5\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "c:\users\qlab_user\documents\github\auspex\src\auspex\instruments\X6.py", line 275, in receive_data
    buf = sock.recv(msg_size, socket.MSG_WAITALL)
OSError: [WinError 10045] The attempted operation is not supported for the type of object referenced
dellard commented 6 years ago

OK; more proof that I need to finish resurrecting my Windows box to do real testing...

The good news is that this particular incompatibility has a known workaround, which I will add.
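The thread doesn't show the eventual fix, but a common workaround for WinError 10045 on `recv(..., socket.MSG_WAITALL)` is to drop the flag and loop until the requested byte count arrives. A self-contained sketch (the helper name is hypothetical):

```python
import socket

def recvall(sock, n):
    """Read exactly n bytes without socket.MSG_WAITALL, which some
    Windows socket types reject (WinError 10045)."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before %d bytes read" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Exercise the helper over a local socket pair.
a, b = socket.socketpair()
a.sendall(b"0123456789")
data = recvall(b, 10)
a.close()
b.close()
```

The loop is also more portable in general, since MSG_WAITALL can return short reads on some platforms when interrupted by a signal.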

dieris commented 6 years ago

Writer processes keep running when averagers and plotters are finished.

grahamrow commented 6 years ago

I'm overhauling the writer to deal with adaptive sweeping, part of which is making completion detection more robust. Will advise.

grahamrow commented 6 years ago

Okay, I've pushed up fixes that have all the unit tests passing on macOS (including adaptive sweeps). I'm going to spin up a virtual machine to check the Windows side of things.

grahamrow commented 6 years ago

This appears to be resolved, including by the changes in PR #280.