Closed FilipDominec closed 1 year ago
Implemented solution
The user script is similar to the one in the official guidelines:
#!/usr/bin/python3
#-*- coding: utf-8 -*-
import multiprocessing as mp
# Following line needed for testing on Linux only: simulate how spawn behaves natively on Windows:
if __name__ == "__main__": mp.set_start_method('spawn')
import worker_module
print('process 1 name', __name__)
# PatchedProcess now does not require any extra "guard" nor indentation for the user code!
# if __name__ == '__main__': woohoo
p = worker_module.PatchedProcess(target=worker_module.worker_func, args=('bob',))
p.daemon = True
p.start()
p.join() # wait for the printout (or time.sleep(0.1) would work, too)
... except that it obviously omits the "guard", replaces Process with PatchedProcess, and keeps worker_func in a separate module. The latter is necessary so that pickle can restore the same function object in the spawned process.
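To illustrate why the module location of the target function matters, here is a minimal sketch of how pickle handles functions: it stores a reference (module name plus function name) and re-imports it on loading, rather than serializing the function body. The standard-library function json.dumps is used purely as a stand-in for worker_func.

```python
import json
import pickle

# Pickling a module-level function stores only a reference to it:
# the module name and the function's qualified name.
payload = pickle.dumps(json.dumps)
print(b"dumps" in payload)   # the name is stored as plain text in the payload

# Unpickling imports the module and looks the name up again,
# so the very same function object comes back.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

The spawned process therefore has to be able to import the module that defines the target function under the same name, which is what keeping worker_func in worker_module ensures.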
The separate worker_module.py contains our worker_func for the new process, and the necessary patched subclass of multiprocessing.Process:
#!/usr/bin/python3
#-*- coding: utf-8 -*-
import multiprocessing as mp
print('process 2 name', __name__)
class PatchedProcess(mp.Process):
    def start(self, *args):
        import sys
        # Temporarily present this module as '__main__', so that the spawned
        # process loads worker_module instead of the user's main script.
        bkup_main = sys.modules['__main__']
        sys.modules['__main__'] = __import__('worker_module')
        super().start(*args)
        sys.modules['__main__'] = bkup_main

def worker_func(name):
    print('USER FUNCTION ' * 3, name)
The trick is simply to redirect the new process to load another module instead of the default main script. The spawn bomb is avoided, the functionality of worker_func is maintained (provided it does not use the main script's globals), and just a few lines of code are added, which can however be taken care of entirely inside your module, keeping the main script code short and clean.
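One possible refinement, shown here only as a sketch: the swap of sys.modules['__main__'] can be wrapped in a context manager, so that the original main module is restored even if starting the process raises. The helper name main_module_as and the stand_in module are hypothetical and not part of rp2daq.

```python
import contextlib
import sys
import types

@contextlib.contextmanager
def main_module_as(module):
    """Temporarily register `module` as '__main__' (hypothetical helper)."""
    saved = sys.modules['__main__']
    sys.modules['__main__'] = module
    try:
        yield
    finally:
        # Restore the real main module even if the body raised.
        sys.modules['__main__'] = saved

# Demonstration with an empty stand-in module; no process is started here.
stand_in = types.ModuleType('stand_in')
original = sys.modules['__main__']
with main_module_as(stand_in):
    inside = sys.modules['__main__']
print(inside is stand_in, sys.modules['__main__'] is original)
```

Inside PatchedProcess.start, the call to super().start() would then go in the body of such a with block.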
Fixed by 77f5c61.
TL;DR When performance optimisation made me introduce multiprocessing in the rp2daq module, it broke all example scripts here. Instead of expanding the code of the examples, I demonstrate how to patch the multiprocessing.Process class so that no changes in user code are required.
Rationale The multiprocessing module turns out to be necessary for fast & reliable USB communication, and its compatibility with both Windows & Linux dictates the use of the "spawn" approach instead of "fork". This in turn allegedly requires that a part of the executed script code, excepting imports and some trivial definitions, is fully enclosed in a "guard":
if __name__ == '__main__':
    ....
clause, as discussed in the multiprocessing module's Programming Guidelines. Otherwise spawning the secondary process would result in an infinite chain of further processes - a fork bomb, or rather a spawn bomb here.
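The effect of the guard can be simulated without starting any process. As far as I understand CPython's spawn implementation, the child re-runs the main script under the module name '__mp_main__' instead of '__main__'. The sketch below mimics this with runpy on a throwaway script; the script content, the events list and the run_as helper are made up for illustration.

```python
import os
import runpy
import tempfile

# A tiny stand-in for a user script: one unguarded line, one guarded line.
SCRIPT = '''
events.append('top-level code ran as ' + __name__)
if __name__ == '__main__':
    events.append('guarded code ran')
'''

def run_as(run_name):
    """Execute the stand-in script under the given module name."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo_script.py')
        with open(path, 'w') as f:
            f.write(SCRIPT)
        runpy.run_path(path, init_globals={'events': events}, run_name=run_name)
    return events

parent_events = run_as('__main__')     # how the user launches the script
child_events = run_as('__mp_main__')   # how a spawned child would re-run it
print(parent_events)
print(child_events)
```

Everything outside the guard runs again in the child, including any unguarded code that starts a process, which is where the chain of spawns would come from.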
This has been puzzling people for over 10 years without anybody actually offering another solution:
https://stackoverflow.com/questions/22595639/best-practices-for-using-multiprocessing-package-in-python?rq=4
https://stackoverflow.com/questions/48306228/how-to-stop-multiprocessing-in-python-running-for-the-full-script
https://stackoverflow.com/questions/48680134/how-to-avoid-double-imports-with-the-python-multiprocessing-module
https://stackoverflow.com/questions/50781216/in-python-multiprocessing-process-do-we-have-to-use-name-main
https://stackoverflow.com/questions/55057957/an-attempt-has-been-made-to-start-a-new-process-before-the-current-process-has-f
https://stackoverflow.com/questions/57191393/python-multiprocessing-with-start-method-spawn-doesnt-work
https://stackoverflow.com/questions/70263918/how-to-prevent-multiprocessing-from-inheriting-imports-and-globals
https://stackoverflow.com/questions/77220442/multiprocessing-pool-in-a-python-class-without-name-main-guard
With all due respect to Python's core module developers, I consider this necessity for the "guard" a very bad UX design decision, because it cannot be hidden in the module - instead it is propagated to all scripts that import it. It apparently breaks the principle of least astonishment and leads to repeated code; it is also ugly, complicated, and fairly hard to explain.
One of the core principles of rp2daq is to resolve all necessary complexity inside its code, thus enabling the user to write very short, easy-to-read scripts as a means of routine instrumentation control. Therefore, the "guard" clause is utterly unacceptable for this project.
Fortunately, a multiplatform and arguably quite elegant solution exists.