FilipDominec / rp2daq

Raspberry Pi Pico firmware for universal hardware control & measurement, along with a user-friendly Python frontend
MIT License

Most examples currently don't work, due to multiprocessing #9

Closed FilipDominec closed 1 year ago

FilipDominec commented 1 year ago

TL;DR When a performance optimisation led me to introduce multiprocessing into the rp2daq module, it broke all the example scripts here. Instead of expanding the code of every example, I demonstrate how to patch the multiprocessing.Process class so that no changes in user code are required.

Rationale The multiprocessing module turns out to be necessary for fast & reliable USB communication, and compatibility with both Windows & Linux dictates the use of the "spawn" start method instead of "fork". This in turn requires that the executable part of the script, except for imports and some trivial definitions, be fully enclosed in a "guard":

```python
if __name__ == "__main__": ...
```

... clause, as discussed in the multiprocessing module's Programming Guidelines. Otherwise, spawning the secondary process results in an infinite chain of further processes - a fork bomb, or rather a spawn bomb here.
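As a side note on the platform constraint above, multiprocessing also lets a library request spawn semantics explicitly via a context object, instead of changing the process-wide default; a minimal sketch (general standard-library usage, not rp2daq-specific code):

```python
import multiprocessing as mp

# "spawn" is the only start method on Windows and the default on macOS;
# on Linux the default is "fork". A context object pins the semantics
# locally, without the process-wide mp.set_start_method() call.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # → spawn
```

Either way, the spawn semantics are what force the "guard" discussed above, so the context object alone does not solve the problem.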

This has been puzzling people for over 10 years without anybody actually offering another solution:

- https://stackoverflow.com/questions/22595639/best-practices-for-using-multiprocessing-package-in-python?rq=4
- https://stackoverflow.com/questions/48306228/how-to-stop-multiprocessing-in-python-running-for-the-full-script
- https://stackoverflow.com/questions/48680134/how-to-avoid-double-imports-with-the-python-multiprocessing-module
- https://stackoverflow.com/questions/50781216/in-python-multiprocessing-process-do-we-have-to-use-name-main
- https://stackoverflow.com/questions/55057957/an-attempt-has-been-made-to-start-a-new-process-before-the-current-process-has-f
- https://stackoverflow.com/questions/57191393/python-multiprocessing-with-start-method-spawn-doesnt-work
- https://stackoverflow.com/questions/70263918/how-to-prevent-multiprocessing-from-inheriting-imports-and-globals
- https://stackoverflow.com/questions/77220442/multiprocessing-pool-in-a-python-class-without-name-main-guard

With all due respect to Python's core module developers, I consider this need for the "guard" a very bad UX design decision, because it cannot be hidden inside the module - instead it propagates to every script that imports it. It breaks the principle of least astonishment and leads to repeated code; it is also ugly, complicated, and fairly hard to explain.

One of the core principles of rp2daq is to resolve all necessary complexity inside its own code, enabling the user to write very short, easy-to-read scripts for routine instrumentation control. The "guard" clause is therefore utterly unacceptable for this project.

Fortunately, a multiplatform and arguably quite elegant solution exists.

FilipDominec commented 1 year ago

Implemented solution

The user script is similar to the one in the official guidelines:

```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

import multiprocessing as mp

# The following line is needed for testing on Linux only: it simulates how spawn behaves natively on Windows
if __name__ == "__main__": mp.set_start_method('spawn')

import worker_module

print('process 1 name', __name__)

# PatchedProcess now requires no extra "guard" nor indentation for the user code!
# if __name__ == '__main__': woohoo
p = worker_module.PatchedProcess(target=worker_module.worker_func, args=('bob',))
p.daemon = True
p.start()
p.join()  # wait for the printout (or time.sleep(0.1) would work, too)
```

... except for obviously not using the "guard", for replacing Process with PatchedProcess, and for storing worker_func in a separate module. The latter is necessary so that pickle can restore the same function object in the spawned process.
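Why pickle forces worker_func into an importable module can be seen directly: a pickled function carries no code, only a reference "module name + attribute name", so the spawned interpreter must be able to re-import that module to recover the object. A small illustration with a standard-library function:

```python
import pickle
import json

# pickle does not serialize the function's code - it stores only a
# reference to the module and attribute name; unpickling re-imports them
data = pickle.dumps(json.dumps)
print(b"json" in data, b"dumps" in data)  # → True True

# the "restored" object is the very same one, found by a fresh import
assert pickle.loads(data) is json.dumps
```

A function defined in the main script would be referenced as `__main__.worker_func`, which is exactly why the child process tries to re-import the main script - and why redirecting `__main__` works.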

The separate worker_module.py contains our worker_func for the new process, and the necessary patched subclass of multiprocessing.Process:

```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

import sys
import multiprocessing as mp

print('process 2 name', __name__)

class PatchedProcess(mp.Process):
    def start(self, *args):
        # Temporarily make this module pose as "__main__", so the spawned
        # child imports worker_module instead of re-running the user's script
        bkup_main = sys.modules['__main__']
        sys.modules['__main__'] = __import__('worker_module')
        super().start(*args)
        sys.modules['__main__'] = bkup_main

def worker_func(name):
    print('USER FUNCTION ' * 3, name)
```

The trick is simply to redirect the new process to load another module instead of the default main script. The spawn bomb is avoided, the functionality of worker_func is maintained (as long as it does not use the main script's globals), and only a few lines of code are added - which, however, can be taken care of entirely inside your module, keeping the main script short and clean.
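The mechanism PatchedProcess relies on - that whatever object sits in sys.modules['__main__'] is what any subsequent lookup of `__main__` resolves to - can be verified in-process, without spawning anything (the module name `fake_main` below is an arbitrary stand-in for this sketch):

```python
import sys
import types

backup = sys.modules['__main__']      # remember the real main module
fake = types.ModuleType('fake_main')  # any stand-in module object
sys.modules['__main__'] = fake        # the swap PatchedProcess.start() performs
import __main__                       # import machinery consults sys.modules first...
assert __main__ is fake               # ...so the stand-in is returned
sys.modules['__main__'] = backup      # restore, as PatchedProcess does after start()
```

The spawned child performs essentially that `import __main__` step when unpickling its target, which is why swapping the entry in the parent for the duration of start() is enough.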

Fixed by 77f5c61.