chrthomsen / pygrametl

Official repository for pygrametl - ETL programming in Python
http://pygrametl.org
BSD 2-Clause "Simplified" License
289 stars, 41 forks

Specify which platforms can run parallel ETL flows #55

Closed skejserjensen closed 1 year ago

skejserjensen commented 1 year ago

This PR closes #46 and #47 by documenting which platforms can execute parallel ETL flows implemented with pygrametl. Currently, pygrametl supports executing parallel ETL flows using Jython, and using CPython on platforms that start new processes using fork. Thus, executing a parallel ETL flow natively on Microsoft Windows using CPython is not supported, and macOS must be explicitly configured to use fork by calling multiprocessing.set_start_method('fork'), due to the issues with macOS's fork implementation documented in CPython Issue 77906 (thanks to @mFeigeInvia). An attempt was made to support spawn, but it became clear that this would require major changes to pygrametl, primarily due to limitations of pickle and the additional requirements spawn and forkserver impose compared to fork. As CPython generally does not perform well when executing parallel ETL flows compared to Jython, @chrthomsen, @fromm1990, and I agreed to prioritize other improvements to pygrametl.
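For readers on macOS, the configuration mentioned above could look roughly like the following. This is only a sketch using the standard library's multiprocessing module; the helper name use_fork_start_method is hypothetical and not part of pygrametl, and the call is assumed to run before any worker processes are created.

```python
import multiprocessing


def use_fork_start_method():
    """Select 'fork' for new processes if this platform offers it.

    Hypothetical helper, not part of pygrametl. Returns True when 'fork'
    was selected and False when the platform lacks it (e.g. Windows).
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        return False
    # force=True permits the call even if a start method was already set.
    multiprocessing.set_start_method("fork", force=True)
    return True


if __name__ == "__main__":
    if use_fork_start_method():
        print("start method:", multiprocessing.get_start_method())
        # ... define and run the parallel ETL flow here ...
    else:
        print("fork is unavailable; parallel flows are not supported here")
```

Checking get_all_start_methods() first lets the same script fail gracefully on platforms without fork instead of raising a ValueError from set_start_method.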