Multiprocessor task spawn runtime error on Mac OS & Python 3.11 #66

rachel3834 opened this issue 1 year ago (status: Open)

rachel3834 commented 1 year ago

When attempting to run a standard differential evolution (DE) fit, e.g.:

fit_1 = DE_fit.DEfit(pspl)

Python raises RuntimeError exceptions from the multiprocessing library, which complains that the code is using 'spawn' (rather than 'fork') to start the parallel tasks. Note that this error has only been seen with the following OS/Python versions: OS: macOS Ventura 13.4.1, Python: 3.11.

This appears to be related to this known issue in the multiprocessing library when running on macOS.

When starting worker processes (for example a multiprocessing pool) under Linux, the library uses the 'fork' start method by default, but under macOS it defaults to 'spawn' (the macOS default since Python 3.8), which imposes additional constraints on how the program must be structured.
More detailed information about this is available here.
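A quick way to check which start method a given interpreter uses is to ask multiprocessing directly. The sketch below uses only the standard library; the helper name default_start_method is hypothetical. On macOS with Python 3.8 or later one would typically expect 'spawn', and on Linux traditionally 'fork', but the printed values depend on the platform and Python version.

```python
import multiprocessing as mp
import sys


def default_start_method() -> str:
    """Return the start method used by the default multiprocessing context."""
    # With allow_none=False (the default), this fixes the default start method
    # for the process and returns its name.
    return mp.get_start_method()


if __name__ == "__main__":
    print("platform:", sys.platform)
    print("default start method:", default_start_method())
    print("available on this platform:", mp.get_all_start_methods())
```

To reproduce the macOS behaviour on Linux (for example in CI), code can request mp.get_context("spawn") explicitly instead of relying on the platform default.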

Full traceback:

```
check_event : Everything looks fine...
check_event : Everything looks fine...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 129, in _main
    prepare(preparation_data)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 240, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 291, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/rstreet1/software/microlensing_project/python_scripts/gaia_pspl+usbl_noparallax/gaia21bsg_binary_noparallax_usbl.py", line 913, in <module>
    fit_1 = DE_fit.DEfit(pspl)
            ^^^^^^^^^^^^^^^^^^
  File "/Users/rstreet1/software/pylima_venv3_11/lib/python3.11/site-packages/pyLIMA/fits/DE_fit.py", line 28, in __init__
    super().__init__(model, rescale_photometry=rescale_photometry,
  File "/Users/rstreet1/software/pylima_venv3_11/lib/python3.11/site-packages/pyLIMA/fits/ML_fit.py", line 67, in __init__
    self.trials = Manager().list() # to be recognize by all process during
                  ^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/managers.py", line 563, in start
    self._process.start()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 138, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "/Users/rstreet1/software/microlensing_project/python_scripts/gaia_pspl+usbl_noparallax/gaia21bsg_binary_noparallax_usbl.py", line 913, in <module>
    fit_1 = DE_fit.DEfit(pspl)
            ^^^^^^^^^^^^^^^^^^
  File "/Users/rstreet1/software/pylima_venv3_11/lib/python3.11/site-packages/pyLIMA/fits/DE_fit.py", line 28, in __init__
    super().__init__(model, rescale_photometry=rescale_photometry,
  File "/Users/rstreet1/software/pylima_venv3_11/lib/python3.11/site-packages/pyLIMA/fits/ML_fit.py", line 67, in __init__
    self.trials = Manager().list() # to be recognize by all process during
                  ^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/managers.py", line 567, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError

Process finished with exit code 1
```
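The duplicated "check_event : Everything looks fine..." line at the top of the log is a useful clue: under 'spawn', each child process re-runs the main script's top-level code (under the module name '__mp_main__') before doing its own work. In the reporter's script, that re-run reaches DE_fit.DEfit(pspl) again, whose Manager() call tries to start yet another process while the child is still bootstrapping; the child raises the RuntimeError and the parent then sees EOFError. The standalone sketch below (standard library only; child_task is a hypothetical name) makes the re-execution visible.

```python
import multiprocessing as mp

# Module-level code: with the "spawn" start method this runs once in the parent
# and once more in every child, because each child re-imports the main script.
print(f"top-level code running as module {__name__!r}")


def child_task() -> None:
    # Runs only in the child process, after the re-import above.
    print("child task running")


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # same start method macOS uses by default
    proc = ctx.Process(target=child_task)
    proc.start()
    proc.join(timeout=60)
```

Run as a script, this typically prints the top-level message twice, once with '__main__' and once with '__mp_main__', followed by the child's message. Anything that starts processes (such as the Manager() call in pyLIMA's ML_fit) therefore has to sit under the if __name__ == "__main__": guard, which the child's re-import skips.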

ronaldoussoren commented 10 months ago

The problem is that multiprocessing on macOS and Windows uses the "spawn" method to launch worker processes, and that method places some requirements on how the API is used, in particular that the main module can be imported safely (see the entry "Safe importing of main module" in https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods).
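In practice, "safe importing" means that nothing which starts a process, including creating a multiprocessing Manager as pyLIMA's ML_fit does in its constructor, may run at module level. Below is a minimal sketch of the safe layout under the 'spawn' start method; TrialRecorder, record and main are hypothetical stand-ins for a fit object, a worker and a script body, not pyLIMA code.

```python
import multiprocessing as mp


class TrialRecorder:
    """Hypothetical stand-in for a fit object that creates a shared list
    through a Manager in its constructor."""

    def __init__(self, ctx):
        # Starting a Manager launches a server process, so constructing this
        # object must never happen while a module is merely being imported.
        self.manager = ctx.Manager()
        self.trials = self.manager.list()

    def close(self) -> None:
        self.manager.shutdown()


def record(trials, value) -> None:
    # Worker function: defined at module level so spawned children can
    # find it by name after re-importing this module.
    trials.append(value)


def main() -> list:
    ctx = mp.get_context("spawn")  # force the macOS/Windows start method
    recorder = TrialRecorder(ctx)
    try:
        workers = [ctx.Process(target=record, args=(recorder.trials, i)) for i in range(3)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return sorted(recorder.trials[:])  # slicing a ListProxy returns a plain list
    finally:
        recorder.close()


if __name__ == "__main__":
    # Everything that starts processes is reached only from here, so the
    # re-import performed by each spawned child starts no new processes.
    print(main())
```

Moving the TrialRecorder(ctx) call to module level should reproduce the same bootstrapping RuntimeError, since every spawned child (including the Manager's server process) would then try to start a Manager during its own import.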

Your examples start working when you place most of the script code in an if __name__ == "__main__": block; see, for example, this changed version of example_1.py:

'''
Welcome to pyLIMA (v2) tutorial 1!

In this tutorial you will learn how pyLIMA works by fitting a simulated data set.
We will cover how to read in data files, call different fitting routines and how to
make plots.
Please take some time to familiarize yourself with the pyLIMA documentation.
'''
### Import the required libraries.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm
from pyLIMA.fits import DE_fit
from pyLIMA.fits import LM_fit
from pyLIMA.fits import MCMC_fit
from pyLIMA.models import FSPL_model
from pyLIMA.models import PSPL_model
from pyLIMA.outputs import pyLIMA_plots

from pyLIMA import event
from pyLIMA import telescopes

if __name__ == "__main__":
    ### Create a new EVENT object and give it a name.
    your_event = event.Event()
    your_event.name = 'My event name'

    ### You now need to associate some data sets with this EVENT.
    ### For this example, you will use simulated I-band data sets from two telescopes,
    ### OGLE and LCO.
    ### The data sets are pre-formatted: column 1 is the date, column 2 the magnitude
    ### and column 3 the uncertainty in the magnitude.
    data_1 = np.loadtxt('./data/Survey_1.dat')
    telescope_1 = telescopes.Telescope(name='OGLE',
                                       light_curve=data_1,
                                       light_curve_names=['time', 'mag', 'err_mag'],
                                       light_curve_units=['JD', 'mag', 'mag'])

    data_2 = np.loadtxt('./data/Followup_1.dat')
    telescope_2 = telescopes.Telescope(name='LCO',
                                       light_curve=data_2,
                                       light_curve_names=['time', 'mag', 'err_mag'],
                                       light_curve_units=['JD', 'mag', 'mag'])

    ### Append these two telescope data sets to your EVENT object.
    your_event.telescopes.append(telescope_1)
    your_event.telescopes.append(telescope_2)

    ### Define the survey telescope that you want to use to align all other data sets to.
    ### We recommend using the data set with the most measurements covering the greatest
    ### time span of observations:
    your_event.find_survey('OGLE')

    ### Run a quick sanity check on your input.
    your_event.check_event()

    ### Next, construct the MODEL you want to fit and link it to the EVENT you prepared. 
    ### Let's go with a basic PSPL, without second order effects:

    pspl = PSPL_model.PSPLmodel(your_event)

    ### Let's try fitting the event with a simple Levenberg-Marquardt (LM) algorithm.
    ### Define the FITTING ALGORITHM you want to use for the MODEL you prepared.
    ### For more information about the models and fitting algorithms available  
    ### please consult the pyLIMA documentation.

    ### Initialize the fit by declaring a simple FIT object using the MODEL you defined:
    my_fit = LM_fit.LMfit(pspl)

    ### Before we run it, let's have a look at the initial fit parameters:
    print(my_fit.fit_parameters)

    ### Now fit the MODEL to the EVENT. This may take a few seconds.
    my_fit.fit()

    ### You can now recall the fit results on the screen by executing:
    print(my_fit.fit_results)

    ### You can now recall any entry in the output dictionary by using the appropriate key.
    ### For example, if you want to see the best fit results, you can access them like this:
    print(my_fit.fit_results['best_model'])

    ### If you don't remember which parameter each entry represents, you can always
    ### access the descriptions from fit_parameters.
    print(my_fit.fit_parameters.keys())

    ### Let's see some plots, using the pyLIMA plotting tools imported at the top of the script.

    pyLIMA_plots.plot_lightcurves(pspl, my_fit.fit_results['best_model'])

    ### Let's try another fit with the differential evolution (DE) algorithm. 
    ### This will take longer... 

    my_fit2 = DE_fit.DEfit(pspl)
    my_fit2.fit()

    ### Look at the results:
    pyLIMA_plots.plot_lightcurves(pspl, my_fit2.fit_results['best_model'])

    ### You can use the Zoom-in function to look at the peak.
    ### There is strong evidence of finite source effects in this event, so let's try
    ### to fit this.
    ### This requires the FSPL MODEL (imported at the top of the script):

    fspl = FSPL_model.FSPLmodel(your_event)

    ### You can still use the FITTING ALGORITHM that you imported previously. 
    ### Let's just use DE_fit for this:
    my_fit3 = DE_fit.DEfit(fspl)
    my_fit3.fit()

    ### Let's see some plots. You can zoom close to the peak to see what is going on.
    pyLIMA_plots.plot_lightcurves(fspl, my_fit3.fit_results['best_model'])

    ### There is evidently still some structure in the residuals. Could be some limb
    ### darkening going on!
    ### Let's try to fit for it.

    ### Set the microlensing limb-darkening coefficients (gamma) for each telescope:
    your_event.telescopes[0].ld_gamma = 0.5
    your_event.telescopes[1].ld_gamma = 0.5

    ### Fit again:
    my_fit4 = DE_fit.DEfit(fspl)
    my_fit4.fit()

    ### And plot it. Then zoom at the peak again.
    pyLIMA_plots.plot_lightcurves(fspl, my_fit4.fit_results['best_model'])

    ### You can use the results of a previous good fit as initial guesses 
    ### for the parameters in another fit:
    guess_parameters = my_fit4.fit_results['best_model']

    ### These parameter guesses can now be used to start an MCMC run, for example.
    ### Using MCMC is recommended when you want to explore the posterior distribution
    ### of the parameters.
    ### Let's fit again using MCMC. This might take some time ...

    my_fit5 = MCMC_fit.MCMCfit(fspl)
    my_fit5.model_parameters_guess = guess_parameters
    my_fit5.fit()

    ### Now your MCMC run is complete. Congratulations! 
    ### You can now plot the chains and explore how they evolve for each parameter.
    ### For example, to see how the chains for u0 evolve, do:
    plt.plot(my_fit5.fit_results['MCMC_chains'][:, :, 1])

    ### The first index in the slice [:, :, 1] is the iteration number, the second
    ### the chain number and the last the parameter number (with the likelihood
    ### stored at the end).
    ### The parameters are in the same order as in my_fit5.fit_parameters.keys()

    ### You can compare the MCMC distributions with the input values that were used
    ### to generate the light curve.
    ### For this, let's only consider the chains after the 1000th iteration (i.e. after
    ### burn-in).
    ### [:7] at the end is just so that only the first 7 characters are printed.
    MCMC_results = my_fit5.fit_results['MCMC_chains']
    print('Parameters', ' Model', '   Fit', '     Errors')
    print('t_0:', '        79.9309 ', str(np.median(MCMC_results[1000:, :, 0]))[:7], '',
          str(np.std(MCMC_results[1000:, :, 0]))[:7])
    print('u_0:', '        0.00826 ', str(np.median(MCMC_results[1000:, :, 1]))[:7], '',
          str(np.std(MCMC_results[1000:, :, 1]))[:7])
    print('t_E:', '        10.1171 ', str(np.median(MCMC_results[1000:, :, 2]))[:7], '',
          str(np.std(MCMC_results[1000:, :, 2]))[:7])
    print('rho:', '        0.02268 ', str(np.median(MCMC_results[1000:, :, 3]))[:7], '',
          str(np.std(MCMC_results[1000:, :, 3]))[:7])

    ### You can now plot the correlation between any two parameters.
    ### LogNorm, used below, is imported at the top of the script.

    ### Now plot u0 against tE:
    plt.hist2d(MCMC_results[1000:, :, 1].ravel(), MCMC_results[1000:, :, 2].ravel(),
               norm=LogNorm(), bins=50)

    ### You can consult the matplotlib.pyplot.hist2d documentation to see additional
    ### arguments.

    ### This concludes tutorial 1.
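The chain slicing used in the tutorial can be tried without running a fit. The sketch below builds a synthetic NumPy array with the same [iteration, chain, parameter] layout described in the tutorial comments (the values, array sizes and the helper summarize_chains are made up for illustration and are not pyLIMA output; the trailing likelihood column is omitted). It discards the first 1000 iterations as burn-in and formats the median and standard deviation with format specifications instead of truncating strings with [:7].

```python
import numpy as np

# Synthetic stand-in for fit_results['MCMC_chains']:
# axis 0 = iteration, axis 1 = chain, axis 2 = parameter.
rng = np.random.default_rng(seed=1)
n_iterations, n_chains = 2000, 8
true_values = np.array([79.93, 0.0083, 10.12, 0.0227])  # t_0, u_0, t_E, rho
spreads = np.array([0.01, 0.0005, 0.1, 0.001])
chains = rng.normal(true_values, spreads, size=(n_iterations, n_chains, true_values.size))


def summarize_chains(chains, names, burn_in=1000):
    """Median and standard deviation of each parameter after discarding burn-in."""
    kept = chains[burn_in:]  # drop the first burn_in iterations of every chain
    return {
        name: (float(np.median(kept[:, :, i])), float(np.std(kept[:, :, i])))
        for i, name in enumerate(names)
    }


names = ["t_0", "u_0", "t_E", "rho"]
for name, (median, std) in summarize_chains(chains, names).items():
    # Format specifiers control significant digits, unlike slicing a string.
    print(f"{name}: {median:.5g} +/- {std:.2g}")
```

Pooling all chains after burn-in, as here and in the tutorial's np.median(MCMC_results[1000:, :, i]) calls, assumes the chains have converged to the same distribution.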