Open SteveEckel opened 3 years ago
We have been implementing parallel processing using pathos, mapping a `generate_random_solution` function onto a `pathos.pools.ProcessPool` (as in the snippets below). The problem is that it does not actually rewrite the `eqn` object being evolved in `generate_random_solution` when `eqn` is changed in the main program. If instead we execute using the built-in `map` function:

```python
sols = list(map(generate_random_solution, range(2)))
```

then things work properly, and changing the `eqn` variable works as expected.
So it would appear that the problem is in pathos. It is as if it does not re-pickle things when you generate the new pool.

Some additional information: it would appear that `pathos.pools.ProcessPool` is persistent, so it does not close down and delete itself outside of the `with` statement. Trying to close it and restart it causes serious problems, as in the code below:
```python
tmax = 1e3
args = ([0, tmax], )
kwargs = {'t_eval': np.linspace(0, tmax, 5001),
          'random_recoil': True,
          'progress_bar': False,
          'max_scatter_probability': 0.5}

def generate_random_solution(x):
    # We need to generate random numbers to prevent solutions from being seeded
    # with the same random number.
    np.random.rand(256*x)
    eqn.evolve_motion(*args, **kwargs)
    return eqn.sol

if 'pool' in locals():
    pool.restart()
else:
    pool = pathos.pools.ProcessPool(nodes=4)

Natoms = 4
chunksize = 4
sols = []
progress = progressBar()
for jj in range(int(Natoms/chunksize)):
    sols += pool.map(generate_random_solution, range(chunksize))
    progress.update((jj+1)/int(Natoms/chunksize))
pool.close()
```
This code executes the first time without a problem. The second time it also runs, but I get the following error in the terminal running Jupyter: `libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file`. After the second time it does apparently re-pickle all the local variables and execute with the appropriate changes to `eqn`, but tick marks no longer appear on matplotlib graphs, which is an interesting side effect. Finally, the third execution fails with repeated `libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file` errors.
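As an aside, one quick way to see whether a persistent pool is picking up changes to globals is to map a trivial function that reports a global's value, change the global, and map again. This is only an illustrative sketch, not part of pylcp; the names `flag` and `report` are made up for the test:

```python
import pathos

flag = 'original'

def report(_):
    # Return whatever value of the global `flag` this worker sees.
    return flag

pool = pathos.pools.ProcessPool(nodes=2)
print(pool.map(report, range(2)))

flag = 'updated'
# If the pool re-pickled `report` and its globals, this second call would
# print 'updated'; a stale 'original' here reproduces the behavior
# described above.
print(pool.map(report, range(2)))
```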
So the problem is clearly the use of global variables in `generate_random_solution`. All the variables should be defined through function arguments, i.e. as local variables. Only globally-imported packages should be used inside `generate_random_solution`, as these will always remain unchanged until a kernel restart. The following code seems to fix all the problems:
```python
tmax = 1e3
args = ([0, tmax], )
kwargs = {'t_eval': np.linspace(0, tmax, 5001),
          'random_recoil': True,
          'progress_bar': False,
          'max_scatter_probability': 0.5}

def generate_random_solution(mapargs):
    """
    A function for each subprocess to execute.

    Parameters
    ----------
    mapargs : tuple
        Tuple of (index, eqn, args, kwargs): the index of the simulation,
        the eqn object on which to call evolve_motion, and the args and
        kwargs passed to evolve_motion.

    Notes
    -----
    All variables inside this function are defined locally; no global
    variables can be used, as these are not overwritten once the function
    is defined and dill-pickled by pathos.  This gets around the
    persistency problem of the ProcessPool.
    """
    # Unpack the unique arguments for this function call:
    x, eqn, args, kwargs = mapargs
    # Use the index to define a unique starting random number:
    np.random.rand(256*x)
    # Evolve motion:
    eqn.evolve_motion(*args, **kwargs)
    return eqn.sol

# Start the parallel pool now, after defining generate_random_solution:
pool = pathos.pools.ProcessPool(nodes=4)

# Make a list of arguments for the generate_random_solution function that
# we will map onto the parallel pool.  We copy the eqn object so that each
# process executes on a different memory subspace, otherwise we will have
# fun problems.
Natoms = 96
mapargs = []
for jj in range(Natoms):
    mapargs.append((jj, copy.copy(eqn), args, kwargs))

# Map this list of tuples onto the parallel pool, in chunks so we can
# monitor progress.
chunksize = 4
sols = []
progress = progressBar()
for jj in range(int(Natoms/chunksize)):
    sols += pool.map(generate_random_solution, mapargs[jj*chunksize:(jj+1)*chunksize])
    progress.update((jj+1)/int(Natoms/chunksize))
```
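Because the `ProcessPool` persists between runs (as noted above), it can also help to shut it down explicitly once all the chunks are done. A minimal sketch using pathos's pool-management methods (`close`, `join`, `clear`); adjust as needed:

```python
# Release the worker processes and drop the cached pool so that a fresh
# ProcessPool (with freshly pickled state) is created on the next run.
pool.close()
pool.join()
pool.clear()
```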
The F=2->F'=3 1D optical molasses example needs to be updated with this implementation. The 3D MOT temperature example with the OBEs uses the old `run_single_sim.py` version of parallel processing; that example can be updated as well.
Hi Steve, I want to follow up on this issue to see if there is an updated solution. I am looking at the recoil-limited MOT example, `10_recoil_limited_MOT.ipynb`, and noticed that `generate_random_solution` always returns the same `eqn.sol` for all 1024 runs of the simulation. It is defined as follows:
```python
def generate_random_solution(x, eqn=eqn, tmax=tmax):
    # We need to generate random numbers to prevent solutions from being seeded
    # with the same random number.
    import numpy as np
    np.random.rand(256*x)
    eqn.evolve_motion(
        [0, tmax],
        t_eval=np.linspace(0, tmax, 1001),
        random_recoil=True,
        progress_bar=False,
        max_step=1.
    )
    return eqn.sol
```
So, I am wondering if there is an updated version of this example. Would you recommend using other multiprocessing packages instead? Thank you!
Hi @Lufter. What you bring up is a slightly different issue, but nonetheless an important one. I just tested `10_recoil_limited_MOT.ipynb` on my Mac with Python and did indeed notice that every solution in a given chunk was identical (but not every solution, as you report). This problem is related to how we are seeding the random number generators in the subprocesses.

NumPy has documentation for seeding subprocess random number generators. The preferred method is to pass each subprocess a uniquely seeded random number generator. Indeed, `evolve_motion` has an `rng` keyword argument to allow such a generator to be passed in, but we never updated the example to show the implementation. So, adding that to the fix for the issue described above, we get the following code:
```python
import pathos
import copy

def generate_random_solution(mapargs):
    """
    A function for each subprocess to execute.

    Parameters
    ----------
    mapargs : tuple
        Tuple of (eqn, args, kwargs): the eqn object on which to call
        evolve_motion, and the args and kwargs passed to evolve_motion.
        Note that `rng` must be defined in kwargs.

    Notes
    -----
    All variables inside this function are defined locally; no global
    variables can be used, as these are not overwritten once the function
    is defined and dill-pickled by pathos.  This gets around the
    persistency problem of the ProcessPool.
    """
    # Unpack the unique arguments for this function call:
    eqn, args, kwargs = mapargs
    # Evolve motion:
    eqn.evolve_motion(*args, **kwargs)
    return eqn.sol

# Start the parallel pool now, after defining generate_random_solution:
pool = pathos.pools.ProcessPool(nodes=4)

# Specify the number of atoms that we are simulating here:
Natoms = 8

# Make a random number generator and use it to seed the random number
# generators for all the subprocesses through spawn.  These child rngs
# will then be passed to evolve_motion through the `rng` keyword argument.
rng = np.random.default_rng()
child_rngs = rng.spawn(Natoms)

# Define the common arguments and keyword arguments that will be passed
# to the equations:
args = ([0, tmax], )
kwargs = {'t_eval': np.linspace(0, tmax, 5001),
          'random_recoil': True,
          'progress_bar': False,
          'max_scatter_probability': 0.5}

# Make a list of arguments for the generate_random_solution function that
# we will map onto the parallel pool.  We copy the eqn object so that each
# process executes on a different memory subspace, otherwise we will have
# fun problems.
mapargs = []
for jj in range(Natoms):
    mapargs.append((copy.copy(eqn), args, {**kwargs, 'rng': child_rngs[jj]}))

# Map this list of tuples onto the parallel pool, in chunks so we can
# monitor progress.
chunksize = 4
sols = []
progress = progressBar()
for jj in range(int(Natoms/chunksize)):
    sols += pool.map(generate_random_solution, mapargs[jj*chunksize:(jj+1)*chunksize])
    progress.update((jj+1)/int(Natoms/chunksize))
```
If you copy and paste this code into the appropriate cell in the example (replacing the initial, problematic code), it should work and give you eight different solutions. If it doesn't, let us know and we'll try to diagnose the problem further.
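As a quick sanity check that the runs really are independent, you can compare, for example, the final positions of the solutions; a short sketch, assuming the solution objects expose the trajectory as `sol.r` as in the examples:

```python
import numpy as np

# Stack the final position of each solution; if the seeding worked, the
# rows should not all be identical.
final_positions = np.array([sol.r[:, -1] for sol in sols])
print(final_positions)
```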