adrn / thejoker

A custom Monte Carlo sampler for the (gravitational) two-body problem
MIT License

Theano compilelock issues with MultiPool #105

Open adrn opened 4 years ago

adrn commented 4 years ago

See email by Song Wang:

with schwimmbad.MultiPool() as pool:
    joker_mcmc = tj.TheJoker(prior_mcmc, pool=pool, random_state=rnd)
    mcmc_init = joker_mcmc.setup_mcmc(data, samples)

with schwimmbad.MultiPool() as pool:
    joker = tj.TheJoker(prior, pool=pool, random_state=rnd)
    prior_samples = prior.sample(size=10000, random_state=rnd)
    samples = joker.rejection_sample(data, prior_samples, max_posterior_samples=256)

throws a warning:

INFO (theano.gof.compilelock): Waiting for existing lock by process '' (I am process '')
INFO (theano.gof.compilelock): To manually release the lock, delete ***/lock_dir

dfm commented 4 years ago

Perhaps you already know this, but I normally use a hack to set the compiledir using os.getpid(). It's possible that this could also be handled by pre-compiling the required theano functions and then passing those around.

adrn commented 4 years ago

Oh right (and for context, this was an email I got from a user). Do you have an example you could share?

dfm commented 4 years ago

Something like the following can work:

import os
from multiprocessing import Pool

os.environ["THEANO_FLAGS"] = f"compiledir={os.getpid()}"

import theano
import theano.tensor as tt

def func(x):
    x_ = tt.dscalar()
    return theano.function([x_], [x_ * x_])(x)

if __name__ == "__main__":
    with Pool(4) as pool:
        print(list(pool.map(func, range(10))))

dfm commented 4 years ago

Or...

from multiprocessing import Pool
import theano
import theano.tensor as tt

if __name__ == "__main__":
    x_ = tt.dscalar()
    func = theano.function([x_], [x_ * x_])
    with Pool(4) as pool:
        print(list(pool.map(func, range(10))))

AstroSong commented 4 years ago

Thanks @adrn @dfm

When I add the os.environ line to my script, the warning stops flooding the screen and appears only a fixed number of times (equal to the number of processes set in the Pool). However, the code appears to stall even though the CPU is busy. I waited more than 20 minutes and none of the processes completed the rejection sampling step. It seems to need a very long time to move on to the next step. Is it still not running in parallel?

Another strange case: when I set the number of processes to 2, the code runs, but it skips the MCMC part.

My computer has 10 cores. Is it OK to set the number of processes to 4?

I also tried opening two or three terminals to run single-process code. It works, but does not save much time. It seems that the different terminals are not running fully in parallel.

adrn commented 3 years ago

@astrosong Strange! Could you share a minimum working example script, and send the versions of schwimmbad & thejoker that you are using? What platform are you on? Thanks!

python -c "import schwimmbad; print(schwimmbad.__version__)"
python -c "import thejoker; print(thejoker.__version__)"

AstroSong commented 3 years ago

@adrn If I use the second snippet from @dfm above, it works without any warning. But if I use the first snippet, the warning is still there, as follows:

INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37974')
INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37975')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37974')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir
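
The lock directory in the log above (.../37878/lock_dir) matches neither waiting PID (37974, 37975). That would be consistent with the os.environ line running once in the parent process, so that forked workers all inherit the same compiledir. A minimal sketch of one possible workaround is to set the flag inside a Pool initializer, which runs in each worker. The helper names (compiledir_flag, init_worker, report) are hypothetical, and the sketch assumes theano has not yet been imported in the process when the flag is set. It only demonstrates that each worker ends up with its own value and does not import theano itself.

```python
import os
from multiprocessing import Pool


def compiledir_flag(pid):
    """Build a THEANO_FLAGS value naming a PID-specific compile directory."""
    return f"compiledir={pid}"


def init_worker():
    # Runs once inside each worker, so os.getpid() is the worker's own PID.
    # Assumption: theano is imported only after this point (for example
    # inside the task function), otherwise the flag is read too late.
    os.environ["THEANO_FLAGS"] = compiledir_flag(os.getpid())


def report(_):
    # Return what this worker sees, to confirm the flag differs per process.
    return os.getpid(), os.environ["THEANO_FLAGS"]


if __name__ == "__main__":
    with Pool(2, initializer=init_worker) as pool:
        for pid, flags in sorted(set(pool.map(report, range(8)))):
            print(pid, flags)
```

If theano is imported at module level in the parent, as in the first snippet, forked workers would already carry the parent's configuration, so the import would need to move into the worker for this approach to have any effect.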

python -c "import schwimmbad; print(schwimmbad.__version__)" => 0.3.1
python -c "import thejoker; print(thejoker.__version__)" => 1.1

In addition, I previously updated thejoker by installing from git+https://github.com/adrn/thejoker, but I got an error:

  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/prior.py", line 320, in sample
    with random_state_context(random_state):
  File "/usr/local/python3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/utils.py", line 299, in random_state_context
    np.random.seed(integers(random_state, 2**32-1))  # HACK
  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/utils.py", line 30, in <lambda>
    integers = lambda obj, *args, **kwargs: obj.integers(*args, **kwargs)
AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'integers'

My numpy version is 1.19.0.
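
The AttributeError above is what NumPy raises when .integers is called on a legacy np.random.RandomState. That method exists only on the newer np.random.Generator, which np.random.default_rng returns (available since NumPy 1.17, so also in 1.19.0). A minimal sketch of the difference follows. That the development version of thejoker expects a Generator for random_state is inferred from the traceback, not confirmed, and the variable names are illustrative.

```python
import numpy as np

# Legacy interface: has randint(), but no integers() method.
legacy = np.random.RandomState(42)

# Newer Generator interface: provides integers().
modern = np.random.default_rng(42)

print(hasattr(legacy, "integers"))  # False
print(hasattr(modern, "integers"))  # True

# The same call shape as in the traceback works on a Generator.
seed = modern.integers(2**32 - 1)
print(0 <= seed < 2**32 - 1)  # True
```

Under that assumption, creating rnd with np.random.default_rng(...) instead of np.random.RandomState(...) may avoid the error.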