Probability function returned NaN

maraattia commented 2 years ago

Good morning,

I have an issue on emcee. The EnsembleSampler crashes after a few iterations and raises a ValueError: Probability function returned NaN. The thing is my user-defined probability function cannot return NaNs, as it defined something like this:

def my_probability_function(x):
    prob = logP(*x)
    if prob > 0 or np.isnan(prob): return -np.inf
    return prob

where logP is basically an evaluation of a beforehand-constructed RBF interpolator.

I call the EnsembleSampler like this:

import emcee
import numpy as np
from pathos.pools import ProcessPool

NW = 64 #Number of workers
ND = 7 #Number of dimensions
MCMC_PATH = 'path/to/backend.h5'
backend = emcee.backends.HDFBackend(MCMC_PATH)
backend.reset(NW, ND)
with ProcessPool(nodes=8) as pool:
    pool.restart()
    sampler = emcee.EnsembleSampler(NW, ND, my_probability_function, pool=pool, backend=backend)
    sampler.run_mcmc(p0, 5000, progress=True)

Changing the logP function to e.g. a nearest interpolator solves the problem, so I guess the issue comes from a bad interaction between emcee and scipy's RBF interpolator?

I tried advancing the MCMC iteration by iteration and comparing at each step the manual and emcee-stored evaluation of the probability function:

sampler.run_mcmc(pp, 1, skip_initial_state_check=True)
pp = sampler.get_chain(flat=True)[-NW:]

logP_manual = [my_probability_function(p) for p in pp]
logP_emcee, _ = sampler.compute_log_prob(pp)

diff = np.abs(1. - logP_manual/logP_emcee)

and at some point, diff was not identically zero, meaning that my_probability_function sometimes returns different results depending on if one evaluates it outside or inside emcee!

Can you help me please? Unfortunately, I was not able to replicate the problem in a minimalist example, as it only emerges when the RBF interpolator is constructed on my personal 2M data points in 7 dimensions.

Many thanks in advance.

dfm commented 2 years ago

I'm not too sure what to say here. There isn't any real reason why emcee would cause different behavior with your interpolator - it just directly calls the function you provide: https://github.com/dfm/emcee/blob/afba7a505b391b74d1eb2f27f1b1129c31a29948/src/emcee/ensemble.py#L489

Some thoughts:

There isn't any strong reason why a probability density needs to be less than 1, so you probably don't want the prob > 0 check in your function.
The only processing that emcee does with the results of your function call is to pass it though a Python float, so perhaps there's some numerical instability there, but if there is that suggests that there's probably something else wrong with your function. You might want to debug what's happening there?
Do you find the same behavior if you don't use multiprocessing? Perhaps there's an issue with multiprocessing and your interpolator.

maraattia commented 2 years ago

Thank you very much for your answer. I'll try your suggestions and let you know.

mdecleir commented 4 months ago

Hi,

I was wondering if there are any updates on this issue? I am seeing the same error while running the emcee fitter with the scipy skewed normal function, but not always. It seems to depend on the bounds I use for one of the parameters and the number of steps I use in the fit. Also, it is not easily reproducible, as sometimes the error happens, and then the next run it does not happen, with the exact same input parameters, bounds, steps, etc.

dfm / emcee

Probability function returned NaN #415