PolyChord / PolyChordLite

Public version of PolyChord: See polychord.co.uk for PolyChordPro
https://polychord.io/

First point in the sample has weight 1 #63

Closed JesusTorrado closed 3 years ago

JesusTorrado commented 3 years ago

Hi @williamjameshandley

This is something that I've found happening occasionally, with completely different kinds of likelihoods, and have not been able to reliably reproduce. So hopefully just by looking at it you may already have some idea why it happens.

In these cases, the first line (and only the first one) of the PolyChord .txt sample file looks something like

0.100000000000000E+001 -0.000000000000000E+000  0.000000000000000E+000  0.202369782602220E-317  0.464787078543982E-309  0.690709535740640E-309  0.464787078543982E-309  0.464786243447034E-309

The first four columns after the weight and -logpost are (or should be) the four sampled parameters, and the other two are derived ones. In these cases the evidence is severely overestimated (checked against an analytical computation in a Gaussian case, but pretty obvious anyway, since a huge number is obtained). Any idea? Maybe related to #40? (but using gfortran here)
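For what it's worth, values like 0.202E-317 and 0.465E-309 are subnormal doubles, which well-behaved likelihood code essentially never produces, so they are a useful fingerprint of uninitialized memory. A minimal sketch (the function name and example row are mine, not part of PolyChord) that flags such rows in a loaded chains file:

```python
import numpy as np

def flag_subnormal_rows(chains):
    """Return indices of rows containing subnormal (denormal) doubles,
    a typical signature of uninitialized memory."""
    tiny = np.finfo(np.float64).tiny  # smallest *normal* double, ~2.2e-308
    bad = (chains != 0.0) & (np.abs(chains) < tiny)
    return np.flatnonzero(bad.any(axis=1))

# hypothetical row mimicking the first line quoted above
row = np.array([1.0, -0.0, 0.0, 0.202369782602220e-317, 0.464787078543982e-309])
print(flag_subnormal_rows(row[None, :]))  # → [0]
```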

Also, if I still wanted to keep one of these runs' chains and compute the evidence from them manually, how would I do that? I've tried

```python
import numpy as np

t = np.loadtxt("<chain_name>.txt")  # columns: weight, -2*logpost, params...
weights, logposts = t[:, 0], -t[:, 1] / 2
logZ = np.log(np.sum(weights * np.exp(logposts)) / np.sum(weights))
```

but the result is consistently overestimated by a few log-units (even when correcting for discarded prior samples, if any). scipy.special.logsumexp does not help.
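For reference, the logsumexp variant I tried is the snippet above evaluated in log-space (same estimator, just numerically stable; the data array here is a made-up stand-in for a real chain):

```python
import numpy as np
from scipy.special import logsumexp

# hypothetical chain: column 0 = weight, column 1 = -2*logpost,
# following the same convention as the snippet above
t = np.array([[1.0, 10.0],
              [0.5, 12.0],
              [0.1, 20.0]])
weights, logposts = t[:, 0], -t[:, 1] / 2

# log( sum(w * exp(logpost)) / sum(w) ), without forming exp(logpost) naively
logZ = logsumexp(logposts, b=weights) - np.log(np.sum(weights))
```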

Thanks!

williamjameshandley commented 3 years ago

Hi @JesusTorrado. Given the values of the final four columns, the first of these looks like a memory error. Are you sure that you have things like nDims and nDerived set correctly? Does compiling polychord with debug flags show anything up?

With regards to computing the evidence after the fact, the above code won't work since in the chains files the weights are normalised to have maximum weight 1. MultiNest has a different posterior normalisation (sum to 1), but neither would give you the evidence.
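To make the two conventions concrete (with a made-up weight vector): converting between the PolyChord normalisation (maximum weight 1) and the MultiNest one (weights sum to 1) is a one-liner in either direction, because only the relative weights are stored — which is exactly why neither file encodes the evidence on its own:

```python
import numpy as np

w = np.array([0.2, 1.0, 0.6])  # hypothetical PolyChord-style weights (max = 1)

w_multinest = w / w.sum()                      # MultiNest-style: sum = 1
w_polychord = w_multinest / w_multinest.max()  # back to PolyChord-style: max = 1

# the relative weights -- the only information the chains file carries --
# survive both normalisations unchanged
```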

For computing the evidence from the chains files, I would recommend (pip-installable) anesthetic, which is designed to do precisely this

```python
from anesthetic import NestedSamples

samples = NestedSamples(root=<file root>)  # e.g. the PolyChord output file root
logZ = samples.logZ()                      # log-evidence estimate
logZ_samples = samples.logZ(1000)          # 1000 samples of the log-evidence
```
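Assuming `samples.logZ(1000)` returns an array-like of 1000 log-evidence estimates capturing the sampling uncertainty of the run, a central value and error bar follow with plain numpy (the array below is a hypothetical stand-in for real output):

```python
import numpy as np

# hypothetical stand-in for the output of samples.logZ(1000)
logZ_samples = np.random.default_rng(0).normal(-5.3, 0.2, size=1000)

logZ_mean = logZ_samples.mean()  # central log-evidence estimate
logZ_err = logZ_samples.std()    # its uncertainty
```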

anesthetic has a fair amount of functionality beyond this. In essence it extends pandas to support weighted calculations (anesthetic.weighted_pandas.WeightedDataFrame), builds anesthetic.samples.MCMCSamples and anesthetic.samples.NestedSamples on top of that, and adds plotting and nested-sampling utilities.

JesusTorrado commented 3 years ago

Thanks, @williamjameshandley!

The rest of the sample is fine (removing that first row and checking with GetDist), so nDims and nDerived are set correctly. It is almost certainly a memory error. I'll keep trying to find an MWE that reproduces it (likelihood and seed), check with debug flags, and re-open this issue.

We will try anesthetic as a workaround, thanks! I have always wanted to try it and never had the time, so this is a good chance.

williamjameshandley commented 3 years ago

Hi @JesusTorrado, @yallup was able to find an MWE that reliably reproduces this issue in #72, and the bug is now squashed in #73!