
Cross check Dingo and bilby likelihoods #132

Closed by max-dax 1 year ago

max-dax commented 1 year ago

We observe small deviations between Dingo and published bilby results. This happens even when the Dingo-IS sample efficiency is high. One possible explanation is that we use a different likelihood (e.g., a missing window factor somewhere). We should therefore check that the Dingo likelihood agrees with the bilby likelihood.

Ways to track this down:

- There is also this script https://github.com/dingo-gw/dingo-devel/blob/main/misc_scripts/run_bilby.py to run Dingo from a yaml file.
- Evaluate both likelihoods directly on the same samples, as in the sketch below.
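A minimal sketch of such a pointwise comparison (log_l_dingo and log_l_bilby are hypothetical stand-ins, each mapping a parameter dict to a float; not actual Dingo or bilby API):

import numpy as np

def compare_log_likelihoods(samples, log_l_dingo, log_l_bilby):
    # Evaluate both log-likelihoods on the same parameter draws.
    diffs = np.array([log_l_dingo(s) - log_l_bilby(s) for s in samples])
    print(f"mean diff = {diffs.mean():.3f}, std = {diffs.std():.3f}")
    return diffs

A nonzero mean with near-zero scatter would point to a constant normalization term (such as a window factor); a sample-dependent difference would point to a data or waveform convention mismatch.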

nihargupte-ph commented 1 year ago

I've been looking into this as we discussed. I'm still working on the IS part, but one thing I've noticed is that the delta_f is 0.00390625 for the pBilby samples, whereas we usually use 0.125. This seems quite different; perhaps it is referring to some other delta_f. I found this delta_f in the official pBilby samples (https://zenodo.org/record/6513631/files/IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5?download=1) by running the following code:

import os

import h5py
import numpy as np
import pandas as pd

# Loading IGWN bilby samples
def h5_to_dict(h5_object, dictionary):
    """Recursively convert an h5py object to a dictionary."""
    for key in h5_object.keys():
        item = h5_object[key]
        if isinstance(item, h5py.Dataset):
            dictionary[key] = item[()]
        elif isinstance(item, h5py.Group):
            dictionary[key] = {}
            h5_to_dict(item, dictionary[key])
    return dictionary

# Base directory for this notebook
base_dir = "/data/nihargupte/projects/hotfixes/dingo-devel/devel_testing/bilby_offset"

# Open the .h5 file and convert it to a dictionary
file = os.path.join(base_dir, "IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5")
with h5py.File(file, "r") as f:
    bilby_result = h5_to_dict(f, {})

imr_bilby_result = bilby_result["C01:IMRPhenomXPHM"]

# Convert the structured posterior array to a DataFrame
imr_bilby_result["posterior_samples"] = pd.DataFrame.from_records(
    imr_bilby_result["posterior_samples"]
)

print(float(imr_bilby_result["meta_data"]["meta_data"]["delta_f"]))

I will report more after doing the IS with calibration marginalization. For now, I am just making sure all the settings in the pBilby samples are properly converted into a DingoDataset.

mpuerrer commented 1 year ago

Curious. It's not an arbitrary number: 1/0.00390625 = 256 s. But a segment length of 256 s would be complete overkill for a short (< 1 s) BBH like GW150914, so chances are it is referring to something else.

stephengreen commented 1 year ago

It looks like this file was created by pesummary. I don't know precisely how the meta_data are generated, but I agree that delta_f = 0.00390625 does not make sense. In fact, elsewhere in the file it specifies duration = 4.0, which would imply delta_f = 1/4.0 = 0.25. Vivien is one of the main authors of pesummary, so maybe it is worth asking him where these meta-data come from.

stephengreen commented 1 year ago

Note also that the file can be loaded using

from pesummary.io import read
data = read("IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5")
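From there the posterior table should be directly accessible; a hedged sketch, assuming the standard pesummary samples_dict interface (worth double-checking against the pesummary docs):

from pesummary.io import read

data = read("IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5")
# For multi-analysis files, samples_dict is keyed by analysis label.
posterior = data.samples_dict["C01:IMRPhenomXPHM"]
print(posterior.parameters[:5])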

nihargupte-ph commented 1 year ago

Ah, thank you! I've been putting everything in a dict till now. Should've figured it was PESummary.

nihargupte-ph commented 1 year ago

Been working on this a bit more now that I am in the office...

Interesting notes:

- The GPS time and duration are 1126259460.3909912 and 4 s, as opposed to the GWOSC values of 1126259462.4 and 8 s that we use for inference. I don't see any buffer time in the data.
- bilby_result["C01:IMRPhenomXPHM"]["meta_data"]["meta_data"]["sampling_frequency"] = 2048.0

I am trying to importance sample with the GPS time given. Does anyone know if GWOSC is down at the moment?

from gwpy.timeseries import TimeSeries

TimeSeries.fetch_open_data(
    "H1",
    1126259460,
    1126259464,
)

Hangs for hours. Once this is done I can run with calibration marginalization.

stephengreen commented 1 year ago

@max-dax Which geocent_time did you use in the Dingo analysis of this event for the review?

nihargupte-ph commented 1 year ago

A quick update on this: I have run importance sampling with calibration marginalization and am now comparing the likelihoods for the official samples. For Dingo I am getting log-likelihoods in the range (-16947, -16020), whereas Bilby has log-likelihoods in the range (305, 328). My initial thought was that this is because the Bilby likelihoods are unnormalized, but after normalizing as follows:

import h5py
import pandas as pd
from scipy.special import logsumexp

# Loading IGWN bilby samples (skip this if you are using PESummary instead)
def h5_to_dict(h5_object, dictionary):
    """Recursively convert an h5py object to a dictionary."""
    for key in h5_object.keys():
        item = h5_object[key]
        if isinstance(item, h5py.Dataset):
            dictionary[key] = item[()]
        elif isinstance(item, h5py.Group):
            dictionary[key] = {}
            h5_to_dict(item, dictionary[key])
    return dictionary

# Open the .h5 file and convert it to a dictionary
fname = "IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5"
with h5py.File(fname, "r") as f:
    bilby_result = h5_to_dict(f, {})

bilby_result["C01:IMRPhenomXPHM"]["posterior_samples"] = pd.DataFrame.from_records(
    data=bilby_result["C01:IMRPhenomXPHM"]["posterior_samples"]
)

bilby_log_likelihood = bilby_result["C01:IMRPhenomXPHM"]["posterior_samples"]["log_likelihood"]
normalized_bilby_log_likelihoods = bilby_log_likelihood - logsumexp(bilby_log_likelihood)

I am getting log-likelihoods in the range (-30, -7), so still very different. Probably I am missing the convention for how the Bilby samples are stored. @max-dax, when you did the comparison without calibration marginalization, did you see this normalization difference as well?

stephengreen commented 1 year ago

Those Bilby likelihoods sound like they are not including the log noise evidence (see, e.g., https://github.com/dingo-gw/dingo-devel/blob/main/dingo/gw/likelihood.py#L254). For the saved Bilby result, this is

In [17]:  f['C01:IMRPhenomXPHM']['meta_data']['sampler']['ln_noise_evidence'][0]
Out[17]: -7288.12

Adding this in gets us about halfway to the ~ -16k log-likelihoods you reported. The remaining difference is probably because the Bilby analysis used duration = 4.0 whereas your analysis used 8.0.
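As a quick numerical check (just arithmetic on the numbers quoted above):

# Shift the stored Bilby log-likelihoods by the saved log noise evidence.
ln_noise_evidence = -7288.12
bilby_range = (305.0, 328.0)  # approximate range reported above
print(tuple(x + ln_noise_evidence for x in bilby_range))  # (-6983.12, -6960.12)

And since the noise evidence is a sum over frequency bins, doubling the duration roughly doubles it, so an 8 s analysis would sit near 2 x (-7288) ~ -14.6k, in the right ballpark for the ~ -16k values.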

nihargupte-ph commented 1 year ago

Looking into this more after our previous discussions: even after correcting for the duration and using the correct f_max, the log-likelihoods are still off by ~300. For the following examples I am just taking one injection point.

(dingo likelihood, bilby likelihood) were (-7326.554948013703, -6965.81572904715).

As a check, we decided to look at the log noise evidence computed by Bilby, but inputting the same data we use for Dingo. It looks like the Dingo and Bilby noise log-likelihoods are indeed different:

(dingo noise log-likelihood, bilby noise log-likelihood) is (-7027.95, -6083.2828).

Stepping through this with a debugger, I found that this difference arises from the fact that the noise-weighted inner products differ by a window factor. If I then specify the window factor to Bilby, I get the same noise log-likelihood!!

(dingo noise log-likelihood, bilby noise log-likelihood) is (-7027.95, -7027.95).

OK, so if the window factors are the same, we have the same noise log-likelihood.
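For reference, making the window factor $w$ explicit (this is my reading of how bilby folds $w$ into the PSD, so treat it as a hedged sketch of the convention):

$$\langle a, b \rangle = 4 \Delta f \, \mathrm{Re} \sum_k \frac{a_k^* b_k}{w \, S_n(f_k)}, \qquad \log Z_n = -\frac{1}{2} \langle d, d \rangle,$$

so omitting $w$ in one code but not the other rescales the noise log-likelihood exactly as seen above.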

$$ \log L = \log Z_n + \kappa^2 - \frac{1}{2} \rho_{\mathrm{opt}}^2 $$

From Dingo, for a particular set of injection parameters, we find $\kappa^2 = 12.4$ and $-\frac{1}{2} \rho_{\mathrm{opt}}^2 = -316.9$. I am now cross-checking the other terms in the log-likelihood. That being said, when I now compute the log-likelihood with Bilby I get -7338.56, which is different from what's in the file (-6965) and also different from Dingo (-7327). But at least we can compare to Dingo likelihoods.
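Schematically, the decomposition being cross-checked looks like this (a sketch with placeholder arrays, not the actual Dingo likelihood code; the window factor enters through the PSD as above):

import numpy as np

def log_likelihood_terms(d, mu, psd, df, window_factor=1.0):
    # Noise-weighted inner product; the window factor rescales the PSD.
    def inner(a, b):
        return 4.0 * df * np.real(np.sum(np.conj(a) * b / (window_factor * psd)))

    log_Zn = -0.5 * inner(d, d)   # noise log-evidence
    kappa2 = inner(d, mu)         # kappa^2 = <d, mu>
    rho_opt2 = inner(mu, mu)      # optimal SNR squared
    return log_Zn, kappa2, rho_opt2, log_Zn + kappa2 - 0.5 * rho_opt2

Comparing each term separately between the two codes isolates which convention differs.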

nihargupte-ph commented 1 year ago

To recap: we are finding a difference in log-likelihood of ~12.

Going through with a debugger more, I find that Dingo also agrees with Bilby on the value of $\rho_{\mathrm{opt}}^2$. This means both Dingo and Bilby are reading in the same data, since we have confirmed that $\langle d, d \rangle$ and $\langle \mu, \mu \rangle$ are the same. However, they disagree on $\kappa^2$:

(dingo $\kappa^2$, bilby $\kappa^2$) were (12.001, 0.326).

Looking into this more, the discrepancy comes from the following difference between the codes. At line 395 of https://github.com/lscsoft/bilby/blob/master/bilby/gw/likelihood.py, bilby sums $\langle d, h \rangle$ across the interferometers. In this case the sum is something like $(-7.6+2.0j) + (7.9+4.8j)$, so when Bilby later takes the real part it gets 0.32.

However, when Dingo executes this sum it gets $-28.0 + 40.02j$.

This will take a bit to untangle but working on it...

max-dax commented 1 year ago

Good job. Note that $\langle d, d \rangle$ and $\langle \mu, \mu \rangle$ are insensitive to phase/time shifts; a shift will only show up in $\kappa^2 = \langle d, \mu \rangle$. It is thus possible that we have an incorrect time/phase shift in either the data or the computed waveform.
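To illustrate the point with toy numbers (random stand-in arrays and a flat PSD; not Dingo or bilby code): a frequency-domain time shift $e^{-2\pi i f \Delta t}$ applied to $\mu$ leaves $\langle \mu, \mu \rangle$ untouched but changes $\langle d, \mu \rangle$.

import numpy as np

rng = np.random.default_rng(0)
f = np.linspace(20.0, 512.0, 2000)  # frequency grid in Hz
df = f[1] - f[0]
psd = np.ones_like(f)               # flat PSD, for illustration only

# Random complex stand-ins for the data d and the waveform mu.
d = rng.normal(size=f.size) + 1j * rng.normal(size=f.size)
mu = rng.normal(size=f.size) + 1j * rng.normal(size=f.size)

def inner(a, b):
    # Noise-weighted inner product <a, b> = 4 df Re sum conj(a) b / PSD.
    return 4.0 * df * np.real(np.sum(np.conj(a) * b / psd))

dt = 0.01                                  # a spurious 10 ms time shift
mu_shifted = mu * np.exp(-2j * np.pi * f * dt)

print(inner(mu, mu), inner(mu_shifted, mu_shifted))  # identical
print(inner(d, mu), inner(d, mu_shifted))            # generally different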

nihargupte-ph commented 1 year ago

Fixed with https://github.com/dingo-gw/dingo/pull/180. Finally!