I've been looking into this as we discussed. I'm still working on the IS part, but one thing I've noticed is that delta_f is 0.00390625 for the pBilby samples, whereas we usually use 0.125. This seems quite different; perhaps it refers to some other delta_f. I found this delta_f in the official pBilby samples (https://zenodo.org/record/6513631/files/IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5?download=1) by running the following code:
import pandas as pd
import bilby
import h5py
import numpy as np
import os
import re

# Loading IGWN bilby samples
def h5_to_dict(h5_object, dictionary):
    """Recursive function to convert an h5py object to a dictionary"""
    for key in h5_object.keys():
        item = h5_object[key]
        if isinstance(item, h5py._hl.dataset.Dataset):
            dictionary[key] = item[()]
        elif isinstance(item, h5py._hl.group.Group):
            dictionary[key] = {}
            h5_to_dict(item, dictionary[key])
    return dictionary

# Setting a base_dir for this notebook
base_dir = "/data/nihargupte/projects/hotfixes/dingo-devel/devel_testing/bilby_offset"

# Open the .h5 file
file = os.path.join(base_dir, "IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5")
f = h5py.File(file, 'r')

# Convert the .h5 file to a dictionary
bilby_result = h5_to_dict(f, {})
# calibration_envelope =
imr_bilby_result = bilby_result["C01:IMRPhenomXPHM"]

# Close the file
f.close()

# Convert the posterior samples (a structured array) to a DataFrame
keys = imr_bilby_result["posterior_samples"].dtype.names
arr = np.stack([
    np.array([imr_bilby_result["posterior_samples"][i][j] for j in range(len(keys))])
    for i in range(imr_bilby_result["posterior_samples"].shape[0])
])
imr_bilby_result["posterior_samples"] = pd.DataFrame(columns=keys, data=arr)
bilby_result['C01:IMRPhenomXPHM']["posterior_samples"] = pd.DataFrame(columns=keys, data=arr)

print(float(bilby_result['C01:IMRPhenomXPHM']["meta_data"]["meta_data"]["delta_f"]))
I will report more after doing the IS with calibration marginalization; for now, I am just making sure all the settings in the pBilby samples are properly converted into a DingoDataset.
Curious. It's not an arbitrary number: 1/0.00390625 = 256 s, but a segment length of 256 s would be completely overkill for a short (<1 s) BBH like GW150914, so chances are it is referring to something else.
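Spelling out the bookkeeping (the frequency resolution is just the reciprocal of the segment length, delta_f = 1/T):

# frequency resolution <-> segment length: delta_f = 1 / T
print(1 / 0.00390625)  # 256.0  -> would correspond to a 256 s segment
print(1 / 8.0)         # 0.125  -> the value we normally use, i.e. an 8 s segment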
It looks like this file was created by pesummary. I don't know precisely how the meta_data are generated, but I agree that delta_f = 0.00390625 does not make sense. In fact, elsewhere in the file it specifies duration = 4.0. Vivien is one of the main authors of pesummary, so maybe it is worth asking him where these meta-data come from.
Note also that the file can be loaded using
from pesummary.io import read
data = read("IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5")
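If I remember the pesummary interface correctly (worth double-checking against their docs), the per-analysis samples are then accessible directly from the returned object, e.g.:

# Continues from the `data` object returned by read() above.
print(data.labels)                                 # analysis labels, e.g. 'C01:IMRPhenomXPHM'
samples = data.samples_dict["C01:IMRPhenomXPHM"]   # posterior samples, keyed by parameter name
print(list(samples.keys())[:5])                    # a few of the parameter names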
Ah, thank you! I've been putting everything in a dict till now. Should've figured it was PESummary.
Been working on this a bit more now that I am in the office...
Interesting notes:
- The GPS time and duration are 1126259460.3909912 and 4 s, as opposed to the GWOSC values of 1126259462.4 and 8 s that we use for inference. I don't see any buffer time in the data.
- bilby_result["C01:IMRPhenomXPHM"]["meta_data"]["meta_data"]["sampling_frequency"] = 2048.0
I am trying to importance sample with the given GPS time; does anyone know if GWOSC is down at the moment? The following call
from gwpy.timeseries import TimeSeries

TimeSeries.fetch_open_data(
    "H1",
    1126259460,
    1126259464,
)
hangs for hours. Once this is done I can run with calibration marginalization.
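In case it helps narrow down whether this is a GWOSC outage or a local issue, the fetch can be made more talkative (verbose and cache are standard gwpy fetch_open_data keywords, as far as I know):

from gwpy.timeseries import TimeSeries

# verbose=True prints which GWOSC URLs are being fetched; cache=True reuses a
# previously downloaded file instead of re-downloading it.
strain = TimeSeries.fetch_open_data(
    "H1", 1126259460, 1126259464, verbose=True, cache=True
)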
@max-dax Which geocent_time did you use in the Dingo analysis of this event for the review?
A quick update on this: I have run importance sampling with calibration marginalization and am now in the process of comparing the likelihoods against the official samples. For Dingo I am getting log likelihoods in the range (-16947, -16020), whereas Bilby has log likelihoods in the range (305, 328). My initial thought was that this is because the Bilby likelihoods are unnormalized, but after normalizing (in code):
import h5py
import pandas as pd
from scipy.special import logsumexp

# Loading data; ignore this if you are using PESummary instead
# Loading IGWN bilby samples
def h5_to_dict(h5_object, dictionary):
    """Recursive function to convert an h5py object to a dictionary"""
    for key in h5_object.keys():
        item = h5_object[key]
        if isinstance(item, h5py._hl.dataset.Dataset):
            dictionary[key] = item[()]
        elif isinstance(item, h5py._hl.group.Group):
            dictionary[key] = {}
            h5_to_dict(item, dictionary[key])
    return dictionary

# Open the .h5 file
fname = "IGWN-GWTC2p1-v2-GW150914_095045_PEDataRelease_mixed_cosmo.h5"
f = h5py.File(fname, 'r')

# Convert the .h5 file to a dictionary
bilby_result = h5_to_dict(f, {})

# Close the file
f.close()

# Normalize the stored per-sample log likelihoods
bilby_result['C01:IMRPhenomXPHM']["posterior_samples"] = pd.DataFrame.from_records(data=bilby_result['C01:IMRPhenomXPHM']["posterior_samples"])
bilby_log_likelihood = bilby_result['C01:IMRPhenomXPHM']["posterior_samples"]["log_likelihood"]
normalized_bilby_log_likelihoods = bilby_log_likelihood - logsumexp(bilby_log_likelihood)
I am getting log likelihoods in the range (-30, -7), so still very different. Probably I am missing the convention for how the Bilby samples are stored. @max-dax, when you did the comparison without calibration marginalization, did you see this normalization difference as well?
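For a like-for-like comparison I am applying the same normalization to the Dingo side; a small helper for that (dingo_log_likelihood below is just a placeholder for however the Dingo per-sample log likelihoods end up being stored):

import numpy as np
from scipy.special import logsumexp

def normalize_log_likelihoods(log_l):
    """Shift log likelihoods so that the corresponding weights sum to one."""
    log_l = np.asarray(log_l, dtype=float)
    return log_l - logsumexp(log_l)

# e.g. compare normalize_log_likelihoods(dingo_log_likelihood) against
# normalized_bilby_log_likelihoods from the snippet above.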
Those Bilby likelihoods sound like they are not including the log noise evidence (see, e.g., https://github.com/dingo-gw/dingo-devel/blob/main/dingo/gw/likelihood.py#L254). For the saved Bilby result, this is
In [17]: f['C01:IMRPhenomXPHM']['meta_data']['sampler']['ln_noise_evidence'][0]
Out[17]: -7288.12
Adding this in gets us halfway to the ~ -16k log likelihoods you reported. This is probably because the Bilby analysis used duration = 4.0 and your analysis used 8.0.
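Concretely, a sketch of what "adding this in" would look like, reusing the bilby_result dictionary and bilby_log_likelihood from your snippet above:

# Add the stored noise log evidence back onto the per-sample log likelihoods.
ln_noise_evidence = float(
    bilby_result["C01:IMRPhenomXPHM"]["meta_data"]["sampler"]["ln_noise_evidence"][0]
)
total_bilby_log_likelihood = bilby_log_likelihood + ln_noise_evidence
# shifts the range (305, 328) down to roughly (-6983, -6960)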
Looking into this more after our previous discussions: even after correcting the duration and using the correct f_max, the log-likelihoods are still off by ~300. For the following examples I am just taking a single injection point.
(dingo likelihood, bilby likelihood) were (-7326.554948013703, -6965.81572904715).
As a check, we decided to look at the log noise evidence computed by Bilby, but feeding it the same data we use for Dingo. It looks like the Dingo and Bilby noise log-likelihoods are indeed different.
(dingo noise log-likelihood, bilby noise log-likelihood) is (-7027.95, -6083.2828).
Stepping into this with a debugger, I found that the difference arises because the noise-weighted inner products differ by a window factor. If I then pass the window factor to Bilby, I get the same noise log-likelihood!
(dingo noise log-likelihood, bilby noise log-likelihood) is (-7027.95, -7027.95).
OK, so if the window factors are the same, we get the same noise log likelihood.
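For completeness, here is roughly what I mean by the window factor entering the noise-weighted inner product. This is only a sketch of the convention as I understand it (the Tukey roll-off and the mean(w**2) definition are assumptions to be checked against the Dingo code), not the actual Dingo or Bilby implementation:

import numpy as np
from scipy.signal.windows import tukey

def noise_log_likelihood(d, psd, delta_f, window_factor=1.0):
    """log Z_n = -1/2 <d, d>, with <d, d> = 4 * delta_f * sum(|d|^2 / psd).

    The window factor corrects for the power lost when the time-domain data
    are Tukey-windowed before the FFT.
    """
    inner_dd = 4.0 * delta_f * np.sum(np.abs(d) ** 2 / psd) / window_factor
    return -0.5 * inner_dd

# Window power-loss factor for a 4 s segment at 2048 Hz with an assumed 0.4 s
# roll-off (alpha = 2 * roll_off / duration for a Tukey window).
duration, f_s, roll_off = 4.0, 2048.0, 0.4
w = tukey(int(duration * f_s), alpha=2 * roll_off / duration)
window_factor = np.mean(w ** 2)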
$$ \log L = \log Z_n + \kappa^2 - \frac{1}{2} \rho_{\mathrm{opt}}^2 $$
From Dingo, for a particular set of injection parameters, we find $\kappa^2 = 12.4$ and $-\frac{1}{2} \rho_{\mathrm{opt}}^2 = -316.9$. I am now cross-checking the other terms in the log likelihood. That being said, when I now compute the log likelihood with Bilby I get -7338.56, which is different from what's in the file (-6965) and also different from Dingo (-7327). But at least we can compare to Dingo likelihoods.
To recap, we are finding a difference in log likelihood of ~12.
Going through with a debugger more, I find that Dingo also agrees with the value of $\rho_{\mathrm{opt}}^2$. This means both Dingo and Bilby are reading the same data, since we have confirmed that $\langle d, d \rangle$ and $\langle \mu, \mu \rangle$ are the same. However, they disagree on $\kappa^2$.
(dingo $\kappa^2$, bilby $\kappa^2$) were (12.001, 0.326)
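Note that this by itself accounts for the ~12 offset above: since $\log Z_n$ and $\rho_{\mathrm{opt}}^2$ agree between the codes, the remaining difference in log likelihood should just be
$$ \Delta \log L = \kappa^2_{\mathrm{Dingo}} - \kappa^2_{\mathrm{Bilby}} = 12.001 - 0.326 \approx 11.7 . $$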
Looking into this more, the discrepancy is caused by the following difference between the codes. In line 395 of https://github.com/lscsoft/bilby/blob/master/bilby/gw/likelihood.py, Bilby sums $\langle d, h \rangle$ across the interferometers. In the code we have something like $(-7.6+2.0j) + (7.9+4.8j)$, so when later taking the real part Bilby gets 0.32. Dingo, however, gets $-28.0 + 40.02j$ when executing this sum.
This will take a bit to untangle, but I'm working on it...
Good job. Note that $\langle d, d \rangle$ and $\langle \mu, \mu \rangle$ are insensitive to phase/time shifts. This will only show up in $\kappa^2 = \langle d, \mu \rangle$. It is thus possible that we have an incorrect time/phase shift in either the data or the computed waveform.
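A toy illustration of that point (not Dingo or Bilby code, just the generic inner product with white-noise placeholders): applying a time shift $e^{-2\pi i f \Delta t}$ to the waveform leaves $\langle \mu, \mu \rangle$ untouched but changes $\langle d, \mu \rangle$.

import numpy as np

rng = np.random.default_rng(0)
n, delta_f = 512, 0.25
f = np.arange(n) * delta_f
psd = np.ones(n)                                      # flat toy PSD
d = rng.normal(size=n) + 1j * rng.normal(size=n)      # toy frequency-domain data
mu = rng.normal(size=n) + 1j * rng.normal(size=n)     # toy frequency-domain waveform

def inner(a, b):
    # noise-weighted inner product <a, b> = 4 * delta_f * Re sum(conj(a) * b / psd)
    return 4.0 * delta_f * np.real(np.sum(np.conj(a) * b / psd))

dt = 0.01                                             # hypothetical time-shift error
mu_shifted = mu * np.exp(-2j * np.pi * f * dt)

print(inner(mu, mu), inner(mu_shifted, mu_shifted))   # identical: |mu| is unchanged
print(inner(d, mu), inner(d, mu_shifted))             # different: kappa^2 is affected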
Fixed with https://github.com/dingo-gw/dingo/pull/180. Finally!
We observe small deviations between Dingo and published bilby results. This happens even when the Dingo-IS sample efficiency is large. One possible explanation is that we use a different likelihood (e.g., missing a window factor somewhere). We should therefore check that the Dingo likelihood agrees with bilby likelihoods.
Ways to track this down:
- Run bilby_pipe for an event for which we have Dingo samples. Step through the likelihood calculation with the PyCharm debugger (or a similar tool). Check (a) that the data are identical to what Dingo uses (e.g., make sure the PSD is not off by a window factor; the conventions are not always consistent) and (b) that the computed waveforms are identical to Dingo's. A rough sketch of a stripped-down single-point comparison is below.
- There is also this script, https://github.com/dingo-gw/dingo-devel/blob/main/misc_scripts/run_bilby.py, to run Dingo from a yaml file.
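Not bilby_pipe itself, but a minimal sketch of evaluating the Bilby likelihood at a single parameter point so it can be compared with the Dingo value for that same point. All settings and parameter values below are illustrative placeholders (GW150914-like numbers), not those of the actual analysis:

import bilby

# Illustrative analysis settings; must be matched to the actual run.
duration, sampling_frequency, start_time = 8.0, 2048.0, 1126259460.4

# Detectors with simulated Gaussian noise from the default design PSDs
# (for a real comparison, load the same data and PSDs that Dingo uses).
ifos = bilby.gw.detector.InterferometerList(["H1", "L1"])
ifos.set_strain_data_from_power_spectral_densities(
    sampling_frequency=sampling_frequency, duration=duration, start_time=start_time
)

waveform_generator = bilby.gw.WaveformGenerator(
    duration=duration,
    sampling_frequency=sampling_frequency,
    frequency_domain_source_model=bilby.gw.source.lal_binary_black_hole,
    waveform_arguments=dict(waveform_approximant="IMRPhenomXPHM", reference_frequency=20.0),
)

likelihood = bilby.gw.likelihood.GravitationalWaveTransient(
    interferometers=ifos, waveform_generator=waveform_generator
)

# One parameter point (placeholder values); in practice take a row from the
# Dingo or pBilby posterior samples.
likelihood.parameters = dict(
    mass_1=36.0, mass_2=29.0, a_1=0.3, a_2=0.2, tilt_1=0.5, tilt_2=1.0,
    phi_12=1.7, phi_jl=0.3, theta_jn=0.4, phase=1.3, psi=2.659,
    ra=1.375, dec=-1.2108, luminosity_distance=410.0, geocent_time=1126259462.4,
)
print(likelihood.noise_log_likelihood())
print(likelihood.log_likelihood_ratio())
print(likelihood.log_likelihood())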