This update includes major enhancements to dingo_pipe:
- Allow changes to the data conditioning, PSD, and prior during importance sampling. This makes it possible to alter such settings from those chosen at train time.
- Enable splitting of the importance sampling task across multiple jobs, which is useful for accelerating importance sampling on a cluster.
- Plotting, including corner plots, weights plots, and target-versus-proposal probability plots. These are convenience plots intended for quick looks at results.
Usage
Data conditioning and PSD
Changes can be specified in two ways:
- As general settings in the .ini file. E.g., if a duration different from that of the trained network is specified, dingo_pipe will first analyze the event using data of the network duration, then generate new data with the new duration and use it for importance sampling.
- Explicitly, in the importance-sampling-updates dictionary in the .ini file. These changes are likewise applied during importance sampling, possibly generating a new EventDataset. The reason to allow explicit changes is so that one can, e.g., specify one PSD for initial sampling and a different one for importance sampling.
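As a hypothetical illustration of the two mechanisms (the key names and dictionary syntax here are assumptions, not a verified schema):

```ini
; Sketch only: settings and keys are illustrative, not a verified schema.

; 1. A general setting that differs from the trained network: the event is
;    first analyzed with the network duration, then re-analyzed with this one.
duration = 8.0

; 2. Explicit updates applied only at the importance-sampling stage,
;    e.g. swapping in a different PSD.
importance-sampling-updates = {'psd': '/path/to/alternative_psd.txt'}
```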
Prior
Prior updates are specified by passing a prior-dict in the .ini file. This need only specify changes relative to the network prior, not the full prior. E.g., to change to a cosmological distance prior, one could write:
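For illustration, a hypothetical prior-dict that overrides only the luminosity distance with bilby's cosmological UniformSourceFrame prior (the exact class choice and bounds are assumptions):

```ini
; Sketch only: overrides just the distance prior; all other parameters
; retain the network prior.
prior-dict = {
  luminosity_distance = bilby.gw.prior.UniformSourceFrame(minimum=100, maximum=2000, name='luminosity_distance', unit='Mpc'),
}
```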
Based on this, dingo_pipe would calculate new weights and a new log evidence during importance sampling.
Note: Initial Dingo results still use the network prior. It would be possible to use an updated prior here as well, but this has not yet been implemented.
Parallelization
To split importance sampling across multiple jobs, set n-parallel to a value greater than 1. Each of these jobs will furthermore parallelize importance sampling across request-cpus-importance-sampling cores.
Plotting
To turn on all plotting, set the relevant plot options in the .ini file.
Code updates
To effect these changes, we store additional metadata in the EventDataset. There is a new method Result.reset_event(), which replaces the existing EventDataset in the Result. When doing so, it compares the event metadata and applies any necessary changes to the domain. Note that these changes may cause older saved EventDataset files to become incompatible.
In dingo_pipe we introduce a second data generation node, which creates the new data for importance sampling when such updates are specified.
There is a new method Result.update_prior(), which takes a dict of priors and calculates a new log_prior for each sample. It saves the prior changes in importance_sampling_metadata(). The prior updater in turn calls Result._calculate_evidence() to calculate the weights and log evidence; the latter method was split out of Result.importance_sample().
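The arithmetic behind such a prior update can be illustrated with a small numpy sketch (this is not dingo code; the arrays and constant-density priors are hypothetical stand-ins): each importance weight is w = L·π/q, so replacing the prior π with π′ rescales every weight by π′(θ)/π(θ), and the evidence estimate follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical stand-ins for the per-sample quantities stored in a Result.
log_likelihood = rng.normal(-10.0, 1.0, n)
log_proposal = rng.normal(-2.0, 0.5, n)      # network (proposal) log prob
log_prior_old = np.full(n, -np.log(4.0))     # e.g. a uniform density 1/4

def log_evidence(log_l, log_pi, log_q):
    # Importance-sampling estimate: Z = mean_i( L_i * pi_i / q_i ),
    # computed stably in log space.
    log_w = log_l + log_pi - log_q
    return np.logaddexp.reduce(log_w) - np.log(len(log_w))

z_old = log_evidence(log_likelihood, log_prior_old, log_proposal)

# A prior update only changes log_pi; weights and evidence are recomputed.
log_prior_new = np.full(n, -np.log(2.0))     # e.g. a uniform density 1/2
z_new = log_evidence(log_likelihood, log_prior_new, log_proposal)

# For a constant density ratio, the log evidence shifts by exactly log(2).
print(np.isclose(z_new - z_old, np.log(2.0)))  # True
```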
For parallelization of importance sampling we introduce Result.split() and Result.merge() methods. The former is called at the end of dingo_pipe sampling to split the result into n-parallel parts. Each part is then importance sampled before being merged back with the Merge node (which inherits from the corresponding Bilby node). The Merge node in turn calls a new command, dingo_result --merge.
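Why splitting and merging is lossless can be seen in a short numpy sketch (a sketch of the underlying arithmetic, not dingo's implementation): the evidence is the mean of the importance weights, so per-chunk evidences recombine exactly into the full-set evidence when weighted by chunk size.

```python
import numpy as np

rng = np.random.default_rng(1)
log_w = rng.normal(0.0, 1.0, 12_000)  # importance log weights of all samples

def log_evidence(lw):
    # Z = mean(w), computed stably in log space.
    return np.logaddexp.reduce(lw) - np.log(len(lw))

z_full = log_evidence(log_w)

# Split into 4 parts (as with n-parallel = 4), importance sample each,
# then merge: Z = sum_k (n_k / N) * Z_k.
parts = np.split(log_w, 4)
z_merged = np.logaddexp.reduce(
    [log_evidence(p) + np.log(len(p)) for p in parts]
) - np.log(len(log_w))

print(np.isclose(z_full, z_merged))  # True
```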
Plotting methods have been added to core.Result. These are called by dingo_pipe_plot, which is in turn called by the PlotNode. The latter inherits from the corresponding Bilby class.
Finally, the example GW150914.ini file has been updated to reflect some of the new functionality.
To-do
- Low-latency results: Add an option to skip importance sampling.
- Test using Condor. So far only the local = true option has been used. In particular, requests for particular GPUs have not been implemented. Also consider OSG and Slurm.