BEAST-Fitting / beast

Bayesian Extinction And Stellar Tool
http://beast.readthedocs.io

Add distance to the BEAST #38

Closed: karllark closed this issue 6 years ago

karllark commented 7 years ago

Adding distance is needed for work in the Magellanic Clouds.

The fastest way to do this, from a human standpoint, is to do multiple BEAST runs with different distances on a uniform grid. The results can then be used to regenerate all the standard BEAST outputs with the distance information included and distance as the 7th fit parameter.

Basically (given a uniform distance grid), code is needed to do all of this semi-seamlessly and avoid human error (e.g., include the distance grid in datamodel.py and provide scripts to set up the n distance BEAST grids and merge the results).
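A minimal sketch of the per-distance grid idea (plain numpy, not actual BEAST code; the grid endpoints and step are made up for illustration):

```python
# Minimal sketch (not BEAST code): a uniform distance grid and the
# distance moduli by which each per-distance sub-run's SEDs would be shifted.
import numpy as np

d_kpc = np.linspace(45.0, 75.0, 7)            # hypothetical uniform grid in kpc
dist_mod = 5.0 * np.log10(d_kpc * 1e3) - 5.0  # distance modulus, with d in pc

for d, mu in zip(d_kpc, dist_mod):
    print(f"run the BEAST with SEDs shifted by mu = {mu:.2f} mag (d = {d:.1f} kpc)")
```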

karllark commented 7 years ago

One thought: we should put in distance just like the other 6 parameters. Then we can make a grid with a range of distances if we want, or a grid with a single distance, which provides flexibility in how distance is included. Maybe the grid will get small enough to allow enough distance bin points - or computers will get enough RAM - or the BEAST will get set up for true parallel computation, in the sense of sending different parts of the grid to different nodes.
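A rough back-of-the-envelope of the grid-size concern (all numbers here are made up for illustration, not actual BEAST grid sizes):

```python
# Illustrative only: adding distance as a full 7th axis multiplies the SED
# grid by the number of distance bins.
n_base_models = 1.5e6   # hypothetical size of the 6-parameter grid
n_filters = 6           # e.g. a 6-band survey
n_distances = 10        # hypothetical number of distance bins
bytes_per_flux = 8      # float64

grid_bytes = n_base_models * n_distances * n_filters * bytes_per_flux
print(f"SED fluxes alone: ~{grid_bytes / 1e9:.1f} GB")
```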

drvdputt commented 6 years ago

I think that adding distances to the grid should not actually increase the amount of RAM used. Since distance just rescales everything, it wouldn't make sense to store separate SEDs for the models at different distances. If the distances are put into the model grid as an extra axis, we would need some trick to refer to the same SED for multiple grid points. I think the better option is to keep the distances on a separate grid and implement a loop somewhere near the end, when the probabilities are calculated. I will investigate these sections of the code and see how this fits in.
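A minimal sketch of that loop (plain numpy with fake data, not the actual BEAST fitting code), assuming independent Gaussian uncertainties and SEDs stored at a single reference distance d0:

```python
# Sketch only: loop over distances at likelihood time instead of storing
# one SED copy per distance.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_filters = 1000, 6
seds_d0 = rng.uniform(1e-18, 1e-15, (n_models, n_filters))  # fluxes at d0
obs = seds_d0[42] * (50.0 / 60.0) ** 2                      # fake star at 60 kpc
unc = 0.05 * obs

d0 = 50.0                                # kpc, reference distance of the grid
distances = np.linspace(45.0, 75.0, 13)  # kpc

lnp = np.empty((len(distances), n_models))
for i, d in enumerate(distances):
    scale = (d0 / d) ** 2                # inverse-square flux rescaling
    chi2 = np.sum(((obs - scale * seds_d0) / unc) ** 2, axis=1)
    lnp[i] = -0.5 * chi2
# lnp now spans the (distance, model) space without duplicating the SED grid
```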

mfouesneau commented 6 years ago

Distance can first be a deterministically optimized parameter for each model on the grid. Then you get a posterior predictive distribution of distances.

optimal distance: log(D) = 1/5 (1/log10(det(Cov))) (obs - model)^T @ Cov^-1 @ (obs - model)
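For concreteness, a sketch of one standard closed form for such a per-model optimization, assuming a Gaussian likelihood in flux space with covariance Cov and model SEDs stored at a reference distance d0 (an illustrative reading, not necessarily the exact expression above):

```python
# Sketch: maximum-likelihood flux scale s (and hence distance) per model,
# assuming obs = s * model + Gaussian noise with covariance cov, s = (d0/d)**2.
import numpy as np

def optimal_distance(obs, model_d0, cov, d0_kpc):
    cinv = np.linalg.inv(cov)
    s = (model_d0 @ cinv @ obs) / (model_d0 @ cinv @ model_d0)  # ML scale
    return d0_kpc / np.sqrt(s)

# toy usage with made-up numbers
model = np.array([1.0, 2.0, 3.0])
obs = model * (50.0 / 60.0) ** 2
cov = np.diag((0.05 * obs) ** 2)
print(optimal_distance(obs, model, cov, d0_kpc=50.0))  # ~60 kpc
```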

karllark commented 6 years ago

Adding distance this way is possible for the physics model, but it is not clear to me that it carries over to the observation model. The noise model (unc and bias) is mapped to the physics models via their fluxes. Changing the distance changes the fluxes, and so changes the noise model. This is the case for the toothpick noise model. The more complicated trunchen model includes the covariance matrix and is directly mapped to the physics models. Remember, we have found that the observation model is critical for good BEAST fits.
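A small sketch of why the noise model has to be re-evaluated rather than rescaled (fake toothpick-style bias and unc curves, purely illustrative):

```python
# Sketch: in a toothpick-style noise model, bias and sigma are nonlinear
# lookup functions of model flux (derived from ASTs), so rescaling the flux
# requires re-evaluating the noise model, not rescaling it.
import numpy as np

flux_bins = np.logspace(-18, -14, 20)
bias_of_flux = 0.1 * flux_bins * np.exp(-flux_bins / 1e-16)  # fake AST-derived bias
sigma_of_flux = 0.05 * flux_bins + 2e-19                     # fake AST-derived sigma

def noise_model(flux):
    """Interpolate the (fake) AST-derived bias and sigma at a model flux."""
    return (np.interp(flux, flux_bins, bias_of_flux),
            np.interp(flux, flux_bins, sigma_of_flux))

f = 3e-16
scale = (50.0 / 60.0) ** 2        # move the model from 50 to 60 kpc
b1, s1 = noise_model(f)
b2, s2 = noise_model(scale * f)   # correct noise at the new flux
# (b2, s2) != scale * (b1, s1) in general
print(b2, scale * b1, s2, scale * s1)
```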

At this point, I don't see a solution that does not just make distance another BEAST variable and multiply the model grid size by the number of distance bins. I'm very open to other solutions, but it has to be one that addresses both the physics and observation models.

mfouesneau commented 6 years ago

True, I forgot about the AST-based noise model. We're back to "how do we make efficient sets of ASTs?" Learning the covariant noise properties smells strongly of a Gaussian Process to me. @davidwhogg ?
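A toy sketch of how the GP idea might be prototyped for a single band (scikit-learn on fake AST results; a 1D simplification of the covariant problem, and nothing that exists in the BEAST):

```python
# Sketch: learn bias as a smooth function of input flux from (fake) ASTs
# with a Gaussian Process instead of binning.  Illustrative only.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
log_flux_in = rng.uniform(-18, -14, 200)                          # fake AST input fluxes (log10)
bias = 0.02 * (log_flux_in + 16) ** 2 + rng.normal(0, 0.05, 200)  # fake recovered bias

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.05))
gp.fit(log_flux_in[:, None], bias)

grid = np.linspace(-18, -14, 50)[:, None]
mean, std = gp.predict(grid, return_std=True)  # smooth bias model + uncertainty
```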

karllark commented 6 years ago

Making efficient sets of ASTs is an important question and one we need more work on - especially for the trunchen model.

But I think the issue here is different. The BEAST works on the combination of the physics+observation model. Because the observation model (unc and bias) is attached to the models (and not the data), it is inherent in our method that distance will not simply scale the combined physics+observation model. Distance scales the physics model, but it requires a different, non-linear mapping for the observation model, because the observation model is highly non-linear with flux.

galaxyumi commented 6 years ago

I think we also need to consider which science cases would really need the distance determination. I doubt we can determine distances of individual stars beyond the LMC and SMC, except perhaps for bright stars in nearby galaxies.

karllark commented 6 years ago

I have been thinking/hoping that distance could be added in a way that allows a single value for the cases where multiple values are not needed (as @galaxyumi notes above). This would be like metallicity now, where a single value is allowed. When distance is multi-valued, a prior can be included. Including distance is not necessarily so that it can be derived by the BEAST; at a minimum, it can be included as a nuisance parameter and marginalized over.
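A tiny sketch of the nuisance-parameter option (plain numpy with a stand-in posterior grid; not BEAST code):

```python
# Sketch: treating distance as a nuisance parameter means summing the
# posterior over the distance axis (with a prior) before reporting the
# other parameters.
import numpy as np

rng = np.random.default_rng(2)
n_dist, n_models = 13, 1000
lnp = rng.normal(-50, 5, (n_dist, n_models))        # stand-in log posterior grid

dist_prior = np.ones(n_dist) / n_dist               # e.g. a flat prior over distance
p = np.exp(lnp - lnp.max())                         # avoid underflow
p_marginal = (dist_prior[:, None] * p).sum(axis=0)  # marginalize over distance
p_marginal /= p_marginal.sum()
```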

karllark commented 6 years ago

Right now, I think it might be good to just add distance as the 7th parameter and let the RAM usage grow. Then we could figure out a solution to the large RAM separately. One solution would be to split the extra-large grid into pieces, run each piece, and reassemble the results into a merged solution. Splitting by distance bins is one "easy" way to think about this. Merging the subrun results (sparse likelihoods, 1D pPDFs, etc.) should be straightforward.
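A minimal sketch of how merging per-subrun 1D pPDFs could look (assuming each sub-run reports its pPDFs on a common bin grid plus its total probability; the names and weighting scheme here are illustrative, not the actual BEAST merge code):

```python
# Sketch: merge 1D pPDFs from sub-runs (e.g. one per distance bin) into a
# single 1D pPDF, weighting each sub-run by its sub-grid's total probability
# so the result matches what one big-grid run would give.
import numpy as np

def merge_1dpdfs(pdfs, subgrid_total_prob):
    """pdfs: (n_subruns, n_bins) 1D pPDFs on a common bin grid."""
    w = np.asarray(subgrid_total_prob, dtype=float)
    merged = (w[:, None] * np.asarray(pdfs)).sum(axis=0)
    return merged / merged.sum()

# toy usage: three distance sub-runs, a 1D pPDF in (say) A(V) from each
pdfs = np.array([[0.2, 0.5, 0.3],
                 [0.1, 0.6, 0.3],
                 [0.3, 0.4, 0.3]])
print(merge_1dpdfs(pdfs, subgrid_total_prob=[0.5, 0.3, 0.2]))
```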

karllark commented 6 years ago

Done. New work on splitting/merging grid tracked by a different issue.