lea-hagen opened 4 years ago
I've done a test from which I conclude that we need more samples and/or a smaller (less negative) threshold.
Details: I'm currently approaching #373 by adding up the probabilities within the specified range of physical parameters and dividing by the sum of all 500 probabilities. For the example below, for each star, I add up all of the probabilities associated with M_ini>10 and Av>0.5 and divide by the total probability. I then ran an experiment to simulate selecting fewer lnP values, in which I randomly select half of them and repeat the same procedure. Here's the plot:
There are a significant number of catastrophic failures (i.e., either the most or the least probable points are chosen, drastically changing the result), which means that we need to save more lnPs. Also, there tend to be more points above than below the line, which I haven't figured out yet.
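The bookkeeping above, and the half-sample experiment, can be sketched roughly like this (all array names and toy values here are hypothetical stand-ins, not the BEAST output format):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins for one star's sparsely saved posterior; in the real pipeline
# these come from the BEAST lnP output file (names and values are hypothetical).
lnp = rng.normal(-5.0, 2.0, size=500)      # saved log-probabilities
m_ini = rng.uniform(1.0, 30.0, size=500)   # initial mass at each saved point
av = rng.uniform(0.0, 2.0, size=500)       # A_V at each saved point

def region_prob(lnp, mask):
    """Fraction of the total saved probability inside a parameter cut."""
    p = np.exp(lnp - lnp.max())            # shift by max for numerical stability
    return p[mask].sum() / p.sum()

# Probability that this star has M_ini > 10 and Av > 0.5, from all 500 lnPs.
full = region_prob(lnp, (m_ini > 10.0) & (av > 0.5))

# Experiment: randomly keep half of the saved lnPs and recompute.
keep = rng.choice(len(lnp), size=len(lnp) // 2, replace=False)
half = region_prob(lnp[keep], (m_ini[keep] > 10.0) & (av[keep] > 0.5))
print(f"all 500: {full:.3f}, random half: {half:.3f}")
```

The catastrophic failures correspond to draws where `keep` happens to exclude (or consist mostly of) the highest-probability points.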
@karllark I was thinking of a couple of additional lnP selection strategies. I suspect you all considered them at some point, so I'm curious why they're not being used.
I think there are some issues here that we are not yet considering, but I don't know what they are in detail. In past discussions on this topic, @ktchrn and @mfouesneau have been the ones with knowledge and advice. Maybe one of them can even point us to the relevant literature.
At that time I believe we were taking all samples above a threshold.
You can also take the N best points but then you need to also compute and save their weights using nested sampling prescription. (I think this should be easy as we still have the density/volume of each grid point?)
We do still have the option of saving all samples above the threshold (see https://github.com/BEAST-Fitting/beast/blob/afefb850ae55275a25615a4ba3b633bf9bd2c46c/beast/fitting/fit.py#L543). This is still a lot of samples, hence a large amount of disk space needed. Maybe we need to test what is a good enough threshold and then just save everything above that. @lea-hagen with your testing, what about testing the total integrated probability for the whole grid versus the same but only above a threshold? And then change the threshold? We picked ln(max_prob) - 10 somewhat arbitrarily.
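The suggested test (total integrated probability for the whole grid versus only the points above a threshold, for varying thresholds) could be sketched like this; `above_threshold` is a made-up helper name and the grid values are a toy stand-in:

```python
import numpy as np

def above_threshold(lnp, delta=10.0):
    """Indices of grid points with lnP above max(lnP) - delta; delta=10 is
    the somewhat arbitrary cut currently in fit.py. Helper name is made up."""
    return np.where(lnp > lnp.max() - delta)[0]

# Toy grid: compare the integrated probability above each cut to the total.
rng = np.random.default_rng(0)
lnp = rng.normal(-20.0, 5.0, size=10_000)
p = np.exp(lnp - lnp.max())                    # stable normalization
for delta in (5.0, 10.0, 20.0):
    kept = above_threshold(lnp, delta)
    frac = p[kept].sum() / p.sum()
    print(f"delta={delta}: {len(kept)} points, {frac:.4f} of total probability")
```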
@mfouesneau we did go to a sparse saving of the points above the threshold way back in the day. I remember a conversation with you, and this is how we picked 50 points. Now we've defaulted to 500, but that does not seem to be enough. I'm not sure of the origin of the 50, but I seem to remember it was something about how many samples were needed for hierarchical Bayesian modeling. Does this jog any memories?
I remember @davidwhogg saying 12 was enough...
I do not really remember why we stopped at 50; I think it was initially for disk space, until we had a proper use for these samples.
I ran another test in which I saved all of the lnPs (using a coarse grid so it would be quicker). I tested the scenario in which we sort the probabilities and then save up to X% of the cumulative probability. This worked fine for most stars, but some had most of their probability in one lnP value, so none of their lnPs were retained (because even the single largest lnP already exceeded, e.g., 80% of the total). To account for that, I changed the criterion slightly to the larger of [enough points to reach X% of the cumulative probability] and [the 10 largest lnPs].
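That selection rule (largest lnPs up to X% of the cumulative probability, with a floor of 10 points) might look roughly like this; `select_lnps` is a hypothetical name, not a BEAST function:

```python
import numpy as np

def select_lnps(lnp, frac=0.8, n_min=10):
    """Keep the largest lnPs up to `frac` of the cumulative probability,
    but never fewer than `n_min` points (handles stars whose probability
    is concentrated in a single grid point)."""
    order = np.argsort(lnp)[::-1]                  # indices, most probable first
    p = np.exp(lnp[order] - lnp[order[0]])         # normalize against the max
    cum = np.cumsum(p) / p.sum()
    n_frac = int(np.searchsorted(cum, frac)) + 1   # points needed to reach frac
    return order[:max(n_frac, min(n_min, len(lnp)))]
```

For a star with one dominant lnP, `n_frac` collapses to 1 and the `n_min` floor kicks in, so at least 10 points are always saved.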
Here's a plot comparing the probability of a star having M>10, Av>0.5 between saving ALL lnPs and saving a subset (sampled at X = 20, 40, 60, 80, and 98% of the total probability). The points have some artificial horizontal spread to make the distributions easier to see. The bar at 0 shows stars for which NONE of the lnPs are consistent with this M/Av cut, so they'll match perfectly no matter what fraction of the lnPs we save. Once we get to 80% of the cumulative probability, basically all of the stars are within 10 percentage points of the "real" value. So at least for this cut (we'd have to test some different parameter cuts), 80% seems like a reasonable threshold for saving lnPs.
For reference, the plot on the right shows the cumulative probability as a function of number of saved lnPs.
| Probability of M>10, Av>0.5 for different cuts in cumulative probability | Cumulative probability by index |
|---|---|
| *(plot)* | *(plot)* |
We may want to save two samplings. One for use in the MegaBEAST and one for use in generating probabilities of different types of stars/sightlines. I'm not sure that the two use cases can be served by the same sampling of the nD posterior.
I don't remember where the number of 50 samples came from. But I do remember why we picked the threshold of -10: we wanted to sample probabilities within 5 sigma, and exp(-10) corresponds to something between 4 and 5 sigma. My tests with NGC 4214 data back in the day showed that the resulting fits with a threshold of -5 (corresponding to a ~3 sigma cut) were not very different from those with a threshold of -10. However, I think we might want to keep the threshold of -10 when dealing with data deeper than NGC 4214, to make sure we sample probabilities from the entire parameter space. This is because many possible combinations of dust models can reasonably reproduce the width of the RGB.
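For a one-dimensional Gaussian with ln P = -chi^2/2, a Delta-lnP cut translates to a Gaussian-equivalent sigma as follows (a back-of-the-envelope check, not an exact multi-dimensional statement):

```python
import math

def lnp_cut_to_sigma(delta_lnp):
    """Gaussian-equivalent sigma of a Delta-lnP cut, assuming a 1-D
    Gaussian where ln P = -chi^2 / 2, so Delta chi^2 = 2 * |Delta lnP|."""
    return math.sqrt(2.0 * abs(delta_lnp))

print(lnp_cut_to_sigma(10))  # ~4.47, i.e. between 4 and 5 sigma
print(lnp_cut_to_sigma(5))   # ~3.16, roughly a 3 sigma cut
```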
Have we determined that the megabeast can't work with the lnPs representing X% of the cumulative probability? (I'm nowhere near an expert on hierarchical Bayesian modeling, so hopefully someone else knows for sure.) The thing I worry about is the same thing that plagues random sampling for star types - what if the saved lnPs totally miss the highest probability regions? Is that why the sampling used weights before (#121)?
As I'm working on #373, I've been puzzling over the question of the proper parameters for sampling the log likelihoods. Right now, we randomly select 500 from those that have `lnP > max(lnP) - 10`. A while back, this selection had been weighted by the probabilities themselves to ensure that we don't miss the highest-probability regions, but we decided to switch to random selection to keep from double-counting the probabilities (issue in #121, PR in #200). The issue I'm coming across now is whether 500 samples with a -10 threshold is enough to sample the probability space well. My first question is how these numbers were chosen, and my second is how to test that they're reasonable.
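The current selection described above (a uniform random draw of 500 points from those above the threshold) can be sketched as follows; the function name is hypothetical, not the actual fit.py code:

```python
import numpy as np

def sample_lnps(lnp, n_save=500, delta=10.0, rng=None):
    """Sketch of the current selection: a uniform random draw of n_save
    points from those with lnP > max(lnP) - delta. An earlier version
    weighted the draw by probability (see #121 / #200)."""
    if rng is None:
        rng = np.random.default_rng()
    idx = np.where(lnp > lnp.max() - delta)[0]
    if len(idx) <= n_save:
        return idx                      # fewer candidates than requested
    return rng.choice(idx, size=n_save, replace=False)
```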