In making this notebook, I've discovered a "bug": the data we've been given is not normalized. I need to know more about how it was made, but for now I'm implementing a kludge that normalizes all parametrizations before plotting them.
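In case it helps anyone following along, here is a minimal sketch of that kind of kludge, assuming the parametrization can be evaluated on a redshift grid (the function name is illustrative, not part of qp):

```python
import numpy as np

def normalize_gridded(zs, pdf_vals):
    # Rescale gridded PDF evaluations so they integrate to one over zs;
    # np.trapz gives a trapezoidal approximation to the integral.
    return pdf_vals / np.trapz(pdf_vals, zs)

# Example: an unnormalized bump evaluated on a grid
zs = np.linspace(0., 2., 201)
raw = 3.7 * np.exp(-0.5 * ((zs - 0.8) / 0.1) ** 2)  # arbitrary amplitude
print(np.trapz(normalize_gridded(zs, raw), zs))  # ~1.0
```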
Good plan. All PDFs must integrate to one!
One thing we never implemented in pdf.py is integration of each parametrization, so we can't normalize them within qp. I don't think this is necessarily a problem, but I'll add a disclaimer to the documentation saying that users are responsible for making sure their inputs integrate to unity.
Also, the current holdup with this issue is that the mixture models fit to the samples from the real gridded data break the CDF/PPF functions. I'm investigating where the problem is...
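For anyone trying to reproduce the symptom, here's a toy sketch (not qp's actual code) of a Gaussian mixture CDF and a PPF built by numerically inverting it; if the fitted weights don't sum to one, the CDF never reaches unity and the inversion fails for high quantiles:

```python
import numpy as np
from scipy import stats, optimize

# Toy Gaussian mixture: the weights must sum to one, or the CDF plateaus
# below 1 and any PPF built by inverting it breaks near the upper tail.
weights = np.array([0.6, 0.4])
means = np.array([0.5, 1.2])
sigmas = np.array([0.1, 0.2])

def mix_cdf(x):
    return np.sum(weights * stats.norm.cdf(x, loc=means, scale=sigmas))

def mix_ppf(q, lo=-5., hi=10.):
    # Numerical inversion of the CDF; brentq needs the root bracketed,
    # which is exactly what fails when the mixture isn't normalized.
    return optimize.brentq(lambda x: mix_cdf(x) - q, lo, hi)

print(mix_ppf(0.5))   # median of the mixture
print(mix_cdf(10.))   # ~1.0 only if the weights sum to one
```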
Some parametrizations are already normalized or normalizable, though, right? If you can rejection sample then the PDF must be normalized! :-)
Re: mixture model bug: happy hunting! :-)
I ran your analysis function for different numbers of parameters in each parametrization and got this cool result. Check it out! (-: https://cloud.githubusercontent.com/assets/8606810/24876046/2cd76f38-1df8-11e7-8603-3a653e462c66.png So it looks like with more than about 30 parameters for the KLD and 100 for the RMSE, it doesn't matter which parametrization you use. Below about 30 parameters, though, the KLD ranks samples better than histograms better than quantiles, while the RMSE ranks quantiles better than samples better than histograms for nontrivial numbers of parameters. And at the smallest numbers of parameters, both metrics agree that samples are better than histograms are better than quantiles.
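For anyone outside the repo, here's roughly what the two metrics compare, evaluated on a shared redshift grid (illustrative helpers, not qp's API):

```python
import numpy as np

def kld(p_true, p_approx, dz):
    # Kullback-Leibler divergence D(true || approx) on a shared grid,
    # with a small epsilon to guard against log(0).
    eps = 1e-12
    return np.sum(p_true * np.log((p_true + eps) / (p_approx + eps))) * dz

def rmse(p_true, p_approx):
    # Root-mean-square difference between the two gridded evaluations.
    return np.sqrt(np.mean((p_true - p_approx) ** 2))
```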
Nice! Not sure what to make of the disagreements between KLD and RMSE, other than to note that they motivate our switching to science-based metrics (like ability to predict n(z)). I'd be interested to see how well n(z) can be estimated by concatenating the samples and then histogramming them, rather than stacking the KDEs. Want to have a go at that?
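Roughly what I have in mind, sketched outside qp (array names and shapes are assumptions, and stacking is shown here as a simple average of gridded PDFs):

```python
import numpy as np

# zs: shared redshift grid; gridded_pdfs: (n_galaxies, n_grid) normalized PDFs;
# samples: list of per-galaxy arrays of redshifts drawn from those PDFs.

def nz_by_stacking(gridded_pdfs):
    # Average the per-galaxy PDFs on the grid.
    return np.mean(gridded_pdfs, axis=0)

def nz_by_histogram(zs, samples, n_bins=50):
    # Pool all the samples and histogram them into a density estimate.
    pooled = np.concatenate(samples)
    heights, edges = np.histogram(pooled, bins=n_bins,
                                  range=(zs.min(), zs.max()), density=True)
    return heights, edges
```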
Yep, here's the result:
The high variability makes me think we're suffering from small number statistics here.
But, when I inspect the n(z) for each parametrization, it's clear that there's a bug:
I'm looking into it now...
I think both issues (at N_floats=3 and N_floats=30) are being caused by not enforcing normalization for interpolations. I made #72 and will fix that in a branch off of issue/64/data_exploration_notebook. Stay tuned for a pull request!
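For reference, this is the kind of fix I mean, sketched outside qp (names are illustrative, not the actual pdf.py code):

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import quad

def normalized_interpolation(zs, pdf_vals):
    # Interpolate the gridded PDF, then rescale so the interpolant
    # integrates to one over the support [zs[0], zs[-1]].
    raw = interp1d(zs, pdf_vals, bounds_error=False, fill_value=0.)
    norm, _ = quad(raw, zs[0], zs[-1], limit=200)
    return lambda z: raw(z) / norm
```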
Good. Those look like useful diagnostic plots to remake. Good luck!
This needs its own branch, so it merits its own issue.