deepskies / DeepDiagnostics

Inference diagnostics for mostly SBI
MIT License

New Plot: local classifier two-sample tests #39

Closed bnord closed 6 months ago

bnord commented 8 months ago

https://github.com/JuliaLinhart/lc2st

local classifier two-sample tests

bnord commented 8 months ago

Following the playbook here from a different effort to implement this for a particular project

https://github.com/deepskies/OpticalClusterSBI/discussions/3

voetberg commented 7 months ago

So this is in progress, and I have it working with mock data, but I'm having an issue getting it to work with the validation data we're using for our test cases. According to this comment: https://github.com/deepskies/OpticalClusterSBI/discussions/3#discussioncomment-8905159 the input parameter P is the prior, which is not included in our validation set. How should we go about handling this? My knowledge of simulation-based inference is too limited to engineer a solution myself.

For the time being, I'll supply a few choices of standard N-D priors as arguments, but I think this is more a stopgap than a long-term solution.

cc: @bnord @beckynevin

bnord commented 6 months ago

are we working in the context of having {train; valid; test}, or does "validation" = "test" data here? Just want to understand the data split as I think about this.

voetberg commented 6 months ago

Oh sorry, that was a throwaway comment. The data used in the test cases to make sure things work is labeled "Validation"; there is no "Test"-labeled data in the reference data for testing/implementation verification.

voetberg commented 6 months ago

You can use `data.h5data(data_path="resources/saveddata/data_validation.h5")` to see what I'm using.

bnord commented 6 months ago

Can we generate validation data that has a prior associated with it, similar to how we produce training data?

voetberg commented 6 months ago

We can, but we should make that a clear requirement of the input data. As I see it, we have two options: offer a few built-in choices of priors (e.g. normal, poisson, pretty much anything in numpy.random), or force the user to bring their own and throw an error if they try to load data that doesn't have a "prior" field. Or both, I suppose, as options.
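To make the two options concrete, here's a rough sketch of how a prior-resolution helper could work: accept a user-supplied callable first, fall back to a small table of named numpy priors, and raise otherwise. The function name, the builtin table, and the sampler signature are all hypothetical, not DeepDiagnostics API.

```python
import numpy as np

# Illustrative table of built-in N-D priors; entries map a name to a
# draw function taking (rng, size) and returning samples of that shape.
BUILTIN_PRIORS = {
    "normal": lambda rng, size: rng.normal(size=size),
    "uniform": lambda rng, size: rng.uniform(size=size),
    "poisson": lambda rng, size: rng.poisson(lam=1.0, size=size),
}


def resolve_prior(prior=None, name=None, seed=0):
    """Return a sampler f(n_samples, dim) -> ndarray of shape (n_samples, dim).

    Priority: a user-supplied callable first, then a named builtin;
    otherwise raise, mirroring the "error if no prior field" option.
    """
    rng = np.random.default_rng(seed)
    if callable(prior):
        return prior
    if name in BUILTIN_PRIORS:
        draw = BUILTIN_PRIORS[name]
        return lambda n, dim: draw(rng, (n, dim))
    raise ValueError(
        f"No prior supplied: pass a callable or one of {sorted(BUILTIN_PRIORS)}"
    )
```

Either path yields the same sampler interface, so downstream code (the l-c2st classifier training, say) never has to care whether the prior came from the data file or a builtin.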

bnord commented 6 months ago

Those two options make sense to me. A third option: if priors aren't defined, there's no l-c2st plot and the user gets a warning.

bnord commented 6 months ago

And maybe we should follow suit for all the other diagnostics -- i.e., if people don't provide the data needed for a diagnostic, they get a warning and whatever diagnostics are possible with what they did provide?
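The "run what you can, warn about the rest" idea above could be sketched like this: each diagnostic declares the data fields it needs, and missing fields downgrade it to a warning rather than an error. The registry contents and function names here are invented for illustration.

```python
import warnings

# Hypothetical registry: which data fields each diagnostic requires.
REQUIRED_FIELDS = {
    "lc2st": {"xs", "thetas", "prior"},
    "coverage": {"xs", "thetas"},
}


def runnable_diagnostics(available_fields):
    """Return the diagnostics whose required fields are all present,
    warning (instead of erroring) about the ones that must be skipped."""
    runnable = []
    for name, needed in sorted(REQUIRED_FIELDS.items()):
        missing = needed - set(available_fields)
        if missing:
            warnings.warn(f"Skipping {name}: missing field(s) {sorted(missing)}")
        else:
            runnable.append(name)
    return runnable
```

A validation set without a "prior" field would then still produce the coverage diagnostic, just with a warning that l-c2st was skipped.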

voetberg commented 6 months ago

Okay - I have the plot technically working, but the parameters laid out in optical-clustering https://github.com/deepskies/OpticalClusterSBI/discussions/3 make plots that look like this:

[plot attached] (Please ignore the actual values; the classifier used to produce these is untrained, so the confidence region is trash. There is a 2-D hist version for two values compared as well.)

Is this what we're actually looking for, or do we want more of the corner plots (from the paper below)? [example image attached]

bnord commented 6 months ago

we'd like to do corner plots also so that we can view the correlations. Could you keep the corner plot and local pp-plot separate?

voetberg commented 6 months ago

Yeah, those can be separated easily. And yeah, I figured the corner plots would be far more useful, but mind reading isn't among my talents, so I thought I'd ask.
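For the record, keeping the two views as separate figures can be as simple as returning one figure per plot. This is a minimal matplotlib sketch with synthetic data and invented function names, not the actual DeepDiagnostics plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs in scripts/CI
import matplotlib.pyplot as plt
import numpy as np


def local_pp_plot(alphas, local_coverage):
    """One standalone figure: empirical vs nominal coverage (local PP-plot)."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], "k--", label="ideal")
    ax.plot(alphas, local_coverage, label="l-c2st")
    ax.set_xlabel("nominal coverage")
    ax.set_ylabel("empirical coverage")
    ax.legend()
    return fig


def corner_plot(samples):
    """A second, separate figure: 1-D marginals on the diagonal,
    pairwise 2-D histograms below it, to show correlations."""
    n = samples.shape[1]
    fig, axes = plt.subplots(n, n, figsize=(2 * n, 2 * n))
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            if j > i:
                ax.axis("off")  # upper triangle left empty, corner-plot style
            elif i == j:
                ax.hist(samples[:, i], bins=30)
            else:
                ax.hist2d(samples[:, j], samples[:, i], bins=30)
    return fig
```

Returning figures (rather than calling `plt.show()` inside the helpers) keeps the two plots independent, so callers can save, display, or combine them however they like.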

voetberg commented 6 months ago

Update -

[updated plots attached: separated local pp-plot and corner plot]

bnord commented 6 months ago

love it.