Closed: bnord closed this issue 6 months ago
Following the playbook here from a different effort to implement this for a particular project
https://github.com/deepskies/OpticalClusterSBI/discussions/3
So this is in progress, and I have it working with mock data, but I'm having an issue moving it over to the validation data we're using for our test cases. According to this comment: https://github.com/deepskies/OpticalClusterSBI/discussions/3#discussioncomment-8905159 the input parameter P
is the prior, which is not included in our validation set. How should we go about handling this? My knowledge of simulation-based inference is too limited to engineer a solution myself.
For the time being, I'll supply a few choices of standard N-D priors as arguments, but I think this is more of a stopgap than a long-term solution.
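For what it's worth, the stopgap could look something like the sketch below: a small factory that maps a prior name to a sampler over standard N-D distributions. The function name `make_prior` and the set of supported names are hypothetical, not the project's actual API.

```python
# Hypothetical stopgap: map a user-supplied name to a standard N-D prior
# sampler. Each sampler returns an (n, dim) array of draws.
import numpy as np

def make_prior(name, dim, rng=None):
    """Return a sampler f(n) -> (n, dim) array for a named standard prior."""
    rng = rng or np.random.default_rng()
    if name == "normal":
        return lambda n: rng.normal(size=(n, dim))
    if name == "uniform":
        return lambda n: rng.uniform(size=(n, dim))
    if name == "poisson":
        return lambda n: rng.poisson(lam=1.0, size=(n, dim)).astype(float)
    raise ValueError(f"Unknown prior: {name!r}")

sample = make_prior("normal", dim=3)
theta = sample(100)  # 100 draws from a 3-D standard normal prior
```

The upside is that adding more options is one extra branch; the downside, as noted above, is that none of these may match the prior the validation data was actually simulated from.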
cc: @bnord @beckynevin
are we working in the context of having {train; valid; test}, or does "validation" = "test" data here? Just want to understand the data split as I think about this.
Oh sorry, throwaway comment. The data used in the test cases to make sure things work is labeled "Validation"; there is no "Test"-labeled data in the reference data for testing/implementation verification.
You can use `data.h5data(data_path="resources/saveddata/data_validation.h5")` to see what I'm using.
Can we generate validation data that has a prior associated with it? similar to how we produce training data?
We can, but we should make that a clear requirement of the input data. As I see it, we have two options: offer a few built-in choices of priors (e.g. normal, Poisson, pretty much any `numpy.random` distribution), or force the user to bring their own (throw an error if they try to load data that doesn't have a "prior" field). Or both, I suppose, as options.
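The strict "bring your own prior" option could be as simple as the check sketched below. The field name `"prior"`, the function name, and the plain-dict interface are all assumptions for illustration; real code would be reading the loaded HDF5 contents.

```python
# Hypothetical check for the strict option: reject input data that lacks
# a "prior" field instead of silently continuing without one.
def require_prior(data, path="<input>"):
    """Return data["prior"], or raise if the field is missing."""
    if "prior" not in data:
        raise KeyError(
            f"{path} has no 'prior' field; add one to the data file "
            "or pass a built-in prior explicitly."
        )
    return data["prior"]
```

Combining both options would just mean falling back to this check only when no built-in prior was passed as an argument.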
Those two options make sense to me. A third option is that if priors aren't defined, there's no L-C2ST plot and they get a warning.
And maybe we should follow suit for all the other diagnostics -- i.e., if people don't give the data needed for them, then they get a warning and they get whatever diagnostics are possible with what they provide?
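That warn-and-skip policy could be sketched as a table of per-diagnostic requirements: each diagnostic declares the fields it needs, and anything missing produces a warning rather than an error. All names below (the diagnostics, the field names, `run_diagnostics`) are illustrative, not the package's real interface.

```python
# Hypothetical dispatcher: run only the diagnostics whose required data
# fields are present; warn about the ones that had to be skipped.
import warnings

REQUIRES = {
    "corner_plot": {"posterior_samples"},
    "local_pp_plot": {"posterior_samples", "theta_true"},
    "lc2st": {"posterior_samples", "theta_true", "prior"},
}

def run_diagnostics(data, diagnostics=REQUIRES):
    ran = []
    for name, needed in diagnostics.items():
        missing = needed - data.keys()
        if missing:
            warnings.warn(f"Skipping {name}: missing {sorted(missing)}")
            continue
        ran.append(name)  # real code would call the plotting function here
    return ran

# No "prior" field supplied, so lc2st is skipped with a warning:
ran = run_diagnostics({"posterior_samples": ..., "theta_true": ...})
```

This keeps the prior requirement local to the diagnostics that actually need it, which seems to match the intent above.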
Okay - I have the plot technically working, but the parameters laid out in optical-clustering https://github.com/deepskies/OpticalClusterSBI/discussions/3 make plots that look like this:
(Please ignore the actual values; the classifier used to produce these is untrained, so the confidence region is trash. There is a 2D hist version for comparing two values as well.)
Is this what we're actually looking for, or do we want more of the corner plots (from the paper below)?
we'd like to do corner plots also so that we can view the correlations. Could you keep the corner plot and local pp-plot separate?
Yeah, those can be separated easily. And yeah, I figured the corner plots would be far more useful, but mind reading is not among my talents, so I thought I'd ask.
Update -
love it.
https://github.com/JuliaLinhart/lc2st (local classifier two-sample tests)