Posterior and Prior: potential variable/function naming issues.

bnord commented 7 months ago

I'd like to do some "prior" predictive checks (#56).

I notice that this will be possible by reading in prior data with the h5 reader.

I also notice that sbi_model.py has something called the posterior, which is actually the trained model.

When we talk about priors and posteriors, they can both be data, but I think only the posterior can also be called a model.

Should we be concerned about this overloading or ambiguity of the term, "posterior?"

voetberg commented 7 months ago

I agree, there is a clear term overload. Posterior as a model name was ported directly from the eval script, which I think was ported from old code. So it's an easy swap to make in the sbi_model.py for clarity.

bnord commented 7 months ago

i think my primary/only concern is that macke lab 'sbi' leans into this kind of usage of 'posterior'.

Should we conform to their jargon or use our own and explain the difference?

voetberg commented 7 months ago

I really do not like this jargon if I'm being honest, it caused me a lot of a headache when working on the TARP pr. I would much prefer something like:

Prior - initial guess at the posterior to be used with the model.
Proposal - transient possible posterior, meant to either be promoted to the posterior or updated.
Posterior - probability distribution of a trained model, containing theta.
Theta - generic "parameters of the model"
Model - Predictive engine that returns a guess at theta. If it has an "inference" method, it's a model.

We should also outline what "x" and "y" means explicitly, because I think there is conflict with how SBI uses them. I think defining y = simulator(theta, x) makes sense to my brain but I have less experience with (generic) simulation based inference.

bnord commented 7 months ago

I completely agree that the overloaded jargon is frustrating and makes code-writing and reading time-consuming. How about we give it a go in our own way and write some text that translates it for our users.

bnord commented 7 months ago

What potential conflict did you find?

We should also outline what "x" and "y" means explicitly, because I think there is conflict with how SBI uses them.

voetberg commented 7 months ago

Specifically the way inference/sampling from the model. I used this line for reference - https://github.com/deepskies/DeepDiagnostics/blob/1087f266355a15d3e5a7665b70276d972244dbc9/src/scripts/io.py#L34 and requiring x=y for any line is. Icky. Though, this may be a result of a misunderstanding on my end.

What potential conflict did you find?

We should also outline what "x" and "y" means explicitly, because I think there is conflict with how SBI uses them.

bnord commented 7 months ago

Is this the appropriate definition for posterior.sample? https://github.com/sbi-dev/sbi/blob/main/sbi/inference/posteriors/ensemble_posterior.py#L146

I want to make sure I understand how they're defining x's and y's.

voetberg commented 7 months ago

Yep, that's the one

bnord commented 7 months ago

So x is a conditioning context ...

   x: Conditioning context. If none is provided and no default context is set,
            an error will be raised.

and one is allowed to use y_true, which I take to mean the true value of a label for a given input object. I think that makes sense to me.

bnord commented 7 months ago

Is the conflict you identified in the names of the variables or is it in the theoretical usage of those variables?

voetberg commented 7 months ago

Just the naming - "x" is just a terrible name for this sort of thing. Hearing your perspective on what that is supposed to represent, and that it doesn't conflict with what we're doing is good to know though.

bnord commented 7 months ago

got it. I vote that all of our variables are as expressive as possible, even pedantically so.

voetberg commented 6 months ago

Is this covered by #62?

bnord commented 6 months ago

I think so.

deepskies / DeepDiagnostics

Posterior and Prior: potential variable/function naming issues. #57