Closed: cscherrer closed this issue 4 years ago
@DilumAluthge This sounds good to me. If there is a devil in the details, it will be in this part:
- Generalize performance metrics to accept `AbstractVector{<:Distributions.Sampleable}` as input, instead of just `AbstractVector{<:Distributions.Distribution}`.
One question: Do we actually need this for the single-target classification metrics? Can't the sampling be done on the PPL side to get the `UnivariateFinite` distribution objects? This would avoid creating an interface point on the metrics side for how many samples to take, and so forth.
Edit: Also, it would mean we only sample once, and not every time one wants to approximate the pdf.
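To make the "sample on the PPL side" idea concrete, here is a minimal sketch, with hypothetical names: posterior samples of a class label are reduced to a single empirical finite distribution per observation, so the metrics never need to know how many samples were taken. A plain `Dict` of class => probability stands in for an actual `UnivariateFinite` object.

```julia
# Hypothetical helper (not an MLJ or Distributions.jl function):
# collapse a vector of sampled class labels into an empirical
# finite distribution, represented here as a Dict of class => probability.
function empirical_finite(samples::AbstractVector)
    counts = Dict{eltype(samples), Int}()
    for s in samples
        counts[s] = get(counts, s, 0) + 1
    end
    n = length(samples)
    return Dict(k => c / n for (k, c) in counts)
end

dist = empirical_finite(["yes", "no", "yes", "yes"])
# dist["yes"] == 0.75, dist["no"] == 0.25
```

With this in place, the sampling happens exactly once, upstream, and every downstream metric just evaluates the resulting distribution object.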
Feel free to spawn a new thread for this discussion.
Yeah this part is tricky. @cscherrer has some thoughts here: https://github.com/cscherrer/SossMLJ.jl/issues/7
Really nice summary, thank you @ablaom!
This was especially helpful:
At present, the only kind of probabilistic supervised learning model that MLJ is designed to interface with is a model that:
(i) assumes the data $(X_1, y_1), (X_2, y_2), \ldots$ is generated by an i.i.d. process; and
(ii) is capable of delivering, after seeing training data $D$, a probability distribution $p(y \mid x, D)$, defined for each new single input observation $x$.
Aha! I think I finally understand your use of i.i.d. Up to this point, I had thought you (and several others) meant for the `y` values to be i.i.d. But here you're talking about the `(x, y)` pairs being i.i.d., which makes much more sense.
This assumption doesn't hold in a Bayesian context: there (say with parameter `theta`) we can only say the `(x, y)` pairs are conditionally independent given `theta`.
Say we have a function `marginalize(<: Distribution{Vector}) :: Vector{Distribution}`.
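As a minimal sketch of what such a function does, assume the joint prediction is Gaussian and is represented by a plain mean vector and covariance matrix (rather than an actual Distributions.jl type); the `JointGaussian` name and the `(mean, std)` pair representation are illustrative only. Each marginal of a multivariate normal is `Normal(mu[i], sqrt(Sigma[i, i]))`:

```julia
# Illustrative stand-in for a joint (correlated) Gaussian prediction.
struct JointGaussian
    mu::Vector{Float64}      # mean of each coordinate
    Sigma::Matrix{Float64}   # covariance matrix
end

# One univariate marginal per coordinate, as a (mean, std) pair.
# Note the off-diagonal entries of Sigma are simply discarded.
marginalize(d::JointGaussian) =
    [(d.mu[i], sqrt(d.Sigma[i, i])) for i in eachindex(d.mu)]

d = JointGaussian([1.0, 2.0], [4.0 1.5; 1.5 9.0])
marginalize(d)  # [(1.0, 2.0), (2.0, 3.0)] -- the covariance 1.5 is lost
```

The key point is that marginalization is lossy: the correlation information lives only in the joint object.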
In terms of scoring models, the correlated predictions will be most useful when making a decision based on an aggregate. Most scoring rules (including Brier) take each predicted distribution in turn, so evaluation on a given `d <: Distribution{Vector}` will be equivalent to evaluation on `marginalize(d)`. The evaluation function "factors through marginalization".
For an example of an "aggregate decision" like this, say we want a simple Markov chain model to predict the price of a given stock tomorrow. `x` might include a "ticker symbol" ID and yesterday's price, and `theta` is some latent "market conditions". `y` is today's price for that same stock. Our model might make the assumption that a given `y` depends only on the corresponding `x` and the latent `theta`.
We could instead make this "wide data". But there are some reasons we might not want to do that, for example the extra `x` values we would then need to manage for prediction.

Now, imagine we buy some collection of stocks and want to forecast its net value, which is a univariate distribution. If we only have access to the marginals, we'll drastically underestimate the variance of this.
I would be very surprised if there is any way to construct a consistent "vector of distributions" from the correlated predictions `predict(mach, Xnew)`.
As I understand your "consistency", it's the same as saying the marginals are correct. Is that right? I agree that it's very hard to do efficiently for the general case, so the fallback method might be very slow. But at least for things like Gaussians (BayesianLinearRegression.jl) or for Soss models using MCMC, it should be relatively easy.
Meta-issue to track the `JointProbabilistic <: Probabilistic` subtype: https://github.com/alan-turing-institute/MLJ.jl/issues/642
We now have the `JointProbabilistic` model type and the `predict_joint` generic function, which together satisfy the original use case of this issue.
@cscherrer I think this issue can be closed now?
There may be additional opportunities for design discussion and improvements, but I think those can take place in separate issues that have more narrow and specific scopes.
For `Probabilistic` models, the Quick Start Guide says the returned value of `predict` should be a `Vector{Distribution{T}}`. In Bayesian modeling, the posterior distribution of the parameters leads to a correlation on the predictions, even if the noise on top of this happens to be independent.
Is it currently possible to have correlated responses in `predict`ions? If not, I'd love to see an update to allow this. For example, this would be very easy if `predict` were allowed to return a `Distribution{Vector{T}}` instead of a `Vector{Distribution{T}}`.
For some more context, I'd like to use this for SossMLJ.jl and an MLJ interface for BayesianLinearRegression.jl.
EDIT: Oops, the types I have here for `Distribution` are wrong; the type parameter was supposed to indicate the type of the return value.
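To illustrate the difference between the two return shapes (this is a hand-rolled sketch, not the MLJ API, and `sample_joint` is an invented name): a "distribution over vectors" can thread a shared latent `theta` through all coordinates, producing correlated draws, while a vector of independent marginal distributions cannot represent that dependence at all.

```julia
using Random, Statistics

# Joint sampler: draw the latent theta once, then both coordinates
# move together (correlation ~ 1/1.01 for these variances).
function sample_joint(rng)
    theta = randn(rng)                 # shared latent, drawn once per sample
    return [theta + 0.1 * randn(rng),  # small independent noise on top
            theta + 0.1 * randn(rng)]
end

rng = MersenneTwister(42)
draws = [sample_joint(rng) for _ in 1:10_000]
c = cor([d[1] for d in draws], [d[2] for d in draws])
# c is close to 1.0; sampling each coordinate independently from its
# marginal Normal(0, sqrt(1.01)) would instead give a correlation near 0
```

Both versions have identical marginals, which is exactly why a `Vector{Distribution{T}}` return type cannot distinguish them.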