@cscherrer I think you actually wrote up some of this code already, but I'm not sure where it is.
The nice thing about Iris is that we know that the model should be pretty good. So it can serve as a sanity check to make sure everything is working.
How about this:
m = @model X, num_levels begin
    k = size(X, 2)                       # number of features
    β ~ Normal() |> iid(k, num_levels)   # one coefficient per feature per class
    p = softmax.(eachrow(X * β))         # per-observation class probabilities
    y ~ For(p) do pj
        Categorical(pj)
    end
end
Then e.g.,
julia> X = randn(10,4);
julia> rand(m(X=X,num_levels=3)) |> pairs
pairs(::NamedTuple) with 4 entries:
:k => 4
:p => [[0.352116, 0.40508, 0.242804], [0.133515, 0.18489, 0.681595], [0.614616, 0.115557, 0.269826…
:β => [0.0344964 0.280074 0.197168; 0.0608256 -0.324698 -0.338867; -1.23773 0.168543 -0.159374; 0.…
:y => [2, 3, 1, 2, 1, 1, 3, 3, 3, 2]
Is this right? If you have three levels, you only need two coefficients per covariate, right?
The coefficients for the base class are pinned, so its exponentiated linear predictor is always 1.
This can be a little tricky...
If you're doing MLE, the "base class" approach is clearly the way to go. The danger outside of that case is in losing symmetry. Like, if you have three classes, and consider the first as the base class but regularize the coefficients for the other two, the estimate will be biased toward the base class.
The way I'm doing it here, there are three parameters for each of the features. That's one more than we need, but then we can use softmax to project onto the 2-simplex.
We could reduce the parameter space if we had a parameterization of the simplex that keeps the symmetry we need. There's probably a nice way to do that; maybe we should check the Stan docs.
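To see why one set of coefficients is redundant: softmax is invariant to adding the same constant to all of its inputs, so shifting every class's coefficients together leaves the probabilities unchanged. A quick check (softmax here is from StatsFuns, but any implementation behaves the same):
julia> using StatsFuns: softmax
julia> η = randn(3);   # linear predictors for three classes
julia> softmax(η) ≈ softmax(η .+ 5.0)   # shifting by any constant changes nothing
true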
> The danger outside of that case is in losing symmetry. Like, if you have three classes, and consider the first as the base class but regularize the coefficients for the other two, the estimate will be biased toward the base class.
Ooooh, good point.
Yeah let's see what other PPLs do here.
https://mc-stan.org/docs/2_19/stan-users-guide/parameterizing-centered-vectors.html
EDIT: Oh, that's a slightly different issue. In the neighborhood, though.
Ok, here: https://mc-stan.org/docs/2_24/stan-users-guide/multi-logit-section.html
Hmm, I don't quite see their solution.
They say the model is only identifiable if you have a "suitable prior"... but they don't tell us what that is?
Then they go on to outline the solution with (K-1) coefficients per covariate, but IIUC that's just the same thing you and I were discussing above, which as you point out is not necessarily what we want.
I like the idea of using the full K coefficients per covariate, because then the model is symmetric and we don't bias things in favor of an arbitrarily chosen base class. But I'm still confused on how we implement that...
I think it's just what we have. Gaussian is informative enough to give us identifiability.
> I think it's just what we have. Gaussian is informative enough to give us identifiability.
Huh, I didn't know that.
Getting pickier, we could be using log-lengths, which might be interpretable in terms of allometric growth. Also, I think rstanarm does something to account for different scales. Rather than standardizing, you can have different priors across the features. But maybe this fine-tuning is best left until the bulk of this is in place.
Sounds good to me!
The next main goal is to figure out how to make predict return a Vector{UnivariateFinite}.
After that I think we're pretty much ready to add the example to the docs.
Example is almost working here: https://github.com/cscherrer/SossMLJ.jl/blob/cs-dev/examples/example-classification.jl
But I'm getting:
julia> fit!(mach)
[ Info: Training Machine{SossMLJModel{Model{,…},…}} @957.
┌ Error: Problem fitting the machine Machine{SossMLJModel{Model{,…},…}} @957, possibly because an upstream node in a learning network is providing data of incompatible scitype.
└ @ MLJBase ~/.julia/packages/MLJBase/uYW4k/src/machines.jl:425
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: NamedTuple names and field types must have matching lengths
Stacktrace:
[1] fit_only!(::Machine{SossMLJModel{Model{NamedTuple{(:X, :num_levels),T} where T<:Tuple,TypeEncoding(begin
k = size(X, 2)
β ~ Normal() |> iid(k, num_levels)
p = smax.(eachrow(X * β))
species ~ For(p) do pj
Categorical(pj; check_args = false)
end
end),TypeEncoding(Main)},var"#7#8",typeof(dynamicHMC),NamedTuple{(:num_levels,),Tuple{Int64}},Symbol}}; rows::Nothing, verbosity::Int64, force::Bool) at /home/chad/.julia/packages/MLJBase/uYW4k/src/machines.jl:436
[2] fit_only! at /home/chad/.julia/packages/MLJBase/uYW4k/src/machines.jl:389 [inlined]
[3] #fit!#87 at /home/chad/.julia/packages/MLJBase/uYW4k/src/machines.jl:481 [inlined]
[4] fit!(::Machine{SossMLJModel{Model{NamedTuple{(:X, :num_levels),T} where T<:Tuple,TypeEncoding(begin
k = size(X, 2)
β ~ Normal() |> iid(k, num_levels)
p = smax.(eachrow(X * β))
species ~ For(p) do pj
Categorical(pj; check_args = false)
end
end),TypeEncoding(Main)},var"#7#8",typeof(dynamicHMC),NamedTuple{(:num_levels,),Tuple{Int64}},Symbol}}) at /home/chad/.julia/packages/MLJBase/uYW4k/src/machines.jl:479
[5] top-level scope at REPL[29]:1
[6] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
Also I'm not sure about the Vector{UnivariateFinite} thing. I don't see UnivariateFinite in Distributions.jl. Where does it live?
It's specific to MLJ.
I believe the distribution itself is defined in MLJBase and/or MLJModelInterface.
But the documentation is in MLJ: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Prediction-types-for-probabilistic-responses-1
> Use the distribution MMI.UnivariateFinite for Probabilistic models predicting a target with Finite scitype (classifiers). In this case the eltype of the training target y will be a CategoricalValue.
> For efficiency, one should not construct UnivariateDistribution instances one at a time. Rather, once a probability vector or matrix is known, construct an instance of UnivariateFiniteVector <: AbstractArray{<:UnivariateFinite,1} to return. Both UnivariateFinite and UnivariateFiniteVector objects are constructed using the single UnivariateFinite function.
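If I'm reading those docs right, construction should look something like this (just a sketch; pool=missing asks MLJBase to create a fresh categorical pool for the given labels):
julia> using MLJBase
julia> probs = [0.2 0.5 0.3; 0.1 0.1 0.8];   # one row of class probabilities per observation
julia> yhat = UnivariateFinite(["setosa", "versicolor", "virginica"], probs; pool=missing);
With a matrix of probabilities, yhat should come back as a UnivariateFiniteVector, one distribution per row.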
If we run into trouble, we can probably ping Anthony and Thibaut to help us with the specifics.
I had a silly mistake that's obvious now, just needed to sleep on it :) The NamedTuple constructor expects a tuple of field values, so a 3-element vector was being read as three separate fields:
julia> using NamedTupleTools
julia> namedtuple(:y)(randn(3))
ERROR: NamedTuple names and field types must have matching lengths
Stacktrace:
[1] NamedTuple{(:y,),T} where T<:Tuple(::Tuple{Float64,Float64,Float64}) at ./boot.jl:547
[2] NamedTuple{(:y,),T} where T<:Tuple(::Array{Float64,1}) at ./namedtuple.jl:104
[3] top-level scope at REPL[33]:1
[4] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
julia> namedtuple(:y)(tuple(randn(3)))
(y = [0.6148708039999163, 0.7939461461720311, -0.013690630507490925],)
Multi-class classifiers are a pretty popular use case in MLJ, so I think an important next step will be getting an example. I think the Iris example is pretty common.
The Iris dataset is available in RDatasets.jl.
MLJ has an Iris example here:
I wrote a tutorial for Turing here: https://turing.ml/dev/tutorials/8-multinomiallogisticregression/
We'll need to do some preprocessing. In particular, we should transform each input feature column by subtracting its mean and dividing by its standard deviation.
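For a plain feature matrix, that's just (a minimal sketch; MLJ's Standardizer transformer should also do this for tabular data):
julia> using Statistics
julia> standardize(X) = (X .- mean(X; dims=1)) ./ std(X; dims=1);
julia> Xstd = standardize(randn(10, 4));   # each column now has mean ≈ 0 and std ≈ 1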
The tricky part is going to be getting MLJModelInterface.predict to return a Vector{UnivariateFinite}. We can leave predict_joint and predict_particles unmodified.
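Roughly, I'd expect the method to end up shaped like this (only a sketch: class_probs is a hypothetical helper that collapses posterior predictive samples into an n × num_levels probability matrix, and the fitresult fields are guesses):
# not the actual implementation, just the shape of it
function MLJModelInterface.predict(sm::SossMLJModel, fitresult, Xnew)
    probs = class_probs(fitresult, Xnew)   # hypothetical: n × num_levels matrix
    levels = fitresult.levels              # assumed field holding the target's classes
    return MLJModelInterface.UnivariateFinite(levels, probs; pool=missing)
end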