Not sure I completely follow. Am I correct in my assumption that in this example you think of a loss as just a function of the targets (y) disregarding the output (yhat)?
Yes - a representation of the possible target values. I agree `domain(loss)` is a poorly chosen name. Maybe `targets(loss)`?
`targetsdomain`? I do not like `targets` because we may want to use that function in a high-level interface to retrieve the targets of a trained model or something.
I like the intention, but the name needs bikeshedding. Maybe "space"? Or "output_domain"?
`output_domain` would be especially confusing because the other (in this case uninteresting) loss parameter is called `output`.
Thinking about this a little more: I think this function should return some `AbstractVector`, which would allow for things like `in`, `maximum`, etc. So a margin loss would be easy with `[-1, 1]`. For a distance loss I am not so sure. Is there a way to specify a continuous range? I am guessing that `-Inf:.00000001:Inf` is just asking for trouble.
What about something like this:
# extend a couple of Base functions for the new scale types
import Base: collect, eltype

abstract LossScale

immutable BinaryScale{T<:Number} <: LossScale end
BinaryScale() = BinaryScale{Float64}() # default type
collect{T}(::BinaryScale{T}) = [-one(T), one(T)]
eltype{T}(::BinaryScale{T}) = T
isdiscrete(::BinaryScale) = true
nlevels(::BinaryScale) = 2

immutable OrdinalScale{T<:Number} <: LossScale
    r::Range{T}
end
OrdinalScale{T<:Number}(lower::T, upper::T) = OrdinalScale{T}(lower:upper)
collect(s::OrdinalScale) = collect(s.r)
isdiscrete(::OrdinalScale) = true
nlevels(s::OrdinalScale) = length(s.r)

immutable RealScale{T<:Number} <: LossScale end
RealScale() = RealScale{Float64}()
collect(::RealScale) = error("infinite list")
isdiscrete(::RealScale) = false
nlevels{T}(::RealScale{T}) = T(Inf)
Then:
scale{T}(::LogitMarginLoss{T}) = BinaryScale{T}()
scale{T}(::PoissonLoss{T}) = OrdinalScale(T(1):T(1):T(Inf)) # ???
scale{T}(::L2DistLoss{T}) = RealScale{T}()
That seems like a good idea! Would probably also be useful to provide `in` to test membership. I don't think `scale` is an intuitive function name though, but I lack new ideas. Maybe `domain_of_targets`. Since it is not an everyday function, a little verbosity wouldn't hurt.
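A minimal sketch of what that membership test might look like, building on the hypothetical `LossScale` types above (illustrative only, untested):

# hedged sketch: Base.in methods for the hypothetical LossScale types above
Base.in{T}(x::Number, ::BinaryScale{T}) = x == one(T) || x == -one(T)
Base.in(x::Number, s::OrdinalScale)     = x in s.r
Base.in{T}(x::Number, ::RealScale{T})   = isfinite(x)

# usage: 1.0 in BinaryScale()   # true
#        0.5 in BinaryScale()   # false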
Thinking about this more... `scale` is a terrible name as it conflicts with `Base.scale`. On the other hand, I like "scale" as a reference to "measurement scale": https://en.wikipedia.org/wiki/Level_of_measurement
Maybe:
measured_scale(::Loss)
measured_levels(::Loss)
levels(::Loss)
I actually like `levels` quite a bit. "Domain of targets" would be okay too I guess, but is a bit verbose.
I could see `levels` working out. What do you think @tbreloff?
I'm thinking "set" is the general term. And this shouldn't be limited to loss outputs. Can we improve/generalize the "action sets" that we have in Reinforce.jl to be applicable here? cc @spencerlyon2
In essence I'd like a nice way to represent a mathematical set, not a data structure set. Then use that as the output of domain, or whatever we call it.
For completeness (now that I'm at my computer), this is what is currently in Reinforce:
abstract AbstractActionSet
# allow continuous value(s) in a range(s)
immutable ContinuousActionSet{T} <: AbstractActionSet
amin::T
amax::T
function ContinuousActionSet{S<:AbstractVector}(amin::S, amax::S)
if !(length(amin) == length(amax))
error("For multi-valued continuous action sets, min and max value must have same length")
end
new(amin, amax)
end
ContinuousActionSet{S<:Number}(amin::S, amax::S) = new(amin, amax)
end
ContinuousActionSet{T}(amin::T, amax::T) = ContinuousActionSet{T}(amin, amax)
Base.length(aset::ContinuousActionSet) = length(aset.amin)
Base.rand{T<:Number}(aset::ContinuousActionSet{T}) = rand() * (aset.amax - aset.amin) + aset.amin
Base.rand{T<:AbstractVector}(aset::ContinuousActionSet{T}) = rand(length(aset)) .* (aset.amax - aset.amin) + aset.amin
Base.in{T<:Number}(x::Number, aset::ContinuousActionSet{T}) = aset.amin <= x <= aset.amax
Base.in{T<:AbstractVector}(x::AbstractVector, aset::ContinuousActionSet{T}) =
length(x) == length(aset) && all(aset.amin .<= x .<= aset.amax)
# choose from discrete actions
immutable DiscreteActionSet{T} <: AbstractActionSet
actions::T
end
Base.rand(aset::DiscreteActionSet) = rand(aset.actions)
Base.in(x, aset::DiscreteActionSet) = x in aset.actions
Base.length(aset::DiscreteActionSet) = length(aset.actions)
Base.getindex(aset::DiscreteActionSet, i::Int) = aset.actions[i]
# several action sets of varying types
immutable MultiActionSet{T<:Tuple} <: AbstractActionSet
asets::T
end
MultiActionSet(asets::AbstractActionSet...) = MultiActionSet(asets)
Base.rand(::Type{Vector}, aset::MultiActionSet) = [rand(i) for i in aset.asets]
Base.rand(::Type{Tuple}, aset::MultiActionSet) = ntuple(i->rand(aset.asets[i]), length(aset.asets))
Base.rand(aset::MultiActionSet) = rand(Vector, aset)
Base.in(x, aset::MultiActionSet) = all(map(in, x, aset.asets))
The idea is that this could be generalized to "discrete sets", "section(s) of the reals", etc., and we could have tuples, ntuples, or arrays of these sets depending on the heterogeneity of the components.
These could be used as input validators and output domains for any transformations or losses, and adding this info might allow for optimized implementations down the road.
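A rough sketch of that validator idea, assuming a hypothetical `domain(loss)` accessor that returns one of the set objects above (names are illustrative, not an agreed API):

# hedged sketch: validate observed targets against the domain of a loss
# `domain` is a hypothetical accessor; DiscreteActionSet is the type quoted above
function validate_targets(loss, targets::AbstractVector)
    dom = domain(loss)
    for (i, t) in enumerate(targets)
        t in dom || error("target $t at index $i lies outside the domain of the loss")
    end
    targets
end

# e.g. if domain(loss) returned DiscreteActionSet([-1, 1]), then
# validate_targets(loss, [1, -1, 1]) would pass and validate_targets(loss, [0, 1]) would throw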
I like the sound of `TargetSet`. The only potential problem I see is that `Set` is defined in Base starting in v0.5. Should a `TargetSet` and `ActionSet` be a subtype of this? Is there a possibility for confusion or inconsistent semantics?
For what it's worth, this well-known paper refers to all of this as the `domain`, which makes sense from a probabilistic perspective. E.g. for the Poisson loss, you are more or less minimizing the negative log-likelihood, usually written `NLL(x; theta)`. The `x` is what we've been calling the `target` and `theta` is what we've been calling the `output`. So the "target set" is really the domain of the likelihood function.
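Concretely, for a Poisson likelihood with rate `theta`, the negative log-likelihood of an observation `x` is

NLL(x; theta) = theta - x * log(theta) + log(x!)

which is only defined for `x` in {0, 1, 2, ...}, so the "target set" in this sense is exactly the domain of the likelihood in its target argument: the nonnegative integers.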
Consistency with published literature is a big plus for me. On the other hand, `TargetSet` is more explicit for those without the probability background.
I like "domain" more than "set", because set is just a too general of a term. Also I am a fan with going with what the literature says as long as it doesn't contradict the literature the package is based on in the first place; which it doesn't seem to.
For being more explicit, and since it won't be a heavily used function, can we go with the longer name `targetdomain`? The name even seems general enough that we could reuse it in higher-level interfaces like `targetdomain(LogisticRegression)`.
That said, I could very well see the function `targetdomain` return a `ContinuousSet` as depicted by Tom.
+1 for `targetdomain`, which returns some sort of object. I'd like to call it `Set` but am worried that `Base.Set` in v0.5 is only meant to hold discrete things? If that's the case we may need a synonym.
targetdomain works for me. And I'll work on getting the set objects into LearnBase today.
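A rough sketch of how those set objects and `targetdomain` might fit together (type and method names here are hypothetical, just to illustrate the shape of the API):

# hedged sketch: hypothetical set objects plus the targetdomain accessor discussed above
abstract AbstractDomainSet

immutable DiscreteSet{T<:AbstractVector} <: AbstractDomainSet
    items::T
end
Base.in(x, s::DiscreteSet) = x in s.items

immutable IntervalSet{T<:Number} <: AbstractDomainSet
    lo::T
    hi::T
end
Base.in(x::Number, s::IntervalSet) = s.lo <= x <= s.hi

# the accessor the thread converged on, for a couple of the losses mentioned here
targetdomain(::LogitMarginLoss) = DiscreteSet([-1.0, 1.0])
targetdomain(::L2DistLoss)      = IntervalSet(-Inf, Inf)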
Related to https://github.com/JuliaML/LossFunctions.jl/issues/86, LowRankModels uses the domain as a parameter for imputation, since the predicted value is different depending on whether the labels are boolean/categorical/ordinal/real. For example, in an ordinary regression problem the prediction is simply the value that minimizes the L2DistLoss, but in ordinal regression we only want values within the set of ordinal levels. These are implemented in LowRankModels, but this presents a neat opportunity to discuss and perfect the API for them.
Is this something that could be built as a wrapper type like what we have for a scaled loss?
I'm not sure that the domain is something that needs to be attached to the loss function itself. Here's how I understand the reason for the domain, anyway.
The loss function is needed to minimize the error between some real-valued prediction and a label. However, the real-valued prediction should theoretically be able to be mapped to any domain, and neither the label nor the original loss function need be known at predict time.
I suppose these have nothing to do with LossFunctions per se, but implementing them somewhere (maybe "Domains.jl") would be very useful for both low rank models and JuliaML as a whole.
predict(d::RealDomain{S}, v::T) where S <: AbstractFloat where T <: AbstractFloat = S(v)
predict(d::BoolDomain, v::T) where T <: AbstractFloat = v > 0
predict(d::OrdinalDomain, v::T) where T <: AbstractFloat = ...
...
Then, convenience methods for common types could be implemented, like
predict(v::T) where T <: AbstractFloat = predict(RealDomain{T}(), v)
predict(Bool, v::T) where T <: AbstractFloat = predict(BoolDomain(), v)
...
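A hypothetical usage sketch of the methods above (assuming the domain types are constructible as written):

# hedged usage sketch of the hypothetical predict methods above
predict(RealDomain{Float64}(), 0.3)  # -> 0.3
predict(BoolDomain(), 0.3)           # -> true
predict(0.3)                         # -> 0.3, via the Float64 convenience method
predict(Bool, -1.2)                  # -> false, via the Bool convenience method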
@madeleineudell @c42f
Would anyone else like to weigh in?
@mihirparadkar, I like your proposed API, but I don't think it's sufficient. The problem is that for each loss function, the most canonical way to predict, given a modeled value `v`, is to minimize the loss over all possible observations while fixing the modeled value `v`. The answer depends not only on the modeled value `v`, but on the loss function.
There are some simplifications. E.g. any difference loss is minimized when the modeled value `v` matches the value of the observation. But for categorical loss functions or ordinal loss functions, I think different valid loss functions might give different answers, even fixing `v`. So I don't think we can suppress the loss function in the argument to a `predict` method.
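A minimal sketch of that point (hypothetical signature; `loss(y, v)` here just means the loss of predicting `v` when the observation is `y`, using brute force over a discrete domain):

# hedged sketch: prediction as loss minimization over the target domain,
# showing that the loss itself has to be an argument
function predict(loss, domain::AbstractVector, v::Real)
    best, bestval = first(domain), Inf
    for y in domain
        val = loss(y, v)     # different valid losses can pick different minimizers
        if val < bestval
            best, bestval = y, val
        end
    end
    best
end

# e.g. predict(loss, [1, 2, 3, 4, 5], 2.7) can return different levels
# for two different (but both valid) ordinal losses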
@madeleineudell
I understand that the definition of predicting, given a modeled value `v`, is to minimize the loss over every possible observation. For difference losses and margin losses (and, I believe, 'ordinalized' margin losses), the minimizer is relatively straightforward (for real domains with a difference loss it's just the value `v`, and for boolean domains with margin losses or difference losses it's `sign(v)`).
Correct me if I'm wrong, but for multivariate categorical losses, isn't the loss minimized at the index with the maximal value for both One-vs-All and softmax (the two multivariate categorical losses from LowRankModels)? I'm much less sure about the multivariate ordinal losses (BvS and MNLOrdinal). In the multivariate cases, could we implement a different API that requires the loss function to be present?
And in the scalar case (Real, Periodic, or Boolean), I am aware of cases where the simple rule I proposed is not the true minimizer over the possible observations, but I can't think of any that aren't pathological in some way. The only ones I can think of are when using a margin loss or 'ordinalized' margin loss to predict over real values, where the suggested predict would produce `v` while the correct answer is technically `Inf` or `-Inf`. I'm ambivalent about giving up the simplicity of the API for cases like these.
I know that in LowRankModels you throw an error for these kinds of prediction/loss mismatches, but I don't think this sort of mistake handling needs to be done at the loss-function/prediction level. Since we intend models to build upon these, it could be left to the model level (i.e. the GLRM or linear model) to signal a mismatch between the intended prediction domain and the loss function.
Even though we have many good insights in this issue, the discussion and code snippets are outdated, particularly regarding the levels of categorical variables. Nowadays, we have CategoricalArrays.jl as the de-facto standard for categorical variables.
Also, it is not very clear how the domain/space/targetdomain could be useful. In general, we would like to compare the distribution of the losses with the observations as opposed to just the support of the distribution. I am trying to revive the implementations here for Julia v1.x, and plan to add support for a more general interface in a proposal PR.
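As a small illustration of that point, CategoricalArrays.jl already exposes the possible values of the data:

using CategoricalArrays

y = categorical([-1, 1, 1, -1])
levels(y)   # -> [-1, 1]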
I think it would be nice to add something along the lines of the following:
"domain" might be the wrong name or at least misleading because the losses are defined over all real numbers... But I want to access the possible values of the data.