Not sure I completely follow. Am I correct in my assumption that in this example you think of a loss as just a function of the targets (y) disregarding the output (yhat)?
Yes - a representation of the possible target values. I agree `domain(loss)` is a poorly chosen name. Maybe `targets(loss)`?
`targetsdomain`? I do not like `targets` because we may want to use that function in a high-level interface to retrieve the targets of a trained model or something.
I like the intention, but the name needs bikeshedding. Maybe "space"? Or "output_domain"?
`output_domain` would be especially confusing because the other (in this case uninteresting) loss parameter is called `output`.
Thinking about this a little more: I think this function should return some `AbstractVector`, which would allow for things like `in`, `maximum`, etc. So a margin loss would be easy with `[-1, 1]`. For a distance loss I am not so sure. Is there a way to specify a continuous range? I am guessing that `-Inf:.00000001:Inf` is just asking for trouble.
What about something like this:
# extend a couple of Base functions for the new scale types
import Base: collect, eltype

abstract LossScale

immutable BinaryScale{T<:Number} <: LossScale end
BinaryScale() = BinaryScale{Float64}() # default type
collect{T}(::BinaryScale{T}) = [-one(T), one(T)]
eltype{T}(::BinaryScale{T}) = T
isdiscrete(::BinaryScale) = true
nlevels(::BinaryScale) = 2

immutable OrdinalScale{T<:Number} <: LossScale
    r::Range{T}
end
OrdinalScale{T<:Number}(lower::T, upper::T) = OrdinalScale{T}(lower:upper)
collect(s::OrdinalScale) = collect(s.r)
isdiscrete(::OrdinalScale) = true
nlevels(s::OrdinalScale) = length(s.r)

immutable RealScale{T<:Number} <: LossScale end
RealScale() = RealScale{Float64}()
collect(::RealScale) = error("infinite list")
isdiscrete(::RealScale) = false
nlevels{T}(::RealScale{T}) = T(Inf)
Then:
scale{T}(::LogitMarginLoss{T}) = BinaryScale{T}()
scale{T}(::PoissonLoss{T}) = OrdinalScale(T(1):T(1):T(Inf)) # ???
scale{T}(::L2DistLoss{T}) = RealScale{T}()
That seems like a good idea! Would probably also be useful to provide `in` to test membership. I don't think `scale` is an intuitive function name though, but I lack new ideas. Maybe `domain_of_targets`. Since it is not an everyday function, a little verbosity wouldn't hurt.
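A minimal sketch of what that membership test might look like, building on the hypothetical `LossScale` types above (illustrative only, untested):

# hedged sketch: Base.in methods for the hypothetical LossScale types above
Base.in{T}(x::Number, ::BinaryScale{T}) = x == one(T) || x == -one(T)
Base.in(x::Number, s::OrdinalScale)     = x in s.r
Base.in{T}(x::Number, ::RealScale{T})   = isfinite(x)

# usage: 1.0 in BinaryScale()   # true
#        0.5 in BinaryScale()   # false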
Thinking about this more... `scale` is a terrible name as it conflicts with `Base.scale`. On the other hand, I like "scale" as a reference to "measurement scale": https://en.wikipedia.org/wiki/Level_of_measurement
Maybe:
measured_scale(::Loss)
measured_levels(::Loss)
levels(::Loss)
I actually like `levels` quite a bit. "Domain of targets" would be okay too I guess, but is a bit verbose.
I could see `levels` working out. What do you think @tbreloff?
I'm thinking "set" is the general term. And this shouldn't be limited to loss outputs. Can we improve/generalize the "action sets" that we have in Reinforce.jl to be applicable here? cc @spencerlyon2
In essence I'd like a nice way to represent a mathematical set, not a data structure set. Then use that as the output of domain, or whatever we call it.
For completeness (now that I'm at my computer), this is what is currently in Reinforce:
abstract AbstractActionSet
# allow continuous value(s) in a range(s)
immutable ContinuousActionSet{T} <: AbstractActionSet
amin::T
amax::T
function ContinuousActionSet{S<:AbstractVector}(amin::S, amax::S)
if !(length(amin) == length(amax))
error("For multi-valued continuous action sets, min and max value must have same length")
end
new(amin, amax)
end
ContinuousActionSet{S<:Number}(amin::S, amax::S) = new(amin, amax)
end
ContinuousActionSet{T}(amin::T, amax::T) = ContinuousActionSet{T}(amin, amax)
Base.length(aset::ContinuousActionSet) = length(aset.amin)
Base.rand{T<:Number}(aset::ContinuousActionSet{T}) = rand() * (aset.amax - aset.amin) + aset.amin
Base.rand{T<:AbstractVector}(aset::ContinuousActionSet{T}) = rand(length(aset)) .* (aset.amax - aset.amin) + aset.amin
Base.in{T<:Number}(x::Number, aset::ContinuousActionSet{T}) = aset.amin <= x <= aset.amax
Base.in{T<:AbstractVector}(x::AbstractVector, aset::ContinuousActionSet{T}) =
length(x) == length(aset) && all(aset.amin .<= x .<= aset.amax)
# choose from discrete actions
immutable DiscreteActionSet{T} <: AbstractActionSet
actions::T
end
Base.rand(aset::DiscreteActionSet) = rand(aset.actions)
Base.in(x, aset::DiscreteActionSet) = x in aset.actions
Base.length(aset::DiscreteActionSet) = length(aset.actions)
Base.getindex(aset::DiscreteActionSet, i::Int) = aset.actions[i]
# several action sets of varying types
immutable MultiActionSet{T<:Tuple} <: AbstractActionSet
asets::T
end
MultiActionSet(asets::AbstractActionSet...) = MultiActionSet(asets)
Base.rand(::Type{Vector}, aset::MultiActionSet) = [rand(i) for i in aset.asets]
Base.rand(::Type{Tuple}, aset::MultiActionSet) = ntuple(i->rand(aset.asets[i]), length(aset.asets))
Base.rand(aset::MultiActionSet) = rand(Vector, aset)
Base.in(x, aset::MultiActionSet) = all(map(in, x, aset.asets))
The idea is that this could be generalized to "discrete sets", "section(s) of the reals", etc., and we could have tuples, ntuples, or arrays of these sets depending on the heterogeneity of the components.
These could be used as input validators and output domains for any transformations or losses, and adding this info might allow for optimized implementations down the road.
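A rough sketch of that validator idea, assuming a hypothetical `domain(loss)` accessor that returns one of the set objects above (names are illustrative, not an agreed API):

# hedged sketch: validate observed targets against the domain of a loss
# `domain` is a hypothetical accessor; DiscreteActionSet is the type quoted above
function validate_targets(loss, targets::AbstractVector)
    dom = domain(loss)
    for (i, t) in enumerate(targets)
        t in dom || error("target $t at index $i lies outside the domain of the loss")
    end
    targets
end

# e.g. if domain(loss) returned DiscreteActionSet([-1, 1]), then
# validate_targets(loss, [1, -1, 1]) would pass and validate_targets(loss, [0, 1]) would throw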
I like the sound of `TargetSet`. The only potential problem I see is that `Set` is defined in Base starting in v0.5. Should a `TargetSet` and `ActionSet` be a subtype of this? Is there a possibility for confusion or inconsistent semantics?
For what it's worth, this well-known paper refers to all of this as the `domain`, which makes sense from a probabilistic perspective. E.g. for the Poisson loss, you are more or less minimizing the negative log-likelihood, usually written `NLL(x; theta)`. The `x` is what we've been calling the `target` and `theta` is what we've been calling the `output`. So the "target set" is really the domain of the likelihood function.
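Concretely, for a Poisson likelihood with rate `theta`, the negative log-likelihood of an observation `x` is

NLL(x; theta) = theta - x * log(theta) + log(x!)

which is only defined for `x` in {0, 1, 2, ...}, so the "target set" in this sense is exactly the domain of the likelihood in its target argument: the nonnegative integers.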
Consistency with published literature is a big plus for me. On the other hand, `TargetSet` is more explicit for those without the probability background.
I like "domain" more than "set", because set is just a too general of a term. Also I am a fan with going with what the literature says as long as it doesn't contradict the literature the package is based on in the first place; which it doesn't seem to.
For being more explicit, and since it won't be a heavily used function, can we go with the longer name `targetdomain`? The name even seems general enough that we could reuse it in higher-level interfaces like `targetdomain(LogisticRegression)`.
That said, I could very well see the function `targetdomain` return a `ContinuousSet` as depicted by Tom.
+1 for `targetdomain`, which returns some sort of object. I'd like to call it `Set` but am worried that `Base.Set` in v0.5 is only meant to hold discrete things? If that's the case we may need a synonym.
targetdomain works for me. And I'll work on getting the set objects into LearnBase today.
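A rough sketch of how those set objects and `targetdomain` might fit together (type and method names here are hypothetical, just to illustrate the shape of the API):

# hedged sketch: hypothetical set objects plus the targetdomain accessor discussed above
abstract AbstractDomainSet

immutable DiscreteSet{T<:AbstractVector} <: AbstractDomainSet
    items::T
end
Base.in(x, s::DiscreteSet) = x in s.items

immutable IntervalSet{T<:Number} <: AbstractDomainSet
    lo::T
    hi::T
end
Base.in(x::Number, s::IntervalSet) = s.lo <= x <= s.hi

# the accessor the thread converged on, for a couple of the losses mentioned here
targetdomain(::LogitMarginLoss) = DiscreteSet([-1.0, 1.0])
targetdomain(::L2DistLoss)      = IntervalSet(-Inf, Inf)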
Related to https://github.com/JuliaML/LossFunctions.jl/issues/86, LowRankModels uses the domain as a parameter for imputation, since the predicted value is different depending on whether the labels are boolean/categorical/ordinal/real. For example, in an ordinary regression problem the prediction is simply the value that minimizes the L2DistLoss, but in ordinal regression we only want values within the set of ordinal levels. These are implemented in LowRankModels, but this presents a neat opportunity to discuss and perfect the API for them.
Is this something that could be built as a wrapper type like what we have for a scaled loss?
I'm not sure that the domain is something that needs to be attached to the loss function itself. Here's how I understand the reason for the domain, anyway.
The loss function is needed to minimize the error between some real-valued prediction and a label. However, the real-valued prediction should theoretically be able to be mapped to any domain, and neither the label nor the original loss function need be known at predict time.
I suppose these have nothing to do with LossFunctions per se, but implementing them somewhere (maybe "Domains.jl") would be very useful for both low rank models and JuliaML as a whole.
predict(d::RealDomain{S}, v::T) where S <: AbstractFloat where T <: AbstractFloat = S(v)
predict(d::BoolDomain, v::T) where T <: AbstractFloat = v > 0
predict(d::OrdinalDomain, v::T) where T <: AbstractFloat = ...
...
Then, convenience methods for common types could be implemented, like
predict(v::T) where T <: AbstractFloat = predict(RealDomain{T}(), v)
predict(Bool, v::T) where T <: AbstractFloat = predict(BoolDomain(), v)
...
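A hypothetical usage sketch of the methods above (assuming the domain types are constructible as written):

# hedged usage sketch of the hypothetical predict methods above
predict(RealDomain{Float64}(), 0.3)  # -> 0.3
predict(BoolDomain(), 0.3)           # -> true
predict(0.3)                         # -> 0.3, via the Float64 convenience method
predict(Bool, -1.2)                  # -> false, via the Bool convenience method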
@madeleineudell @c42f
Would anyone else like to weigh in?
@mihirparadkar, I like your proposed API, but I don't think it's sufficient. The problem is that for each loss function, the most canonical way to predict, given a modeled value `v`, is to minimize the loss over all possible observations while fixing the modeled value `v`. The answer depends not only on the modeled value `v`, but on the loss function.
There are some simplifications. E.g. any difference loss is minimized when the modeled value `v` matches the value of the observation. But for categorical loss functions or ordinal loss functions, I think different valid loss functions might give different answers, even fixing `v`. So I don't think we can suppress the loss function in the argument to a `predict` method.
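A minimal sketch of that point (hypothetical signature; `loss(y, v)` here just means the loss of predicting `v` when the observation is `y`, using brute force over a discrete domain):

# hedged sketch: prediction as loss minimization over the target domain,
# showing that the loss itself has to be an argument
function predict(loss, domain::AbstractVector, v::Real)
    best, bestval = first(domain), Inf
    for y in domain
        val = loss(y, v)     # different valid losses can pick different minimizers
        if val < bestval
            best, bestval = y, val
        end
    end
    best
end

# e.g. predict(loss, [1, 2, 3, 4, 5], 2.7) can return different levels
# for two different (but both valid) ordinal losses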
@madeleineudell
I understand that the definition of predicting, given a modeled value `v`, is to minimize the loss over every possible observation. For difference losses and margin losses (and, I believe, 'ordinalized' margin losses), the minimizer is relatively straightforward (for real domains with a difference loss it's just the value `v`, and for boolean domains with margin losses or difference losses it's `sign(v)`).
Correct me if I'm wrong, but for multivariate categorical losses, isn't the loss minimized at the index with the maximal value for both One-vs-All and softmax (the two multivariate categorical losses from LowRankModels)? I'm much less sure about the multivariate ordinal losses (BvS and MNLOrdinal). In the multivariate cases, could we implement a different API that requires the loss function to be present?
And in the scalar case (Real, Periodic, or Boolean), I am aware of cases where the simple rule I proposed is not the true minimizer over the possible observations, but I can't think of any that aren't pathological in some way. The only ones I can think of are when using a margin loss or 'ordinalized' margin loss to predict over real values, where the suggested predict would produce `v` while the correct answer is technically `Inf` or `-Inf`. I'm ambivalent about giving up the simplicity of the API for cases like these.
I know that in LowRankModels you throw an error for these kinds of prediction/loss mismatches, but I don't think this sort of mistake handling needs to be done at the loss-function/prediction level. Since we intend models to build upon these, it could be left to the model level (i.e. the GLRM or linear model) to signal a mismatch between the intended prediction domain and the loss function.
Even though we have many good insights in this issue, the discussion and code snippets are outdated, particularly regarding the levels of categorical variables. Nowadays, we have CategoricalArrays.jl as the de-facto standard for categorical variables.
Also, it is not very clear how the domain/space/targetdomain could be useful. In general, we would like to compare the distribution of the losses with the observations as opposed to just the support of the distribution. I am trying to revive the implementations here for Julia v1.x, and plan to add support for a more general interface in a proposal PR.
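As a small illustration of that point, CategoricalArrays.jl already exposes the possible values of the data:

using CategoricalArrays

y = categorical([-1, 1, 1, -1])
levels(y)   # -> [-1, 1]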
I think it would be nice to add something along the lines of the following:
"domain" might be the wrong name or at least misleading because the losses are defined over all real numbers... But I want to access the possible values of the data.