JuliaML / LossFunctions.jl

Julia package of loss functions for machine learning.
https://juliaml.github.io/LossFunctions.jl/stable

WIP: Outline new average mode #71

Closed Evizero closed 7 years ago

Evizero commented 7 years ago

I am combining the initial broadcast @ahwillia implemented with the concept of an AverageMode, which I originally planned for MLMetrics. Introducing it here instead will allow for some synergy and a more consistent interface, since MLMetrics will depend on LossFunctions. Basically I am planning on deprecating meanvalue and sumvalue in favour of specifying an additional parameter.

julia> value(L2DistLoss(), [4,6,9], [1,2,3]) # this just calls "value.(...)"
3-element Array{Int64,1}:
  9
 16
 36

julia> value(L2DistLoss(), [4,6,9], [1,2,3], AvgMode.Sum())
61

julia> value(L2DistLoss(), [4,6,9], [1,2,3], AvgMode.Micro())
20.333333333333332

For these losses AvgMode.Micro and AvgMode.Macro yield the same result. The distinction between a micro- and a macro average will be more interesting in MLMetrics.
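To make the distinction concrete, here is a plain-Julia sketch (not the package API) of the two averaging schemes: a micro average pools all element-wise losses before taking a single mean, while a macro average first averages within each group and then averages the group means. They only coincide when all groups have the same size.

```julia
# Plain-Julia sketch of micro vs. macro averaging (not the package API).
# Two groups of per-element losses of unequal size:
groups = [[9.0, 16.0, 36.0], [4.0, 4.0]]

# Micro average: pool every element, then take a single mean.
micro = sum(sum, groups) / sum(length, groups)   # 69 / 5 = 13.8

# Macro average: mean within each group, then mean of the group means.
macroavg = sum(g -> sum(g) / length(g), groups) / length(groups)
# (61/3 + 8/2) / 2 ≈ 12.167
```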

The real power comes with matrices or higher-dimensional arrays, because this change will allow for multivariate regression problems. Basically I let the user choose whether to sum/average over all observations at once or separately over each individual observation. The concept of ObsDim allows choosing which dimension of the array denotes the observations and is also used in MLDataUtils and MLLabelUtils.

julia> value(L2DistLoss(), rand(2,5), rand(2,5))
2×5 Array{Float64,2}:
 0.578745     0.144435    0.0634566    0.0871816  0.293819
 0.000124382  1.05113e-5  0.000301812  0.0732131  0.052562

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Sum())
1.8460235498014388

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Sum(), ObsDim.First()) # each row is an obs
2-element Array{Float64,1}:
 0.608044
 1.09483 

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Sum(), ObsDim.Last()) # each col is an obs
5-element Array{Float64,1}:
 0.41182 
 0.669381
 0.123093
 0.016917
 0.10979 

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Micro()) # total unweighted average
0.2747918979060879

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Micro(), ObsDim.First()) # average per row
2-element Array{Float64,1}:
 0.148631
 0.146248

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Micro(), ObsDim.Last()) # average per column
5-element Array{Float64,1}:
 0.112903  
 0.0305063 
 0.00763535
 0.0959092 
 0.276229  

julia> value(L2DistLoss(), rand(2,5), rand(2,5), AvgMode.Weighted([1,2]), ObsDim.First()) # weighted average over observations
5.006367100102387e12 # this value is fishy, may have a bug here 

I have not implemented any tests yet, but I thought this would be a good time to allow for opinions and criticism.

joshday commented 7 years ago

This is some really interesting stuff! I have a couple questions.

Evizero commented 7 years ago

> Do we need new types for ObsDim as opposed to using an Int?

Mhm, here in losses I don't really make use of it for dispatch (I do, however, in MLDataUtils and MLLabelUtils). Well, for one, one would have to specify the number of dimensions explicitly if it is the last dimension that denotes the observations, but I guess that would be manageable. Though I get the appeal of using numbers, I do like the consistency of using the types throughout JuliaML.

The main thing I would have to think about is how to do this nicely without the type, namely the $O in

k = prod(@ntuple($N, n -> n == $O ? 1 : size(output,n)))

but I guess I could. In the other packages I allow the obsdim to be specified using a keyword argument, where one can specify it in a number of ways, such as obsdim = 1, obsdim = :last, etc. But since loss functions are an inner-loop kind of deal, I focus much more on performance here and would prefer compile-time decisions.
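For illustration, a minimal sketch of the compile-time argument, using hypothetical singleton types that mirror the ObsDim idea (these are not the actual MLDataUtils definitions):

```julia
# Hypothetical singleton types mirroring the ObsDim idea
# (not the actual MLDataUtils definitions).
struct ObsDimFirst end
struct ObsDimLast end

# Each method is specialized at compile time; no runtime branch on an Int.
obscount(A::AbstractMatrix, ::ObsDimFirst) = size(A, 1)
obscount(A::AbstractMatrix, ::ObsDimLast)  = size(A, ndims(A))

A = rand(2, 5)
obscount(A, ObsDimFirst())  # 2 (each row is an observation)
obscount(A, ObsDimLast())   # 5 (each column is an observation)
```

With the singleton type, dispatch picks the right method at compile time and `ndims` is known from the array type, whereas with a plain Int the "last dimension" would have to be resolved with a runtime check.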

> What is Micro and Macro?

Not really relevant for losses. I mainly need it for MLMetrics where there is a decision to be made for computing the micro or macro averages (example: http://www.cnts.ua.ac.be/~vincent/pdf/microaverage.pdf).

I am unsure how to handle this. I think it would be nice to use the same kind of specification in both packages, but it also is kinda weird to specify "Micro" or "Macro" when "Mean" would be more descriptive. I am open to ideas.

> How about having both WeightedSum and WeightedAverage AvgModes?

That I can answer with a clear yes. I'll add one.

joshday commented 7 years ago

here in losses I don't really make use of it for dispatch (I do however in MLDataUtils and MLLabelUtils)

That's enough of a reason for me. I'd rather have the consistency too. Also, thanks for the micro/macro link.

Evizero commented 7 years ago

Concerning weighted sum, what behaviour would you expect from using a weight vector [1,2,3] for three observations? Would you expect their values to be multiplied by [1,2,3] or by a normed version [0.166, 0.333, 0.5]?

Edit: I'll add a parameter called normalize to WeightedMean and WeightedSum so a user can choose.
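To illustrate the two behaviours such a flag would select between, here is a hypothetical helper (not the package implementation):

```julia
# Hypothetical helper showing the effect of a normalize flag
# (not the package implementation).
function weightedsum(values, weights; normalize::Bool = false)
    w = normalize ? weights ./ sum(weights) : weights
    return sum(values .* w)
end

vals = [9.0, 16.0, 36.0]
w = [1.0, 2.0, 3.0]

weightedsum(vals, w)                    # 1*9 + 2*16 + 3*36 = 149.0
weightedsum(vals, w; normalize = true)  # weights become [1/6, 1/3, 1/2] ≈ 24.833
```

With normalize = true the weights sum to one, so the result stays on the scale of a single observation's loss regardless of how large the raw weights are.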

Evizero commented 7 years ago

Arg, dammit. I need to reduce the tests. They now take so long that Travis thinks the build crashed.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+33.4%) to 94.64% when pulling f4444b065cf9a23c4ab84d7c1f070aba41a16cac on avgmode into 5182cca34a74cfe237f6e5ad79b6773db6c7e3a0 on master.