JuliaML / LossFunctions.jl

Julia package of loss functions for machine learning.
https://juliaml.github.io/LossFunctions.jl/stable

Reorganize codebase #130

Closed juliohm closed 4 years ago

juliohm commented 4 years ago

This PR moves source code around to improve the organization of concepts in the package. This exercise has already helped me identify several areas for improvement, such as the implementations of the meta losses "scaled", "ordinal", and "weightedbinary".

There is a single breaking change here: the removal of the OrdinalHingeLoss type alias. I don't think we should export this specific combination. Users can still construct OrdinalMarginLoss{HingeLoss} as they would any other loss.
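To illustrate the kind of composition the alias stood for, here is a minimal, self-contained sketch of the "all-threshold" idea behind an ordinal margin loss over a base hinge loss. This is not the package's implementation (the real OrdinalMarginLoss signature and details may differ); the function names here are hypothetical stand-ins.

```julia
# Base margin loss: hinge on the "agreement" between prediction and threshold.
hinge(agreement) = max(0.0, 1.0 - agreement)

# For a true level t out of N ordered levels, apply the base margin loss
# at every threshold k = 1, ..., N-1 separating t from the other levels:
# the prediction y should exceed thresholds below t and stay under the rest.
function ordinal_margin_loss(base, N, t, y)
    below = sum(base(y - k) for k in 1:(t - 1); init = 0.0)
    above = sum(base(k - y) for k in t:(N - 1); init = 0.0)
    return below + above
end
```

With this formulation, a prediction sitting exactly at the true level's threshold still pays the base margin penalty, while predictions well inside the correct interval pay nothing.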

juliohm commented 4 years ago

@joshday I know this PR has a lot of changes to review, but I would appreciate it if you could take a quick look at it before we proceed with additional improvements. Below I summarize the changes at a high level so that you don't spend too much time reading the code. I can confirm that the tests pass locally as before, and that this is mainly a reorganization of the source code.

Reorganization

Previously the source file supervised.jl was very large. It contained a set of traits for the abstract types SupervisedLoss, DistanceLoss, and MarginLoss; multiple docstrings for the various methods of value, value!, deriv, deriv!, deriv2, and deriv2!; and implementations for the various aggregation modes None, Sum, Mean, WeightedSum, and WeightedMean.
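For readers unfamiliar with the aggregation modes, here is an illustrative sketch of what the five modes compute. The names mirror the package's None/Sum/Mean/WeightedSum/WeightedMean modes, but the functions below are simplified stand-ins, not the package's code:

```julia
# Element-wise L2 distance loss used as an example base loss.
l2(t, y) = abs2(y - t)

agg_none(loss, ts, ys)     = map(loss, ts, ys)               # per-element losses
agg_sum(loss, ts, ys)      = sum(map(loss, ts, ys))          # total loss
agg_mean(loss, ts, ys)     = agg_sum(loss, ts, ys) / length(ts)
agg_wsum(loss, ts, ys, w)  = sum(w .* map(loss, ts, ys))     # weighted total
agg_wmean(loss, ts, ys, w) = agg_wsum(loss, ts, ys, w) / sum(w)
```

Splitting each mode into its own small file keeps these definitions easy to locate and review.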

My changes consisted of splitting this file into several smaller files of digestible size, one per aggregation mode. I've also eliminated docstrings that were repeated across different methods of the same function. The plan is to move these docstrings to the LearnBase.jl interface package.

Upcoming changes

The present PR is self-contained. After it is merged, I have a few more changes planned in LearnBase.jl according to https://github.com/JuliaML/LearnBase.jl/issues/35 before I can continue working on LossFunctions.jl.

I've started reorganizing LearnBase.jl to facilitate future contributions. The new codebase will be split into small files for different concepts. In this case, I've already created a costs.jl file and migrated all the docstrings from LossFunctions.jl there (in my local repo). I've also noticed that some traits have never been implemented here, which suggests they could be removed from the interface temporarily. My immediate next steps after this PR is merged are:

  1. Address https://github.com/JuliaML/LearnBase.jl/issues/35
  2. Refactor LossFunctions.jl again to use the new LearnBase.jl
  3. Add support for CategoricalArrays.jl in LossFunctions.jl
  4. Come back to writing this paper that depends on these PRs :)

johnnychen94 commented 4 years ago

The documentation failure can be fixed by removing docs/Manifest.toml; it pins a different UUID for LossFunctions (ref: https://github.com/JuliaML/LossFunctions.jl/commit/ad125ad0749e1f99e4a82164d0eb2e631e91aa53).

juliohm commented 4 years ago

Thanks, I will take a look at it today. Basically, I need to update Manifest.toml to point at the parent folder in the dev version (] dev ..). I think I had used the UUID from LearnBase.jl in some previous commit, but that is fixed now; thanks to JuliaRegistrator for pointing out the issue.

juliohm commented 4 years ago

This PR is finally ready after a substantial amount of refactoring in LearnBase.jl and here. There are a number of code simplifications that I would like to share with you before we proceed with the merge.

Improvements

Breaking

Other functions that could perhaps be deprecated are the mutating versions value!, deriv!, and deriv2!. I didn't deprecate them in this PR because that requires more thought. I understand that we can always achieve the same non-allocating result with broadcasting, @. buffer = value(l, t, y), for ordinary arrays, but I am not sure that line would also work with sparse arrays, for example. Because I don't have enough experience with sparse arrays in Julia, I decided not to touch these functions and to keep them as they are.
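The broadcasting alternative mentioned above can be sketched for ordinary arrays as follows. This assumes the element-wise value/deriv-style API; l2deriv below is a hypothetical stand-in for the package's deriv, not its actual code:

```julia
# Derivative of the L2 distance loss with respect to the output y.
l2deriv(t, y) = 2 * (y - t)

t = [1.0, 2.0, 3.0]
y = [1.5, 1.5, 2.0]

# Pre-allocated buffer filled in place via broadcasting -- the
# non-mutating equivalent of a deriv!(buffer, loss, t, y) style call.
buffer = similar(y)
@. buffer = l2deriv(t, y)
```

Whether this in-place broadcast behaves as desired for sparse arrays (where preserving the sparsity structure matters) is exactly the open question that motivates keeping the mutating versions for now.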

I've also updated the documentation to reflect the latest version of the code. In order to build it successfully, you need to add the master version of LearnBase.jl from inside the docs folder:

```julia-repl
] add LearnBase#master
] instantiate
include("make.jl")
```

I would appreciate it if you could take a second look at this PR. The plan is to merge it by the end of the weekend. Meanwhile, I will be working on adding support for CategoricalArrays.jl, which is the last item needed for the paper I am working on.