Closed: tbreloff closed this issue 8 years ago.
Let me start by saying that I have yet to take a close look into OnlineAI and OnlineStats (but I promise I will).
This looks like a good start. Thanks for taking the time and kicking this off.
I think we should settle on either `transform` or `predict`, where I would favour `predict`. Or is there a good reason to have both (I could be convinced)?
Concerning `evaluate`: if I understand this right, I think `cost` would be a better name. To me, loss is different from cost in the sense that cost also includes the regularization term, which is usually what you want.
Classification is just a discrete prediction, and I think it should have the same verbs as a linear regression.
I'm not convinced of that. Let's take (soft-margin) SVMs for example. The thing that really gives the SVM outcome a conceptual interpretation is the loss function it uses, so actually predicting a class is `g(predict(x))`, where `g` is the decision function (which is usually `sign` for SVMs). For logistic regression it is the same principle, regardless of whether you are using a sigmoid or an affine prediction function. So I think it would make more sense to have `predict` really predict the response, and have something like `classify` apply an intrinsic decision for convenience.
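A minimal sketch of that separation (all names here are illustrative, not an existing API): `predict` returns the raw real-valued response, and `classify` composes the decision function `g = sign` on top, as for an SVM.

```julia
using LinearAlgebra

# toy linear model; in an SVM this would hold the fitted hyperplane
struct LinearSVM
    w::Vector{Float64}  # weight vector
    b::Float64          # bias
end

# raw (margin) response: <w, x> + b
predict(m::LinearSVM, x::AbstractVector) = dot(m.w, x) + m.b

# discrete prediction: g(predict(x)) with g = sign
classify(m::LinearSVM, x::AbstractVector) = sign(predict(m, x))
```

The point is that `classify` is pure convenience; anyone doing research on the raw responses still has `predict` untouched.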
For me it boils down to this: I don't want the high level interface (i.e. for simply applying ML to some problem) to be the default. I want this framework to be research friendly.
Concerning `predict` and `classify`: it seems like OnlineStats is already doing what I proposed, so I am in favour of sticking to that approach (i.e. `predict` gives the raw response and `classify` additionally applies the decision function).
Extending the conversation at OnlineStats... There are lots of different things that we're trying to cover with a single verb: `learn`, `train`, `fit`, `solve`, `update`... They all mean basically the same thing, but certainly there will be some simple models where `train` or `solve` seems out of place, and some complex models where `update` sounds too simplistic. Also, I feel like `update` is more appropriate for online models, and the others more appropriate for batch models.
What if we don't have a "batch" verb? At the core, there is only the constructor, and the fit happens as part of model creation. Parameters etc. are passed into a constructor through some sort of parameter abstraction, data is passed in through another abstraction, and the model is fit during creation.
Another idea... what if the models are implicitly callable (by implementing `Base.call`), and we could do things like:

```julia
o = PCA(data)
o(x)  # calls transform(o, x)
```
Then we never use a verb. :+1:
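For reference, a minimal sketch of the callable-model idea. Overloading `Base.call` was the Julia 0.4 mechanism; in current Julia the same thing is spelled as a function-like object. The `PCA` here is a toy stand-in for illustration, not an existing package type.

```julia
using LinearAlgebra, Statistics

# fitted state only; "fitting" happens in the outer constructor below
struct PCA
    μ::Matrix{Float64}     # column means (1×d), stored for centering
    proj::Matrix{Float64}  # d×k projection matrix
end

# constructor-as-fit: keep the top-k right singular vectors of centered data
function PCA(data::Matrix{Float64}; k::Int = 1)
    μ = mean(data, dims = 1)
    F = svd(data .- μ)
    PCA(μ, F.V[:, 1:k])
end

transform(o::PCA, x) = (x .- o.μ) * o.proj

# make the fitted model callable: o(x) delegates to transform(o, x)
(o::PCA)(x) = transform(o, x)
```

So `o(x)` and `transform(o, x)` are interchangeable, and the verb becomes optional rather than removed.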
I considered that approach before. I think it has some ugly problems.
First of all, I think scaling out using just language constructs should be simple. For that, it would make sense if the model itself were more of a parameter container (this includes, for example, the loss functor). Such a small object would be easy to move between processes to train somewhere else. Bottom line: it should be easy to decide when and where an object is trained versus when its parameters are decided.
Also, I think we would throw away the beauty of Julia's multiple dispatch, where a function can belong to different things equally. This way, the training function is conceptually no different from an OO method that is part of the model itself. How would a user be able to prototype a new kind of solver, for example?
While I think this works well for things like GLMs, I don't think it would be flexible enough for what I want KSVM.jl to be.
Are you really that opposed to using `fit`? I get that it doesn't always roll off the tongue, but at least it would be consistent among Julia libraries and easy to remember.
Concerning the `Base.call` overloading: I don't know. I actually didn't know that was possible. It doesn't seem intuitive, though. It seems like bad practice somehow.
Also we should consider whether online updates should adopt the push/append idiom from base.
I don't think the descriptions for push/append quite match online updates:
```
help?> push!
search: push! pushdisplay

  push!(collection, items...) -> collection

  Insert one or more `items` at the end of `collection`.
```
I thought the idea was to use `fit!` for online updates?
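As a hypothetical illustration of `fit!` as the online-update verb (in contrast to `push!`, whose documented meaning is inserting items into a collection), a running mean is about the simplest "model" that can absorb one observation at a time. All names here are illustrative.

```julia
# minimal online "model": state that improves as observations arrive
mutable struct OnlineMean
    μ::Float64  # current estimate of the mean
    n::Int      # number of observations seen so far
end
OnlineMean() = OnlineMean(0.0, 0)

# fit!: update the fitted state in place with one observation, return the model
function fit!(o::OnlineMean, x::Real)
    o.n += 1
    o.μ += (x - o.μ) / o.n  # incremental mean update
    return o
end

# convenience method: absorb a whole batch one observation at a time
fit!(o::OnlineMean, xs::AbstractVector) = (foreach(x -> fit!(o, x), xs); o)
```

Nothing here is a collection being appended to, which is why the `push!`/`append!` idiom feels like a stretch for this case.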
I certainly haven't settled on the "right" verbs yet. I'm leaning towards the method of aliases, though (`const train = fit`, etc.). It might make it easier to link disparate packages.
> I'm leaning towards the method of aliases though (`const train = fit`), etc.

I think this is frowned upon, if I remember the Julia style guide correctly.
In the end, I don't think there is a clear separation of online vs. batch anyhow. I mean, theoretically I could fit a logistic regression using BFGS with `maxiter = 10`, and then decide to let it train some more using the same deterministic algorithm (provided the implementation allows for a hot start).
```julia
myfit = fit(MyModel(), solver = BFGS(), maxiter = 10)
# ...
fit!(myfit, solver = BFGS(), maxiter = 100)
```
The most readable thing to me would be to have one verb (plus the `!` version of it) across the board that denotes "something is learning from data".
The cool thing about Julia (and one of its convincing aspects to me) is that if the code has a nice licence and is carefully crafted, it can also serve as a reference implementation for the pseudocode of papers. So to me it would also be important that the package itself follows the same style that its interface does (I think it would be confusing if the user is used to `fit` but internally the package uses `update`).
If you are really that much against `fit`, then we should think about settling on a different verb. But it would be nice to have some consistency, even if the verb itself isn't always "perfect".
If you don't like that either, maybe we can come up with a solution that uses two verbs, where we can clearly define which verb is appropriate for which situation. I can see the case for `update!`, but then what would the meaning of the "!" in `fit!` be? After all, if we were using fit, we would already have two verbs, `fit` and `fit!`, that clearly mean different things (and their meaning would be intuitive to a normal Julia user).
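To make the `fit`/`fit!` distinction concrete, here is a toy sketch (illustrative names, gradient descent on a single parameter): `fit` builds a fitted model from scratch, while `fit!` continues training the same object in place, i.e. a hot start, as in the BFGS example above.

```julia
# toy model: a single parameter θ fitted to the mean of the data
mutable struct MeanFit
    θ::Float64
end

# one gradient-descent step on the squared-error loss Σ (θ - x)^2
function step!(m::MeanFit, xs; η = 0.1)
    g = sum(m.θ .- xs)  # gradient with respect to θ
    m.θ -= η * g
end

# fit!: run maxiter MORE iterations on an existing model (hot start)
function fit!(m::MeanFit, xs; maxiter = 10)
    for _ in 1:maxiter
        step!(m, xs)
    end
    return m
end

# fit: build a fresh model, then delegate to fit!
fit(::Type{MeanFit}, xs; maxiter = 10) = fit!(MeanFit(0.0), xs; maxiter = maxiter)
```

With this reading, the `!` carries its usual Julia meaning (the first argument is mutated), and "online vs. batch" is just a question of how often you call `fit!`.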
On another tangent: I have been thinking about your proposed "using the constructor to fit something" approach a bit. Although I still don't think it would be a good universal solution for all kinds of models, I have to admit that the approach is growing on me for things like PCA or CenterScale. I think you are on to something there.
The following seems like nice code to me (and having `transform` is growing on me too for unsupervised things):
```julia
cs = CenterScale(Xtrain)
transform!(cs, Xtrain)

pca = PCA(Xtrain)
Xtrain2 = transform(pca, Xtrain)

myfit = fit(MyModel(), Xtrain2)
yhat = predict([cs, pca, myfit], Xtest)
```
However, if we keep consistent with using `fit`, then things like PCA would not need special treatment.
So this code:
```julia
cs = fit(CenterScale(), Xtrain)
Xtrain2 = predict(cs, Xtrain)

pca = fit(PCA(), Xtrain2)
Xtrain3 = predict(pca, Xtrain2)

myfit = fit(MyModel(), Xtrain3)
yhat = predict(myfit, Xtrain3)
```
could be simplified to
```julia
mypipe = fit([CenterScale(), PCA(), MyModel()], Xtrain)
# mypipe == [cs, pca, myfit]
yhat = predict(mypipe, Xtest)
```
EDIT: So objects like `PCA` would simply serve as containers for hyperparameters, and their constructors would be used to specify said model-specific parameters.
EDIT: I am not sure how it would work if we mix supervised and unsupervised models though
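One hedged sketch of how the pipeline `fit` could work for the purely unsupervised case, using types as stand-ins for the unfitted specification objects (everything here is hypothetical; mixing in a supervised final stage would additionally need a target argument, which is exactly the open question):

```julia
using Statistics

# two toy unsupervised stages; each fitted object is a small parameter container
struct Center; μ::Float64; end
struct Scale;  σ::Float64; end

fit(::Type{Center}, xs) = Center(mean(xs))
fit(::Type{Scale},  xs) = Scale(std(xs))

transform(c::Center, xs) = xs .- c.μ
transform(s::Scale,  xs) = xs ./ s.σ

# fit a whole pipeline: fit each stage, feed the transformed data forward
function fit(specs::Vector, xs)
    fitted = Any[]
    for spec in specs
        m = fit(spec, xs)
        push!(fitted, m)
        xs = transform(m, xs)  # next stage sees the transformed data
    end
    return fitted
end

# applying the fitted pipeline is a left fold over transform
transform(pipe::Vector, xs) = foldl((data, m) -> transform(m, data), pipe; init = xs)
```

The fitted pipeline is literally the vector `[cs, pca, ...]` from the example above, so indexing into it for inspection comes for free.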
Concerning verbs, we should keep an eye on PR #87 of Optim.jl and use the same verbs where they make sense.
Ultimately I'd like to move from Regression.jl to Optim.jl for deterministic optimization (because it is more general and has a more active community), but at the moment Regression.jl is faster (based on just two informal comparisons I did a month back, so this is not a very sound statement).
Until there is something like OptimBase.jl, I will introduce a dependency on Optim for the API verbs I mentioned. Not ideal, but Optim doesn't seem so heavyweight. Thoughts?
outsourced
What are they, and what functionality and traits do they imply?
Here's a list to start discussion:
- `fit`
- `fit!`
- `transform`/`predict`
- `evaluate`
- Others?
Notes:
- `const predict = transform` as a convenience is OK in my eyes