Closed: tbreloff closed this issue 8 years ago.
Let me start by saying that I have yet to take a close look into OnlineAI and OnlineStats (but I promise I will).
This looks like a good start. Thanks for taking the time and kicking this off.
I think we should settle on either `transform` or `predict`, where I would favour `predict`. Or is there a good reason to have both (I could be convinced)?
Concerning `evaluate`: if I understand this right, I think `cost` would be a better name. To me, loss is different from cost in the sense that cost also includes the regularization term, which is usually what you want.
Classification is just a discrete prediction, and I think it should have the same verbs as a linear regression.
I'm not convinced of that. Let's take (soft-margin) SVMs for example. The thing that really gives the SVM outcome a conceptual interpretation is the loss function it uses, so actually predicting a class is `g(predict(x))`, where `g` is the decision function (which is usually `sign` for SVMs). For logistic regression it is the same principle, regardless of whether you are using a sigmoid or an affine prediction function. So I think it would make more sense to have `predict` really predict the response, and have something like `classify` apply an intrinsic decision for convenience.
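A minimal sketch of that separation (all names here are illustrative, not an existing API): `predict` returns the raw real-valued response, and `classify` composes the decision function `g = sign` on top, as for an SVM.

```julia
using LinearAlgebra

# toy linear model; in an SVM this would hold the fitted hyperplane
struct LinearSVM
    w::Vector{Float64}  # weight vector
    b::Float64          # bias
end

# raw (margin) response: <w, x> + b
predict(m::LinearSVM, x::AbstractVector) = dot(m.w, x) + m.b

# discrete prediction: g(predict(x)) with g = sign
classify(m::LinearSVM, x::AbstractVector) = sign(predict(m, x))
```

The point is that `classify` is pure convenience; anyone doing research on the raw responses still has `predict` untouched.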
For me it boils down to this: I don't want the high level interface (i.e. for simply applying ML to some problem) to be the default. I want this framework to be research friendly.
Concerning `predict` and `classify`: it seems like OnlineStats is already doing what I proposed, so I am in favour of sticking to that approach (i.e. `predict` gives the raw response and `classify` additionally applies the decision function).
Extending the conversation at OnlineStats... There are lots of different things that we're trying to cover with a single verb: `learn`, `train`, `fit`, `solve`, `update`... They all mean basically the same thing, but certainly there will be some simple models where `train` or `solve` seems out of place, and some complex models where `update` sounds too simplistic. Also, I feel like `update` is more appropriate for online models, and the others more appropriate for batch models.
What if we don't have a "batch" verb? At the core, there is only the constructor, and the fit happens as part of model creation. Parameters etc. are passed into a constructor through some sort of parameter abstraction, data is passed in through another abstraction, and the model is fit during creation.
Another idea... what if the models are implicitly callable (by implementing `Base.call`), and we could do things like:

```julia
o = PCA(data)
o(x)  # calls transform(o, x)
```
Then we never use a verb. :+1:
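For reference, a minimal sketch of the callable-model idea. Overloading `Base.call` was the Julia 0.4 mechanism; in current Julia the same thing is spelled as a function-like object. The `PCA` here is a toy stand-in for illustration, not an existing package type.

```julia
using LinearAlgebra, Statistics

# fitted state only; "fitting" happens in the outer constructor below
struct PCA
    μ::Matrix{Float64}     # column means (1×d), stored for centering
    proj::Matrix{Float64}  # d×k projection matrix
end

# constructor-as-fit: keep the top-k right singular vectors of centered data
function PCA(data::Matrix{Float64}; k::Int = 1)
    μ = mean(data, dims = 1)
    F = svd(data .- μ)
    PCA(μ, F.V[:, 1:k])
end

transform(o::PCA, x) = (x .- o.μ) * o.proj

# make the fitted model callable: o(x) delegates to transform(o, x)
(o::PCA)(x) = transform(o, x)
```

So `o(x)` and `transform(o, x)` are interchangeable, and the verb becomes optional rather than removed.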
I considered that approach before. I think it has some ugly problems.
First of all, I think scaling out using just language constructs should be simple. For that, it would make sense if the model itself were more of a parameter container (this includes, for example, the loss functor). Such a small object would be easy to move between processes to train somewhere else. Bottom line: it should be easy to decide when and where an object is trained versus when its parameters are decided.
Also, I think we would throw away the beauty of Julia's multiple dispatch, where a function can belong to different things equally. This way, the training function is conceptually no different from an OO method that is part of the model itself. How would a user be able to prototype a new kind of solver, for example?
While I think this works well for things like GLMs, I don't think it would be flexible enough for what I want KSVM.jl to be.
Are you really that opposed to using `fit`? I get that it doesn't always roll off the tongue, but at least it would be consistent among Julia libraries and easy to remember.
Concerning the `Base.call` overloading: I don't know. I actually didn't know that was possible. It doesn't seem intuitive, though. It seems like bad practice somehow.
Also we should consider whether online updates should adopt the push/append idiom from base.
I don't think the descriptions for push/append quite match online updates:
```
help?> push!
search: push! pushdisplay

  push!(collection, items...) -> collection

  Insert one or more `items` at the end of `collection`.
```
I thought the idea was to use `fit!` for online updates?
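As a hypothetical illustration of `fit!` as the online-update verb (in contrast to `push!`, whose documented meaning is inserting items into a collection), a running mean is about the simplest "model" that can absorb one observation at a time. All names here are illustrative.

```julia
# minimal online "model": state that improves as observations arrive
mutable struct OnlineMean
    μ::Float64  # current estimate of the mean
    n::Int      # number of observations seen so far
end
OnlineMean() = OnlineMean(0.0, 0)

# fit!: update the fitted state in place with one observation, return the model
function fit!(o::OnlineMean, x::Real)
    o.n += 1
    o.μ += (x - o.μ) / o.n  # incremental mean update
    return o
end

# convenience method: absorb a whole batch one observation at a time
fit!(o::OnlineMean, xs::AbstractVector) = (foreach(x -> fit!(o, x), xs); o)
```

Nothing here is a collection being appended to, which is why the `push!`/`append!` idiom feels like a stretch for this case.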
I certainly haven't settled on the "right" verbs yet. I'm leaning towards the method of aliases, though (`const train = fit`, etc.). It might make it easier to link disparate packages.
> I'm leaning towards the method of aliases though (`const train = fit`), etc.

I think this is frowned upon, if I remember the Julia style guide correctly.
In the end, I don't think there is a clear separation of online vs. batch anyhow. I mean, theoretically I could fit a logistic regression using BFGS with `maxiter = 10`, and then decide to let it train some more using the same deterministic algorithm (provided the implementation allows for a hot start).
```julia
myfit = fit(MyModel(), solver = BFGS(), maxiter = 10)
# ...
fit!(myfit, solver = BFGS(), maxiter = 100)
```
The most readable thing to me would be to have one verb (plus the `!` version of it) across the board that denotes "something is learning from data".
The cool thing about Julia (and one of its convincing aspects to me) is that if the code has a nice licence and is carefully crafted, it can also serve as a reference implementation for the pseudocode of papers. So to me it would also be important that the package itself follows the same style that its interface does (I think it would be confusing if the user is used to `fit` but internally the package uses `update`).
If you are really that much against `fit`, then we should think about settling on a different verb. But it would be nice to have some consistency, even if the verb itself isn't always "perfect".
If you don't like that either, maybe we can come up with a solution that uses two verbs, where we can clearly define which verb is appropriate for which situation. I can see the case for `update!`, but then what would the meaning of the "!" in `fit!` be? After all, if we were using fit, we would already have two verbs, `fit` and `fit!`, that clearly mean different things (and their meaning would be intuitive to a normal Julia user).
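To make the `fit`/`fit!` distinction concrete, here is a toy sketch (illustrative names, gradient descent on a single parameter): `fit` builds a fitted model from scratch, while `fit!` continues training the same object in place, i.e. a hot start, as in the BFGS example above.

```julia
# toy model: a single parameter θ fitted to the mean of the data
mutable struct MeanFit
    θ::Float64
end

# one gradient-descent step on the squared-error loss Σ (θ - x)^2
function step!(m::MeanFit, xs; η = 0.1)
    g = sum(m.θ .- xs)  # gradient with respect to θ
    m.θ -= η * g
end

# fit!: run maxiter MORE iterations on an existing model (hot start)
function fit!(m::MeanFit, xs; maxiter = 10)
    for _ in 1:maxiter
        step!(m, xs)
    end
    return m
end

# fit: build a fresh model, then delegate to fit!
fit(::Type{MeanFit}, xs; maxiter = 10) = fit!(MeanFit(0.0), xs; maxiter = maxiter)
```

With this reading, the `!` carries its usual Julia meaning (the first argument is mutated), and "online vs. batch" is just a question of how often you call `fit!`.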
On another tangent: I have been thinking about your proposed "using the constructor to fit something" approach a bit. Although I still don't think it would be a good universal solution for all kinds of models, I have to admit that the approach is growing on me for things like PCA or CenterScale. I think you are on to something there.
The following seems like nice code to me (and having `transform` is growing on me too for unsupervised things):
```julia
cs = CenterScale(Xtrain)
transform!(cs, Xtrain)

pca = PCA(Xtrain)
Xtrain2 = transform(pca, Xtrain)

myfit = fit(MyModel(), Xtrain2)
yhat = predict([cs, pca, myfit], Xtest)
```
However, if we keep consistent with using `fit`, then things like PCA would not need special treatment.
So this code:
```julia
cs = fit(CenterScale(), Xtrain)
Xtrain2 = predict(cs, Xtrain)

pca = fit(PCA(), Xtrain2)
Xtrain3 = predict(pca, Xtrain2)

myfit = fit(MyModel(), Xtrain3)
yhat = predict(myfit, Xtrain3)
```
could be simplified to
```julia
mypipe = fit([CenterScale(), PCA(), MyModel()], Xtrain)
# mypipe == [cs, pca, myfit]
yhat = predict(mypipe, Xtest)
```
EDIT: So objects like `PCA` would simply serve as containers for hyperparameters, and their constructors would be used to specify said model-specific parameters.
EDIT: I am not sure how it would work if we mix supervised and unsupervised models though
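One hedged sketch of how the pipeline `fit` could work for the purely unsupervised case, using types as stand-ins for the unfitted specification objects (everything here is hypothetical; mixing in a supervised final stage would additionally need a target argument, which is exactly the open question):

```julia
using Statistics

# two toy unsupervised stages; each fitted object is a small parameter container
struct Center; μ::Float64; end
struct Scale;  σ::Float64; end

fit(::Type{Center}, xs) = Center(mean(xs))
fit(::Type{Scale},  xs) = Scale(std(xs))

transform(c::Center, xs) = xs .- c.μ
transform(s::Scale,  xs) = xs ./ s.σ

# fit a whole pipeline: fit each stage, feed the transformed data forward
function fit(specs::Vector, xs)
    fitted = Any[]
    for spec in specs
        m = fit(spec, xs)
        push!(fitted, m)
        xs = transform(m, xs)  # next stage sees the transformed data
    end
    return fitted
end

# applying the fitted pipeline is a left fold over transform
transform(pipe::Vector, xs) = foldl((data, m) -> transform(m, data), pipe; init = xs)
```

The fitted pipeline is literally the vector `[cs, pca, ...]` from the example above, so indexing into it for inspection comes for free.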
Concerning verbs, we should keep an eye on PR #87 of Optim.jl and use the same verbs where they make sense.
Ultimately I'd like to move from Regression.jl to Optim.jl for deterministic optimization (because it is more general and has a more active community), but at the moment Regression.jl is faster (based on just two informal comparisons I did a month back, so this is not a very sound statement).
Until there is something like OptimBase.jl, I will introduce a dependency on Optim for the API verbs I mentioned. Not ideal, but Optim doesn't seem so heavyweight. Thoughts?
outsourced
What are they, and what functionality and traits do they imply?
Here's a list to start discussion:
- `fit`
- `fit!`
- `transform`/`predict`
- `evaluate`
- Others?
Notes:
- `const predict = transform` as a convenience is OK in my eyes