EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning
GNU General Public License v3.0

No-Arg Constructors? #13

Open salamanders opened 9 years ago

salamanders commented 9 years ago

I did a bad thing :) and used reflection to try to instantiate, train, and test every classifier you've got, using brain-dead no-arg or simple-arg constructors. With autoAddParameters, because why not.

https://gist.github.com/salamanders/8e7054f62b53eb772895

It exploded all over the place, of course - which is why I'm so interested in as many classifiers as possible having a best-practice default.

On the plus side: when it works, it creates some really fun results that I don't think are nearly as easy to produce with competing libraries!

[Screenshot of example output, 2015-07-16]
EdwardRaff commented 9 years ago

Hah, not bad at all! That's some nifty code, actually!

"Bagging - is there a weak classifier that in general can be assumed to be an 'ok' starting point?"

A Decision Tree is the quintessential "weak" classifier. I would say a tree with a max depth of 6 is a good general purpose weak learner.
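
As a rough sketch of what I mean (the setMaxDepth setter and the AdaBoostM1(weak learner, iterations) constructor are from memory - double-check the signatures against the source):

    import jsat.classifiers.boosting.AdaBoostM1;
    import jsat.classifiers.trees.DecisionTree;

    public class WeakLearnerSketch
    {
        public static AdaBoostM1 defaultBooster()
        {
            // A shallow tree as a general-purpose weak learner.
            // Assumption: DecisionTree exposes setMaxDepth(int) - check the
            // actual setter/constructor in the source.
            DecisionTree weak = new DecisionTree();
            weak.setMaxDepth(6);

            // Assumption: AdaBoostM1 takes (weak learner, max iterations).
            return new AdaBoostM1(weak, 100);
        }
    }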

"Caching & Parallel training - is there an interface possible for caching-enabled trainers?"

Only classes extending the SupportVectorLearner class use a cache.
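
For example, something along these lines should turn the kernel cache on for an SMO-based SVM (a sketch from memory - the CacheMode enum and setCacheMode method names should be verified against the source):

    import jsat.classifiers.svm.PlatSMO;
    import jsat.classifiers.svm.SupportVectorLearner;
    import jsat.distributions.kernels.RBFKernel;

    public class SvmCacheSketch
    {
        public static PlatSMO cachedSmo()
        {
            // Assumption: PlatSMO takes a KernelTrick and inherits
            // setCacheMode(...) from SupportVectorLearner.
            PlatSMO smo = new PlatSMO(new RBFKernel(0.5));
            // Cache all kernel evaluations - trades memory for speed.
            smo.setCacheMode(SupportVectorLearner.CacheMode.FULL);
            return smo;
        }
    }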

"Incompatible data - is there a way to upgrade a one-to-many if the classifier is expecting binary but the data assumes multiple?"

I'm not sure I understand what you are asking here. Mind rephrasing?

EDIT: If you have the time, it would be cool to see a plot of the difference in error rate between default parameters and the results of autoAddParameters w/ RandomSearch on a couple of datasets.
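
Roughly what I have in mind (a sketch from memory - the RandomSearch constructor arguments and the evaluation calls should be checked against the source):

    import java.io.File;
    import jsat.classifiers.ClassificationDataSet;
    import jsat.classifiers.ClassificationModelEvaluation;
    import jsat.classifiers.linear.LogisticRegressionDCD;
    import jsat.io.LIBSVMLoader;
    import jsat.parameters.RandomSearch;

    public class DefaultVsTunedSketch
    {
        public static void main(String[] args) throws Exception
        {
            ClassificationDataSet data = LIBSVMLoader.loadC(new File("diabetes.svm"));

            // Baseline: default parameters, cross-validated error.
            ClassificationModelEvaluation base =
                    new ClassificationModelEvaluation(new LogisticRegressionDCD(), data);
            base.evaluateCrossValidation(10);

            // Tuned: RandomSearch with auto-discovered parameters.
            // Assumption: RandomSearch takes (base classifier, folds); the
            // thread confirms autoAddParameters returns the number added.
            RandomSearch search = new RandomSearch(new LogisticRegressionDCD(), 3);
            int added = search.autoAddParameters(data);
            ClassificationModelEvaluation tuned =
                    new ClassificationModelEvaluation(search, data);
            tuned.evaluateCrossValidation(10);

            System.out.printf("params added: %d, default error: %.3f, tuned error: %.3f%n",
                    added, base.getErrorRate(), tuned.getErrorRate());
        }
    }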

salamanders commented 9 years ago

"I'm not sure I understand what you are asking here. Mind rephrasing?"

Errr... I'm not entirely sure. I got a few that error out with "Network needs categorical attributes to work" or "At least 2 categorical variables are needed for ODE" and was wondering if there is a generic way to wrap the data or the classifier to make them run - like using one-vs-rest to turn a binary classifier into a multi-class classifier.

salamanders commented 9 years ago

re: autoAddParameters: I'm not getting the results I think I should be getting. Why would some be negative?

diabetes.svm:

| Classifier | Time | Original | Improvement |
| --- | ---: | ---: | ---: |
| jsat.classifiers.linear.AROW | 444 | 0.310 | -0.008 |
| jsat.classifiers.svm.extended.OnlineAMM | 573 | 0.257 | 0.001 |
| jsat.classifiers.svm.Pegasos | 452 | 0.327 | 0.066 |
| jsat.classifiers.neuralnetwork.RBFNet | 712 | 0.299 | 0.000 |
| jsat.classifiers.trees.ExtraTree | 279 | 0.289 | 0.000 |
| jsat.classifiers.PriorClassifier | 35 | 0.349 | 0.000 |
| jsat.classifiers.linear.LogisticRegressionDCD | 736 | 0.229 | 0.010 |
| jsat.classifiers.linear.LinearBatch | 2019 | 0.224 | 0.026 |
| jsat.classifiers.MultinomialLogisticRegression | 589 | 0.228 | 0.000 |
| jsat.classifiers.boosting.LogitBoost | 2960 | 0.308 | 0.000 |
| jsat.classifiers.knn.NearestNeighbour | 2203 | 0.259 | -0.001 |
| jsat.classifiers.linear.SCW | 89 | 0.349 | 0.109 |
| jsat.classifiers.svm.PlatSMO | 3852 | 0.229 | -0.026 |
| jsat.classifiers.Rocchio | 18 | 0.267 | 0.000 |
| jsat.classifiers.svm.extended.AMM | 715 | 0.241 | 0.017 |
| jsat.classifiers.linear.NewGLMNET | 591 | 0.227 | 0.008 |
| jsat.classifiers.linear.kernelized.KernelSGD | 7016 | 0.225 | 0.033 |
| jsat.classifiers.linear.StochasticMultinomialLogisticRegression | 333 | 0.258 | 0.000 |
| jsat.classifiers.svm.DCD | 6585 | 0.225 | 0.000 |
| jsat.classifiers.svm.DCDs | 7584 | 0.225 | -0.051 |

salamanders commented 9 years ago

But overall, my original request stands: more classifiers with a sane no-args constructor, please! For example, baking your advice that "a DecisionTree with depth 6 isn't by any means optimal, but it isn't crazy" into all the Boosting constructors.
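
Something like this hypothetical constructor is all I'm asking for (not in JSAT today, and the DecisionTree(int maxDepth) constructor is an assumption on my part):

    // Hypothetical: a sane no-arg constructor for a boosting class that
    // delegates to the shallow-tree-of-depth-6 default discussed above.
    public AdaBoostM1()
    {
        this(new DecisionTree(6), 100);
    }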

EdwardRaff commented 9 years ago

The diabetes dataset is pretty hard and doesn't have a lot of data; you'd probably see bigger differences on larger datasets. MNIST could be a good benchmark.

Some of those, like Rocchio, don't have any "easy" parameters set up yet - so you wouldn't expect to see a change. That's why the method returns the number of parameters added.

Other negatives could mean you are already at the best accuracy for that model, and the different scores are just random chance - so not really meaningful.

I'll reply more when I get home.

EdwardRaff commented 9 years ago

"Errr... I'm not entirely sure. I got a few that error out with 'Network needs categorical attributes to work' or 'At least 2 categorical variables are needed for ODE' and was wondering if there is a generic way to wrap the data or the classifier to make them run - like using one-vs-rest to turn a binary classifier into a multi-class classifier."

I understand your problem now. You could wrap them using the DataModelPipeline and the NominalToNumeric and NumericalToHistogram transforms to get them to work. However, I don't have a programmatic way to determine whether a classifier can work with numeric or categorical features.

Note that NumericalToHistogram is currently a little brain-dead. There are two better ways of doing it that I've been meaning to implement but haven't gotten around to.
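
Something like this is what I mean, writing from memory (the no-arg NumericalToHistogram constructor and the (classifier, transforms...) form of DataModelPipeline may differ in the version you have, so check the source):

    import jsat.classifiers.Classifier;
    import jsat.classifiers.DataModelPipeline;
    import jsat.datatransform.NumericalToHistogram;

    public class PipelineSketch
    {
        // Wrap a classifier that only understands categorical features so it
        // can be fed numeric data: the histogram transform bins the numeric
        // attributes into categories before the model ever sees them.
        public static Classifier wrapCategoricalOnly(Classifier categoricalOnlyModel)
        {
            return new DataModelPipeline(categoricalOnlyModel, new NumericalToHistogram());
        }
    }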

salamanders commented 9 years ago

I'm getting by with the following, and could easily do the same for others if I knew which ones needed it. Should the others implement BinaryScoreClassifier?

    // TODO: Should only wrap if necessary.
    final Classifier model = (model1 instanceof BinaryScoreClassifier
            || model1 instanceof LogitBoost
            || model1 instanceof LogitBoostPL
            || model1 instanceof LogisticRegressionDCD) ? new OneVSAll(model1) : model1;
EdwardRaff commented 9 years ago

Hmm, so some things like LogisticRegression I deliberately didn't have implement BinaryScoreClassifier, because they already produce calibrated probabilities; the interface was introduced so that methods without probabilities could be calibrated. I'm not sure it would make sense to have every binary classifier implement that interface.

I suppose the 3 ways to add such a marker would be adding a new method to the Classifier interface, creating a new "marker" interface with no methods, or creating an annotation. I'll think about it.
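
For the record, roughly what those three options look like (all the names here are hypothetical, nothing that exists in JSAT yet):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Option 1: a new method on the Classifier interface, e.g.
    //     boolean supportsMultiClass();
    // which would touch every existing implementation.

    // Option 2: a marker interface with no methods - cheap to check with instanceof.
    interface MultiClassCapable { }

    // Option 3: an annotation, checked via reflection at runtime.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface BinaryOnly { }

Calling code like your OneVSAll snippet above could then check model1 instanceof MultiClassCapable, or model1.getClass().isAnnotationPresent(BinaryOnly.class), instead of hard-coding class names.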