glouppe / kaggle-marinexplore

Code for the Kaggle Marinexplore challenge

Optimize for AUC directly #3

Open pprett opened 11 years ago

pprett commented 11 years ago

Investigate different classifiers that optimize AUC directly instead of some surrogate loss.

I'm aware of the following classifiers that support AUC optimization:

To blend multiple classifiers so that AUC is optimized, one can look at the ROC curves of the individual classifiers: the convex hull of the individual ROC curves (and thus its AUC) is achievable by combining the models (see Fawcett & Provost).
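The convex-hull construction can be sketched as follows. This is a minimal reconstruction of the idea on a synthetic dataset with two stand-in classifiers, not code from this repo: pool the ROC points of the individual models, take the upper convex hull, and measure its area.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def roc_upper_hull(curves):
    """Upper convex hull of pooled ROC points (Andrew's monotone chain).
    Every hull point is achievable by randomizing between the classifiers'
    operating points, so the hull AUC bounds each individual AUC."""
    pts = sorted({(0.0, 0.0), (1.0, 1.0)} |
                 {(f, t) for fpr, tpr in curves for f, t in zip(fpr, tpr)})
    hull = []
    for p in pts:
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            # pop a if o -> a -> p turns left: a lies below the upper hull
            if (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0]) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return np.array(hull)


X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

curves, aucs = [], []
for clf in (LogisticRegression(max_iter=1000), GaussianNB()):
    scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    curves.append((fpr, tpr))
    aucs.append(auc(fpr, tpr))

hull = roc_upper_hull(curves)
hull_auc = auc(hull[:, 0], hull[:, 1])
print(aucs, hull_auc)  # hull AUC is at least as large as each individual AUC
```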

glouppe commented 11 years ago

This should be quite interesting in combination with input features generated from a pyxit classifier (see #2). I'll definitely try that.

pprett commented 11 years ago

I ran the logged spectrogram features through a RankSVM [1] and compared it against an ordinary (linear) SVM. Here are the results:

RankSVM:  0.94706 (0.02245)
LinearSVC: 0.93816 (0.02066)

AUC is up by about 0.01, which is not much but still an improvement; the RankSVM is hardly tuned (same parameters as the LinearSVC). I used our train_small.npz.

[1] https://gist.github.com/agramfort/2071994
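For reference, the pairwise-transform idea behind a RankSVM can be sketched like this. This is my own minimal reconstruction, not the exact code from [1]: a linear classifier trained on difference vectors of positive/negative pairs learns a ranking direction, and ranking every positive above every negative is exactly AUC = 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.svm import LinearSVC


def pairwise_transform(X, y, n_pairs=2000, random_state=0):
    """Difference vectors x_i - x_j for sampled positive/negative pairs (i, j).
    A linear model on these differences optimizes the pairwise ranking loss,
    which is the surrogate most directly tied to AUC."""
    rng = np.random.RandomState(random_state)
    i = rng.choice(np.flatnonzero(y == 1), n_pairs)
    j = rng.choice(np.flatnonzero(y == 0), n_pairs)
    X_pair = X[i] - X[j]
    y_pair = rng.randint(2, size=n_pairs) * 2 - 1  # balanced +/-1 labels
    X_pair[y_pair == -1] *= -1                     # flip half of the pairs
    return X_pair, y_pair


X, y = make_classification(n_samples=600, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]

X_pair, y_pair = pairwise_transform(X_tr, y_tr)
# no intercept: it cancels out in the pairwise differences anyway
rank_svm = LinearSVC(C=1.0, fit_intercept=False).fit(X_pair, y_pair)
scores = X_te @ rank_svm.coef_.ravel()
print("AUC:", roc_auc_score(y_te, scores))
```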

glouppe commented 11 years ago

Whoah! I can't believe we can get so high AUC with a linear model. This is really good news!

On my side, I have been a bit busy on something else unfortunately. I installed "rastamat" though and it seems to work. I'll generate some features for train_small.npz tomorrow morning and upload them on dropbox. (I am afraid this will also require some tuning though, since melfcc and rastaplp have quite a list of parameters.)

pprett commented 11 years ago


I just wrapped your stats code in a transformer object and stacked those features with the logged spectrograms. Now the LinearSVC is up to:

spectrogram + stats: 0.95250 (0.02810)

Tuning is a bit tricky, though, because the stats features require a different value of C than the raw spectrogram features...

Peter Prettenhofer
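The transformer/stacking setup might look roughly like this. `StatsTransformer` is a hypothetical stand-in for the actual stats code, and standardizing the stacked matrix is one way to soften the shared-C issue, since a single C has to serve both feature blocks:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC


class StatsTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the stats code: per-row summary statistics."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = np.asarray(X)
        return np.column_stack([X.mean(axis=1), X.std(axis=1),
                                X.min(axis=1), X.max(axis=1)])


features = FeatureUnion([
    ("spectrogram", FunctionTransformer()),  # identity pass-through
    ("stats", StatsTransformer()),
])

# StandardScaler puts both feature blocks on a common scale before the SVM.
model = Pipeline([("features", features),
                  ("scale", StandardScaler()),
                  ("svm", LinearSVC(C=0.1))])

X = np.random.RandomState(0).rand(20, 5)
y = np.arange(20) % 2  # dummy labels for illustration
model.fit(X, y)
print(features.fit_transform(X).shape)  # (20, 9): 5 raw + 4 stats columns
```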

pprett commented 11 years ago

Did I already tell you that I hate tuning SVMs? I think I'd better continue tomorrow.
