amueller opened 11 years ago
I will compare. The original paper did some comparisons with SGD (not sklearn's implementation) and found that the projection step and the adaptive learning rate improved performance.
The SGD in scikit-learn actually has an adaptive learning rate - it can even be set to be the same as pegasos, I believe. For the projection step, the claims are much milder in the journal version of the paper and in the source code they provide it is commented out. I have not seen a careful analysis of the projection step, though, and would be quite interested in that.
After looking it up again, I think you need to set power_t=1 to get the pegasos schedule.
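As a quick sanity check (plain Python, no sklearn needed), sklearn's 'invscaling' schedule eta_t = eta0 / t**power_t reduces to the Pegasos schedule eta_t = 1/(lambda*t) when power_t=1 — assuming you also pick eta0 = 1/lambda, which is my assumption here, not something stated above:

```python
# sklearn's 'invscaling' schedule: eta_t = eta0 / t ** power_t
def invscaling(eta0, power_t, t):
    return eta0 / t ** power_t

# Pegasos schedule: eta_t = 1 / (lambda * t)
def pegasos_eta(lam, t):
    return 1.0 / (lam * t)

lam = 0.0001
for t in range(1, 6):
    # with power_t=1 and eta0 = 1/lambda the two schedules coincide
    assert abs(invscaling(1.0 / lam, 1.0, t) - pegasos_eta(lam, t)) < 1e-9
print("schedules match for power_t=1, eta0=1/lambda")
```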
Here are some benchmarks with identical learning rates:
https://raw.github.com/ejlb/pegasos/master/benchmarks/benchmarks.png
Pegasos seems to be slightly more accurate (1%). The only two differences I know of are:
1) the pegasos projection step
2) pegasos trains on random examples, so it may get a better generalisation error
Due to point 2) it is hard to compare speed across iterations.
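For readers unfamiliar with point 1): the projection step in the Pegasos paper rescales w after each update so that it stays inside the ball of radius 1/sqrt(lambda). A minimal numpy sketch (my own illustration, not this repo's API):

```python
import numpy as np

def project(w, lam):
    """Pegasos projection step: rescale w onto the ball of radius
    1/sqrt(lam). The journal version treats this step as optional,
    and the reference code comments it out."""
    radius = 1.0 / np.sqrt(lam)
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

w = np.array([3.0, 4.0])      # ||w|| = 5
w = project(w, lam=0.25)      # radius = 1/sqrt(0.25) = 2
print(np.linalg.norm(w))      # -> 2.0
```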
Wow that looks quite good. I'm quite surprised your implementation is significantly faster than sklearn. Do you have any idea where that could come from? Also, could you please share your benchmark script?
cc @pprett @larsmans
You say that training on random samples makes it hard to compare speeds. How so? One iteration of SGD is n_samples many updates, which you should compare against n_samples many updates in pegasos. Or did you compare against single updates here?
@amueller SGDClassifier trains on the whole data set at each iteration, I assume? That is probably where the speed difference comes from
edit: yes true, that would be a good comparison. Will upload the benchmark script
Ok, but then the plot doesn't make sense. You should rescale it such that the number of weight updates is the same.
Yeah, will run some with equal weight updates
Yes, SGDClassifier does

```python
for i in xrange(n_iter):
    shuffle(dataset)
    for x in X:
        update()
```
It also wastes a little bit of time in each update, checking whether it should do a PA update or a vanilla additive one.
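To make the two sampling schemes concrete, here is a toy sketch (my own illustration) contrasting sklearn's shuffled epochs with Pegasos-style uniform sampling; once the total number of weight updates is matched, as suggested above, the speed comparison becomes fair:

```python
import random

X = list(range(8))  # toy "dataset" of 8 example indices
n_iter = 3
updates_epoch, updates_random = [], []

# sklearn-style: n_iter shuffled passes over the whole data set
for _ in range(n_iter):
    epoch = X[:]
    random.shuffle(epoch)
    updates_epoch.extend(epoch)        # n_iter * n_samples updates total

# Pegasos-style: one example sampled uniformly at random per update;
# running n_iter * n_samples single updates matches the update count
for _ in range(n_iter * len(X)):
    updates_random.append(random.choice(X))

assert len(updates_epoch) == len(updates_random)  # same number of weight updates
```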
this makes much more sense:
https://raw.github.com/ejlb/pegasos/master/benchmarks/weight_updates/benchmarks.png
Perhaps batching the pegasos weight updates would retain the slight accuracy boost while improving the training time.
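The Pegasos paper itself describes a mini-batch variant along these lines: average the hinge-loss subgradient over k random examples before stepping. A hedged numpy sketch of one such update (my reading of the paper's variant, not this repo's implementation):

```python
import numpy as np

def minibatch_pegasos_step(w, X, y, lam, t, k, rng):
    """One mini-batch Pegasos update: average the hinge-loss subgradient
    over k randomly chosen examples, then step with eta_t = 1/(lam*t)."""
    idx = rng.choice(len(y), size=k, replace=False)
    Xb, yb = X[idx], y[idx]
    margin = yb * (Xb @ w)
    viol = margin < 1.0                 # examples violating the margin
    eta = 1.0 / (lam * t)
    grad = lam * w - (yb[viol, None] * Xb[viol]).sum(axis=0) / k
    return w - eta * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sign(X[:, 0])                    # toy labels from the first feature
w = np.zeros(3)
for t in range(1, 50):
    w = minibatch_pegasos_step(w, X, y, lam=0.1, t=t, k=5, rng=rng)
```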
Yeah, that looks more realistic ;) How did you set alpha and did you set eta0 in the SGD?
I used this: SGDClassifier(power_t=1, learning_rate='invscaling', n_iter=sample_coef, eta0=0.01). The full benchmark is here: https://github.com/ejlb/pegasos/blob/master/benchmarks/weight_updates/benchmark.py
Hey. Did you compare with SGDClassifier? The results should be quite close to yours.