dask / dask-glm

BSD 3-Clause "New" or "Revised" License

A new accelerated, parallel, proximal descent method #3

Open mcg1969 opened 7 years ago

mcg1969 commented 7 years ago

Probably not the most orthodox thing to put in a GitHub issue, but it seems like it could be helpful for this project.

The latest SIAM Review includes a paper by Fercoq and Richtárik: Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent. I've got a paper copy, and I know the second author, so I can certainly get an electronic copy if you're interested. Here is a preprint.

I can vouch for these folks; they've spent years parallelizing some of the very optimization problems we're aiming to tackle here.
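For context, the paper builds on randomized proximal coordinate descent, where each step updates one coordinate via a soft-threshold (the paper's contribution is doing many such updates in parallel, with acceleration, which isn't reproduced here). A serial toy sketch for the lasso, with names of my own choosing, looks like:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cd_lasso(A, b, lam, n_epochs=100, seed=0):
    """Randomized proximal coordinate descent for
    (1/2)||Ax - b||^2 + lam * ||x||_1 (serial toy version)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = (A ** 2).sum(axis=0)       # per-coordinate Lipschitz constants
    x = np.zeros(d)
    r = -b                         # maintained residual A @ x - b
    for _ in range(n_epochs):
        for j in rng.permutation(d):
            g = A[:, j] @ r        # partial gradient along coordinate j
            new_xj = soft_threshold(x[j] - g / L[j], lam / L[j])
            r += A[:, j] * (new_xj - x[j])   # cheap residual update
            x[j] = new_xj
    return x
```

The per-coordinate cost is O(n) thanks to the maintained residual, which is what makes coordinate methods attractive when the number of features is huge.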

mrocklin commented 7 years ago

This seems like a great thing to put in a GitHub issue to me :)

I'll be playing a bit more with dask performance early this week, mostly around the current implementations in @moody-marlin's no_api branch (which seems to have the most recent development). I suspect those optimizations will apply to any such solution, though.

cc @hussainsultan @moody-marlin

mcg1969 commented 7 years ago

Oh, actually, it looks like they've paid to make the PDFs free! Here you go:

mrocklin commented 7 years ago

cc @jcrist

cicdw commented 7 years ago

Interesting work; however, they are mainly focused on problems with a "huge" number of features. In the GLM space it's uncommon to use more than, say, 300 features at a time, which makes me more inclined to focus on algorithms like ADMM or SGD that distribute across training examples / groups rather than across features.
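To illustrate the distinction, here's a minimal sketch (my own naming, not dask-glm's API) of consensus ADMM for least squares, where the data is split across row blocks and only the small consensus vector is shared between partitions:

```python
import numpy as np

def consensus_admm_lstsq(blocks, rho=1.0, n_iter=200):
    """Consensus ADMM for (1/2)||Ax - b||^2, split across row blocks.

    blocks: list of (A_i, b_i) partitions of the training examples.
    Each partition solves a small local subproblem; only the
    d-dimensional consensus variable z is communicated, which is
    why this distributes naturally across examples, not features."""
    d = blocks[0][0].shape[1]
    # Pre-compute each block's local system A_i^T A_i + rho I.
    local = [(A.T @ A + rho * np.eye(d), A.T @ b) for A, b in blocks]
    u = [np.zeros(d) for _ in blocks]   # scaled dual variables
    z = np.zeros(d)                     # consensus estimate
    for _ in range(n_iter):
        # x-update: each partition solves its local ridge-like problem.
        x = [np.linalg.solve(M, Atb + rho * (z - ui))
             for (M, Atb), ui in zip(local, u)]
        # z-update: plain averaging (no global regularizer here).
        z = np.mean([xi + ui for xi, ui in zip(x, u)], axis=0)
        # dual update.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z
```

Each iteration only moves two length-d vectors per partition, so with d around 300 the communication cost is trivial even for very large numbers of examples.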