cvxpy / cvxpy

A Python-embedded modeling language for convex optimization problems.
https://www.cvxpy.org
Apache License 2.0

Feature request: support for GPU BLAS libraries #245

Closed iskandr closed 8 years ago

iskandr commented 8 years ago

This might be out of scope for cvxpy since it might imply writing a new solver, but it seems wasteful these days to not use GPUs for basic linear algebra operations.

poulson commented 8 years ago

I definitely have plans for Elemental to support GPU BLAS libraries (for a brief period of time in 2010, it did). And Elemental has an SOCP solver exposed to CVXPY, so I would be happy to help anyone interested in working towards this.

sschnug commented 8 years ago

This really seems like a feature you should propose to solver developers. I can't imagine CVXPY will target implementing its own solver in the near future (or at all).

The second question is: do GPUs really help? While I'm not an expert on implementing solvers for mathematical programming, nor on GPU-based linear algebra, I'm quite sceptical.

Maybe it would be an interesting academic project to incorporate GPUs, but the phrase "wasteful not to use GPUs" sounds too optimistic about the performance gains. That's only my opinion, though! Maybe @poulson can tell us more about his GPU experiences (which he mentioned in the post above).

poulson commented 8 years ago

@sschnug Hiring in the intersection of architectural and mathematical knowledge is notoriously hard, so I wouldn't give too much weight to the argument that commercial companies would have already added support if it was beneficial.

Also, a more scientific question would be in which cases GPUs could lead to a speedup, not whether such problems exist. As the amount of work in the KKT factorization approaches n^3, the cost of the sparse-direct factorization dominates (and this, in turn, is dominated by the cost of dense factorizations of fronts, which are well known to be amenable to GPUs).

A more realistic (and weasel-worded) conclusion would be that some input matrices benefit from exploiting GPUs within IPMs.
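The frontal-matrix argument can be sketched with a toy arithmetic-intensity model. The constants and the cost model below are illustrative assumptions for this thread, not measurements of any real solver:

```python
# Back-of-envelope model: dense Cholesky on an m x m frontal matrix costs
# ~ m^3/3 flops while touching ~ 8*m^2 bytes (double precision, counted once).
# These are rough illustrative figures, not benchmarks.
def arithmetic_intensity(m):
    flops = m**3 / 3.0
    bytes_moved = 8.0 * m**2
    return flops / bytes_moved  # flops per byte, grows like m/24

# Larger fronts do proportionally more arithmetic per byte moved, which is
# why big dense fronts are the GPU-friendly part of a sparse factorization.
assert arithmetic_intensity(3000) == 10 * arithmetic_intensity(300)
```

Since intensity grows linearly in the front size, only problems whose elimination trees produce large dense fronts would see the GPU pay off, which matches the "some input matrices" hedge above.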

echu commented 8 years ago

I second @poulson's conclusion. There is possibly some benefit to using GPUs, but a GPU-based IPM wouldn't necessarily be a net win for all problems.

IPMs (in general) require the solution of several sparse, indefinite linear systems. When the IPM is near termination, the linear system becomes horrendously ill-conditioned. There are workarounds, and folks have come up with interesting ways to use GPUs to solve IPMs (see Gondzio, http://www.maths.ed.ac.uk/~gondzio/reports/mtxFree.html), but these haven't been a net win across the board.

To add another wrinkle, there's more than one way to solve the indefinite linear system. Most of the work has focused on sparsity-preserving factorizations (SeDuMi, for instance, has several heuristics to remove dense rows and columns and reintroduce them; ECOS regularizes the linear system so it's quasidefinite, etc.). To the best of my knowledge, we don't actually know which approach is best (fastest, most stable, etc.) a priori. Furthermore, the GPU's architecture is very different from a CPU's, and any conclusions we draw about KKT solvers on a CPU may not apply to a GPU.
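Both effects can be seen on a toy 3x3 LP-style KKT matrix (the matrix, the barrier values, and the regularization size below are made up for illustration; real IPM systems are vastly larger):

```python
import numpy as np

def kkt(mu, reg=0.0):
    # LP-style KKT matrix [[D, A^T], [A, 0]], where the diagonal scaling D
    # picks up entries of size mu and 1/mu as the IPM nears termination.
    # `reg` models ECOS-style static regularization toward quasidefiniteness.
    D = np.diag([mu, 1.0 / mu])
    A = np.array([[1.0, 1.0]])
    return np.block([[D + reg * np.eye(2), A.T],
                     [A, -reg * np.eye(1)]])

# Conditioning blows up as the barrier parameter mu shrinks ...
assert np.linalg.cond(kkt(1e-8)) > 1e3 * np.linalg.cond(kkt(1e-2))

# ... while the regularized system is quasidefinite: its inertia
# (2 positive, 1 negative eigenvalue here) is guaranteed, so an LDL^T
# factorization exists for any symmetric pivot order.
eigs = np.linalg.eigvalsh(kkt(1e-8, reg=1e-7))
assert (eigs > 0).sum() == 2 and (eigs < 0).sum() == 1
```

Regularization buys factorization stability (no dynamic pivoting needed), not a well-conditioned system, which is one reason the "best" KKT strategy remains an open, hardware-dependent question.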

@sschnug's second point about the data transfer is also correct: depending on how you implement the IPM, you might have to transfer data to the GPU and pay the transfer overhead every iteration. If you don't give the GPU enough work, the data-transfer overhead dominates and your GPU sits idle most of the time. Simply swapping out the BLAS library for a CUDA BLAS won't necessarily get you there, for this reason.
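A toy cost model makes the trade-off concrete. The bandwidth and throughput figures below are illustrative assumptions, not measurements of any particular card:

```python
# Toy per-iteration cost model for naively offloading an IPM step to a GPU,
# re-sending the refactored sparse KKT matrix each iteration.
# All constants are illustrative assumptions, not benchmarks.
PCIE_BW = 16e9      # assumed bus bandwidth, bytes/s
GPU_FLOPS = 10e12   # assumed sustained GPU throughput, flop/s

def iteration_times(nnz):
    transfer = (12.0 * nnz) / PCIE_BW    # ~12 bytes per nonzero (value + index)
    compute = (200.0 * nnz) / GPU_FLOPS  # assume ~200 flops of work per nonzero
    return transfer, compute

t, c = iteration_times(1_000_000)
# At this size, under these assumptions, the bus is the bottleneck:
assert t > c
```

Under this (crude) model, the compute only catches up when the work per byte shipped grows, which is exactly the "give the GPU enough work" condition above.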

If you don't want your GPU sitting idle, you'd have to run the entire IPM algorithm on it.

All that said, nobody's built an open-source GPU solver (for all I know, Google + Baidu might have), but that doesn't mean it shouldn't be done. It just requires finding someone who understands IPMs and computer architecture and, as Jack said, it's notoriously hard to find these people.

CVXPY would be perfect for A/B-testing solvers, and it is a pity that we haven't written a GPU solver yet. You could always pick the solver that worked "best" for your use case.
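As a sketch of such A/B-testing, here's a tiny timing harness. The harness and its names are hypothetical, not part of CVXPY's API; in practice each callable would wrap a `prob.solve(solver=...)` call:

```python
import time

def pick_fastest(solvers):
    """Run each candidate once, time it, and return (best_name, timings).

    `solvers` maps a name to a zero-argument callable; with CVXPY each
    callable would wrap something like `lambda: prob.solve(solver=...)`.
    (Hypothetical helper for illustration.)"""
    timings = {}
    for name, run in solvers.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

# Dummy workloads stand in for two solver backends:
best, times = pick_fastest({
    "fast_solver": lambda: sum(range(1_000)),
    "slow_solver": lambda: sum(range(2_000_000)),
})
assert best == "fast_solver"
```

A single timed run per solver is of course a crude A/B test; averaging over several runs and problem instances would give a fairer comparison.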

I'd love to do this, but my time is otherwise occupied. You might try reaching out to @foges (https://github.com/foges/pogs), who's written (non-IPM) GPU solvers before.

Personally, I think that @poulson's Elemental implementation of a cone solver may come closest to answering some of these questions once Elemental has GPU support (again). ;-)

bodono commented 8 years ago

SCS (one of the default CVXPY solvers) has GPU support now, although I haven't updated the Python interface yet, so it's currently unavailable via CVXPY. SCS is a first-order solver that only requires matrix multiplies with the data matrix (which is unchanging from iteration to iteration), so you pay for the data transfer once and use it on the GPU after that. It uses the CUDA cuSPARSE API for sparse matrix multiplies.
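The "pay for the transfer once" structure can be sketched with plain NumPy standing in for the GPU BLAS. Gradient descent on a least-squares problem stands in for SCS's actual operator iterations here; this is an analogy for the data-access pattern, not SCS's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))  # data matrix: shipped to the device once
b = rng.standard_normal(50)
x = np.zeros(20)
step = 1.0 / np.linalg.norm(A.T @ A, 2)  # 1 / largest eigenvalue of A^T A

for _ in range(500):
    # Each iteration only multiplies by the unchanging A and A^T,
    # so no per-iteration host-to-device transfer would be needed.
    x = x - step * (A.T @ (A @ x - b))

# The iterates converge toward the least-squares solution.
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0], atol=1e-3)
```

Contrast this with an IPM, whose KKT matrix changes (and must be refactored) every iteration, which is why the first-order pattern maps onto GPUs more naturally.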

For large enough problems I have seen more than 10x speed-up with the GPU over single-threaded CPU runs (although the problems do have to be quite large to make a difference, since the overhead of the data transfer is quite high).

bkj commented 6 years ago

Hi All --

Wondering if there have been any changes on this front since 2015. GPUs and GPU support have come quite a ways since then. It looks like you can run SCS on the GPU from CVXPY by passing gpu=True to .solve(). Do any of the other solvers have GPU support these days?

Thanks Ben

prateeky2806 commented 6 years ago

Hello everyone, I am currently using SCS to solve an SDP on a GPU. I have multiple GPUs, but it only uses one of them; I was curious whether there is some way to use multiple GPUs.

Thanks Prateek