cvxgrp / scs

Splitting Conic Solver
MIT License

Single-core, problem-dependent performance? #253

Open b-grimaud opened 1 year ago

b-grimaud commented 1 year ago

Specifications

Hi, I've been going through the documentation and some old issues looking for details on multithreading with SCS. From what I could see, unlike some other solvers, SCS has no explicit support for it, but this old comment mentions the possibility of building from source with OpenMP to make use of multiple cores.

There's also this StackOverflow comment claiming that half of the cores are used with the default acceleration_lookback setting. I tried bumping that number up to 100 in my own code, but the problem was still solved on a single core. Lowering it to 1 did slow things down, though. Does that mean multicore computation is not available for all problems?
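One way to check how many cores a solve actually keeps busy is to compare CPU time with wall-clock time. The sketch below is a generic helper (the name measure_core_usage is hypothetical, not part of SCS): time.process_time() sums CPU time over all threads of the current process, including any OpenMP or MKL worker threads, so cpu / wall roughly approximates the average number of busy cores. Work done in child processes is not counted.

```python
import time
from typing import Any, Callable, Tuple


def measure_core_usage(
    fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, float, float, float]:
    """Run fn once and return (result, wall_s, cpu_s, approx_busy_cores).

    process_time() counts CPU time of every thread in this process, while
    perf_counter() measures elapsed wall time, so their ratio approximates
    how many cores were busy on average. Child processes are not included.
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    ratio = cpu / wall if wall > 0 else float("nan")
    return result, wall, cpu, ratio
```

Assuming the SCS 3.x Python package, a call might look like measure_core_usage(scs.SCS(data, cone, acceleration_lookback=100).solve), with data and cone being your own problem. A ratio close to 1 would mean the solve was effectively single-threaded regardless of the lookback setting.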

I had a quick go at building from source with OpenMP, which failed, most likely because of a compiler issue, but if that's so far the easiest way to speed up solving overall, I will definitely try to make it work.

I also tried parallelizing the same problem over multiple sources of data with joblib, to no avail.

bodono commented 1 year ago

There are five places where multi-threading can speed up SCS:

  1. In the matrix multiplies, primarily for the indirect solver but also to some small extent the direct solver - if compiled with USE_OPENMP = 1.
  2. In the projection onto the exponential cone - if compiled with USE_OPENMP = 1.
  3. In the projection onto the semidefinite cone - if your lapack library is multi-threaded.
  4. In the Anderson acceleration - if your lapack library is multi-threaded.
  5. If using MKL, multiple threads will be used for the linear system solver, the SD cone projection, and Anderson acceleration.

To control the number of threads, you can use the OMP_NUM_THREADS environment variable (I think this should also work for the MKL threads).
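Because the OpenMP and MKL runtimes read these variables when they initialize, the thread count has to be in the environment before scs (or numpy) is imported. A minimal sketch for comparing thread counts is therefore to launch each run as a fresh process with its own environment. The helper name run_with_threads is hypothetical; setting MKL_NUM_THREADS alongside OMP_NUM_THREADS is an added assumption for MKL builds.

```python
import os
import subprocess
import sys
from typing import List


def run_with_threads(n_threads: int, argv: List[str]) -> str:
    """Run argv in a new process with OpenMP/MKL thread counts set to n_threads.

    A fresh process guarantees the variables are set before any OpenMP or
    MKL runtime is loaded; changing os.environ after import has no effect.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(n_threads)
    # MKL-specific override; without it MKL falls back to OMP_NUM_THREADS.
    env["MKL_NUM_THREADS"] = str(n_threads)
    completed = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout
```

For example, looping over n in (1, 2, 4) and calling run_with_threads(n, [sys.executable, "my_solve.py"]), where my_solve.py is a hypothetical script that imports scs and solves your problem, lets you compare wall times per thread count.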

kalmarek commented 1 year ago

@bodono just to confirm: SCS + OMP_NUM_THREADS + MKL works for us in Julia land ;)

b-grimaud commented 1 year ago

I tried it out on a fairly large dataset, and things end up being slightly slower with MKL than without. I guess for this kind of problem it would make more sense to run each solve on a different core with different data? Either way, thanks for the help!
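Running independent solves on separate cores could be sketched with a process pool, one process per dataset. This is a sketch under assumptions: solve_one is a hypothetical stand-in for building and solving one SCS problem, and each worker is pinned to a single OpenMP/MKL thread so that N processes do not each spawn N threads (oversubscription). The environment variables are set at import time, before any numerical library is loaded.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Sequence

# Assumption: one single-threaded solve per worker avoids oversubscription.
# These must be set before scs/numpy is imported in this process.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")


def solve_one(dataset: Sequence[float]) -> float:
    """Hypothetical stand-in for one SCS solve on one dataset.

    Real code would build data and cone from dataset and return something
    like scs.SCS(data, cone).solve()["info"]["pobj"].
    """
    return sum(x * x for x in dataset)


def solve_all(
    datasets: List[Sequence[float]],
    max_workers: Optional[int] = None,
    start_method: Optional[str] = None,
) -> List[float]:
    """Solve independent datasets in parallel, preserving input order."""
    ctx = multiprocessing.get_context(start_method)  # None -> platform default
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        return list(pool.map(solve_one, datasets))
```

On platforms whose default start method is spawn (Windows, macOS), call solve_all from under an if __name__ == "__main__": guard, since worker processes re-import the main module.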

Just as a note, if I install MKL with pip, it isn't recognized by setup.py when building from source with the --mkl flag. Installing with conda works just fine. This is on Ubuntu 20.04 LTS with Python 3.10.
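To narrow down why a pip-installed MKL is not picked up while a conda one is, one option is to compare which MKL runtime files each install puts in the environment. The sketch below assumes both installers place MKL's single dynamic library (libmkl_rt) under the active environment's lib/ directory on Linux; the helper name find_mkl_runtime is hypothetical, and whether setup.py links with -lmkl_rt is also an assumption to check against the build script.

```python
import glob
import os
import sys
from typing import List, Optional


def find_mkl_runtime(prefix: Optional[str] = None) -> List[str]:
    """List files named libmkl_rt* under <prefix>/lib (default: sys.prefix).

    Assumption: pip and conda both install MKL's runtime into the active
    environment's lib/ directory on Linux.
    """
    lib_dir = os.path.join(prefix or sys.prefix, "lib")
    return sorted(glob.glob(os.path.join(lib_dir, "libmkl_rt*")))
```

Running it in both environments and comparing the output may help: for instance, a linker invoked with -lmkl_rt searches for the unversioned libmkl_rt.so (or libmkl_rt.a), so an environment containing only a versioned file such as libmkl_rt.so.2 would not satisfy it.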