ICB-DCM / pyPESTO

python Parameter EStimation TOolbox
https://pypesto.readthedocs.io
BSD 3-Clause "New" or "Revised" License

All CPUs are utilized for `np.linalg.eig/h`, ndim dependent #1312

Open dilpath opened 6 months ago

dilpath commented 6 months ago

Bug description

Despite using SingleCoreEngine, all CPUs were at 100% utilization.

After profiling, it looks like this is due to use of np.linalg.eig or np.linalg.eigh. For example, the default ScipyOptimizer does not have this issue. FidesOptimizer and CmaesOptimizer do have this issue.

Profiling was done with another script using pyPESTO optimization. Here's a small demonstration of the issue with np.linalg.eig/h directly.

import time

import numpy as np

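def time_eig(M, repeats=5000):
    """Print the total wall time of `repeats` calls to np.linalg.eig(M)."""
    # perf_counter is a monotonic clock intended for measuring intervals
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.linalg.eig(M)
    print(time.perf_counter() - t0)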

time_eig(np.random.rand(31, 31))
time_eig(np.random.rand(32, 32))
time_eig(np.random.rand(33, 33))

time_eig(np.random.rand(63, 63))
time_eig(np.random.rand(64, 64))
time_eig(np.random.rand(65, 65))

The results are:

| function | matrix size | n_repeats | total time (s) | CPUs used |
|---|---|---|---|---|
| np.linalg.eig | 31x31 | 2e4 | 9.5 | 1 |
| np.linalg.eig | 32x32 | 2e4 | 9.5 | 1 |
| np.linalg.eig | 33x33 | 2e4 | 11.4 | 1 |
| np.linalg.eig | 63x63 | 5e3 | 13.1 | 1 |
| np.linalg.eig | 64x64 | 5e3 | 13.3 | 1 |
| np.linalg.eig | 65x65 | 5e3 | 27.3 | all (8) |
| np.linalg.eigh | 31x31 | 4e4 | 11.0 | most (<8) |
| np.linalg.eigh | 32x32 | 4e4 | 15.0 | most (<8) |
| np.linalg.eigh | 33x33 | 4e4 | 26.1 | all (8) |
| np.linalg.eigh | 63x63 | 5e3 | 6.2 | all (8) |
| np.linalg.eigh | 64x64 | 5e3 | 9.7 | all (8) |
| np.linalg.eigh | 65x65 | 5e3 | 11.3 | all (8) |

np.linalg.eig, which fides uses, seems to switch to using all CPUs when the matrix dimension (i.e. the number of parameters) is >64. np.linalg.eigh, which cma uses, seems to gradually increase the number of CPUs used as the dimension grows.

Overall, just something to keep in mind when expecting single-core behavior -- this could affect benchmarking, for example. This also affects the efficiency when parallelizing optimization, since with large problems, potentially all starts will try to use all CPUs simultaneously when computing eigenvalues/vectors.

Expected behavior

Approximately one CPU should be utilized 100% in all cases, when using SingleCoreEngine.

Environment

FFroehlich commented 6 months ago

Well, technically, even if the process runs on a single core, it could still use multiple threads via hyper-threading, but overall this is interesting and I wasn't aware of it. Good that I usually use snakemake for benchmarking, which externally limits the number of threads to a fixed number.

numpy recommends use of https://github.com/joblib/threadpoolctl to limit use of threads in native libraries. Probably makes sense to make this configurable similar to the n_threads attribute in AmiciObjective, but at an engine level.
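
A minimal sketch of what limiting threads with threadpoolctl could look like, assuming NumPy is linked against a BLAS/LAPACK library that threadpoolctl can detect (e.g. OpenBLAS or MKL). The 65x65 matrix mirrors the case reported above; nothing here is pyPESTO API, and an engine-level option would still need to be designed.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

M = np.random.rand(65, 65)

# Restrict the BLAS thread pools to a single thread inside this block only.
# The previous limits are restored when the block exits.
with threadpool_limits(limits=1, user_api="blas"):
    # Thread counts of the BLAS libraries threadpoolctl detected
    # (an empty list if it could not detect any).
    blas_threads = [
        lib["num_threads"]
        for lib in threadpool_info()
        if lib["user_api"] == "blas"
    ]
    eigenvalues, eigenvectors = np.linalg.eig(M)

print("BLAS thread counts inside the block:", blas_threads)
```

Because the limit is scoped to the `with` block, other code in the same process keeps its usual thread settings.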

dweindl commented 6 months ago

I would keep that out of pypesto. Just a comment to the respective optimizers. Those settings might affect a number of libraries, and the control is best left to the user.
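
For reference, one way a user could take that control themselves, without any change to pypesto, is to set the usual thread-count environment variables before NumPy is first imported. This is a sketch under assumptions: which variable takes effect depends on the BLAS library NumPy was built against, and the three names below are the common ones for OpenMP, OpenBLAS and MKL.

```python
import os

# Must run before NumPy (or anything else that loads BLAS) is imported,
# because the libraries read these variables when they are loaded.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # noqa: E402  (deliberately imported after the setup)

# Computed with the BLAS backend limited to one thread, if one of the
# variables above applies to it.
eigenvalues = np.linalg.eigvals(np.random.rand(65, 65))
```

The same variables can also be exported in the shell or job script that launches the optimization, which leaves the Python code untouched.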

dilpath commented 6 months ago

> a comment to the respective optimizers

If it's made very clear to the user (e.g. a warning) then I agree, otherwise it could be an easy thing to overlook, with perhaps a big performance penalty. Limiting thread use to 1 halves the wall time for np.linalg.eig on a 65x65 matrix in my test script, for example, and this is without parallelized multi-starts. cma seems popular, and with np.linalg.eigh on a 33x33 matrix I just saw the wall time reduced by a factor of 6 when limited to 1 thread. I'm not sure why limiting to 1 thread is faster... I guess single-threaded only becomes slower for very large matrices.
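
A comparison like the one described here can be sketched roughly as follows. This is not the original test script: the helper `time_eigh`, the repeat count and the use of threadpoolctl are assumptions for illustration, and the measured times will vary with the machine and the BLAS library in use.

```python
import time

import numpy as np
from threadpoolctl import threadpool_limits


def time_eigh(M, repeats=500):
    """Return the wall time of `repeats` calls to np.linalg.eigh(M)."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.linalg.eigh(M)
    return time.perf_counter() - t0


# eigh assumes a symmetric (Hermitian) matrix, so symmetrize a random one.
A = np.random.rand(33, 33)
S = (A + A.T) / 2

# Default threading behavior of the BLAS/LAPACK backend.
t_default = time_eigh(S)

# Same workload with the BLAS thread pools limited to one thread.
with threadpool_limits(limits=1, user_api="blas"):
    t_single = time_eigh(S)

print(f"default threads: {t_default:.3f} s")
print(f"single thread:   {t_single:.3f} s")
```

Watching CPU utilization (e.g. in `top`) while each part runs shows whether the limit is actually being applied.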