All CPUs are utilized for `np.linalg.eig/h`, ndim dependent

dilpath commented 6 months ago

Bug description

Despite using SingleCoreEngine, all CPUs were at 100% utilization.

After profiling, it looks like this is due to the use of np.linalg.eig or np.linalg.eigh. For example, the default ScipyOptimizer does not have this issue, whereas FidesOptimizer and CmaesOptimizer do.

Profiling was done with another script using pyPESTO optimization. Here's a small demonstration of the issue with np.linalg.eig/h directly.

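import time

import numpy as np


def time_eig(M, repeats=5000):
    """Print the wall time of `repeats` eigendecompositions of M."""
    t0 = time.perf_counter()  # monotonic clock, better suited to timing
    for _ in range(repeats):
        np.linalg.eig(M)
    print(time.perf_counter() - t0)


# Matrix sizes just below, at, and just above 32
time_eig(np.random.rand(31, 31))
time_eig(np.random.rand(32, 32))
time_eig(np.random.rand(33, 33))

# Matrix sizes just below, at, and just above 64
time_eig(np.random.rand(63, 63))
time_eig(np.random.rand(64, 64))
time_eig(np.random.rand(65, 65))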

The results are:

| function | matrix size | n_repeats | total time (s) | CPUs used |
| --- | --- | --- | --- | --- |
| `np.linalg.eig` | 31x31 | 2e4 | 9.5 | 1 |
| `np.linalg.eig` | 32x32 | 2e4 | 9.5 | 1 |
| `np.linalg.eig` | 33x33 | 2e4 | 11.4 | 1 |
| `np.linalg.eig` | 63x63 | 5e3 | 13.1 | 1 |
| `np.linalg.eig` | 64x64 | 5e3 | 13.3 | 1 |
| `np.linalg.eig` | 65x65 | 5e3 | 27.3 | all (8) |
| `np.linalg.eigh` | 31x31 | 4e4 | 11.0 | most (<8) |
| `np.linalg.eigh` | 32x32 | 4e4 | 15.0 | most (<8) |
| `np.linalg.eigh` | 33x33 | 4e4 | 26.1 | all (8) |
| `np.linalg.eigh` | 63x63 | 5e3 | 6.2 | all (8) |
| `np.linalg.eigh` | 64x64 | 5e3 | 9.7 | all (8) |
| `np.linalg.eigh` | 65x65 | 5e3 | 11.3 | all (8) |

np.linalg.eig seems to switch to using all CPUs when the matrix dimension (i.e. the number of parameters) exceeds 64; fides uses np.linalg.eig. np.linalg.eigh seems to increase the number of CPUs used gradually with matrix size; cma uses np.linalg.eigh.

Overall, just something to keep in mind when expecting single-core behavior -- this could affect benchmarking, for example. This also affects the efficiency when parallelizing optimization, since with large problems, potentially all starts will try to use all CPUs simultaneously when computing eigenvalues/vectors.
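To make the oversubscription concern concrete, here is a rough sketch of the arithmetic. The helper names are hypothetical and not part of pyPESTO; the numbers assume the 8-CPU machine from the table and 8 parallel starts.

```python
import os


def threads_per_worker(n_workers, n_cpus=None):
    """Split the available CPUs evenly across workers (at least 1 each)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(1, n_cpus // n_workers)


def oversubscription(n_workers, threads_each, n_cpus):
    """Ratio of requested threads to CPUs; values above 1 mean contention."""
    return n_workers * threads_each / n_cpus


# 8 parallel starts, each letting the eigensolver spawn 8 threads, on 8 CPUs:
print(oversubscription(n_workers=8, threads_each=8, n_cpus=8))  # 8.0
# An even split would give each start a single thread instead:
print(threads_per_worker(n_workers=8, n_cpus=8))  # 1
```

A ratio well above 1 means the starts compete with each other for the same cores during every eigendecomposition.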

Expected behavior

Approximately one CPU should be at 100% utilization in all cases when using SingleCoreEngine.

Environment

Operating system: Ubuntu 22.04
pypesto version: current develop, with NumPy 1.24.3
Python version: 3.10.12

FFroehlich commented 6 months ago

Well, technically, even if the process runs on a single core, it could still use multiple threads via hyper-threading, but overall this is interesting and I wasn't aware of it. Good that I usually use snakemake for benchmarking, which externally limits the number of threads to a fixed number.

numpy recommends use of https://github.com/joblib/threadpoolctl to limit use of threads in native libraries. Probably makes sense to make this configurable similar to the n_threads attribute in AmiciObjective, but at an engine level.
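For reference, a minimal sketch of what limiting threads with threadpoolctl could look like around the eigendecomposition. The helper `thread_limit` is hypothetical and not a pyPESTO API, and the no-op fallback when threadpoolctl is not installed is an assumption made only so the sketch still runs.

```python
import contextlib
import time

import numpy as np

try:
    # Third-party package: pip install threadpoolctl
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def thread_limit(n_threads=1):
    """Context manager capping native thread pools (hypothetical helper)."""
    if threadpool_limits is None:
        # Assumption: degrade to a no-op so the sketch still runs.
        return contextlib.nullcontext()
    return threadpool_limits(limits=n_threads)


def time_eig(M, repeats=200):
    """Return the wall time of `repeats` calls to np.linalg.eig(M)."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.linalg.eig(M)
    return time.perf_counter() - t0


M = np.random.rand(65, 65)
unlimited = time_eig(M)
with thread_limit(1):
    limited = time_eig(M)

print(f"limit active: {threadpool_limits is not None}")
print(f"default threads: {unlimited:.2f} s, 1 thread: {limited:.2f} s")
```

The limit only applies inside the `with` block, so the rest of the program keeps its default threading behavior.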

dweindl commented 6 months ago

I would keep that out of pypesto. Just a comment to the respective optimizers. Those settings might affect a number of libraries, and the control is best left to the user.
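On the user side, one common way to control this without touching pyPESTO is to set the thread-count environment variables read by OpenMP, OpenBLAS, and MKL before NumPy is loaded. A rough sketch, assuming the work is launched in a child process; the helper names are hypothetical, and which variable takes effect depends on the BLAS/LAPACK backend NumPy was built against.

```python
import os
import subprocess
import sys

# Variables commonly read at load time by OpenMP, OpenBLAS, and MKL.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def capped_env(n_threads=1):
    """Return a copy of the current environment with thread counts capped."""
    env = dict(os.environ)
    for name in THREAD_ENV_VARS:
        env[name] = str(n_threads)
    return env


def run_capped(argv, n_threads=1):
    """Run a command in a child process with capped native thread pools."""
    return subprocess.run(
        argv,
        env=capped_env(n_threads),
        capture_output=True,
        text=True,
        check=True,
    )


# The child only echoes one variable here; a real call would launch the
# optimization script instead.
child_code = "import os; print(os.environ['OMP_NUM_THREADS'])"
result = run_capped([sys.executable, "-c", child_code])
print(result.stdout.strip())  # 1
```

Setting the variables in the shell before starting Python has the same effect; changing them after NumPy has been imported generally does not.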

dilpath commented 6 months ago

> a comment to the respective optimizers

If it's made very clear to the user (e.g. with a warning), then I agree. Otherwise it could be an easy thing to overlook, with perhaps a big performance penalty. For example, limiting thread use to 1 halves the wall time for np.linalg.eig on a 65x65 matrix in my test script, and this is without parallelized multi-starts. cma seems popular, and with np.linalg.eigh on a 33x33 matrix I just saw the wall time drop by a factor of 6 when limited to 1 thread. I'm not sure why limiting to 1 thread is faster... I guess a single thread only becomes slower for very large matrices.

ICB-DCM / pyPESTO

All CPUs are utilized for `np.linalg.eig/h`, ndim dependent #1312