@Pascal-H
As stated in that scikit-learn documentation, we can already do this at a low level via the `OMP_NUM_THREADS` environment variable, which is honored by OpenMP and should be available by default on modern Unix/Linux systems.
For instance, to use 4 threads:

```sh
$ OMP_NUM_THREADS=4 python3 -m nkululeko.nkululeko --config tests/exp_polish_bayes.ini
```
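The same can be done from inside Python, as long as the variable is set before the numerical libraries are imported (OpenMP reads it when its runtime initializes). A minimal sketch:

```python
import os

# OMP_NUM_THREADS is read when the OpenMP runtime initializes, so it must
# be set before numpy/scikit-learn (or anything else using OpenMP) is imported.
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np  # noqa: E402 -- deliberately imported after setting the variable
```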
Nonetheless, it's an easy and convenient addition, so I implemented it in v88.12.
I unified this with the already existing parameter `num_jobs`, so now only `n_jobs` exists.
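For illustration, a sketch of how such a config value is typically forwarded to an estimator; the section and option names follow the INI snippets in this thread, but the surrounding code is hypothetical and not nkululeko's actual implementation:

```python
import configparser

from xgboost import XGBClassifier

config = configparser.ConfigParser()
config.read("tests/exp_polish_bayes.ini")

# fall back to a single worker if the option is absent
n_jobs = config.getint("MODEL", "n_jobs", fallback=1)

# XGBoost, like most sklearn-compatible estimators, accepts n_jobs directly
clf = XGBClassifier(n_jobs=n_jobs)
```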
done
Ah perfect, yeah, this seems to work exactly as expected :sunglasses:
Interestingly, the step that consumes the most CPU cores in my test case seems to be the feature extraction with openSMILE. But there, too,

```ini
[MODEL]
n_jobs = 10
```

works as expected.
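In case any extraction step ever ignores the setting, a sketch using threadpoolctl (the helper scikit-learn itself relies on) to cap the native thread pools around a hot section; the matrix multiply here is just a stand-in for the real workload:

```python
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.default_rng(0).standard_normal((2000, 2000))

# Cap all native thread pools (BLAS, OpenMP) to 10 threads for this
# block only; they return to their previous size afterwards.
with threadpool_limits(limits=10):
    result = a @ a  # stand-in for any thread-parallel step such as feature extraction
```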
Would it potentially make sense to pull that `n_jobs` argument up into `[EXP]`, or is it only implemented for some scikit-learn handle that also takes care of the feature extraction?
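For what it's worth, one way an experiment-wide cap could be realized is joblib's `parallel_backend` context, which scikit-learn estimators respect for their joblib-based parallelism; a sketch under that assumption, not a claim about how nkululeko is wired internally:

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Everything inside this context that parallelizes through joblib is
# capped at 4 workers, overriding per-estimator defaults left at None.
with parallel_backend("loky", n_jobs=4):
    clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```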
When running 'classic' (= non-deep-learning) ML modelling approaches (XGBoost, SVM), the default seems to be that all available CPU cores are used. When running this on a workstation with e.g. 64 CPU cores/threads, this is a bit tricky, since other processes might be blocked.
scikit-learn offers `n_jobs` in some cases to control the number of threads/CPU cores used. In my experience, that was not always reliable either: even when trying to limit the number of cores/threads, all available cores were sometimes greedily consumed. Maybe "8.3.1.4. Oversubscription: spawning too many threads" in the scikit-learn documentation describes exactly that issue (see the sketch at the end of this comment). Ideally, the maximum number of cores/threads to be used could be passed in `[EXP]`, or separately for each `[MODEL]` section :smiley:
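To make the oversubscription point concrete, a sketch of the pattern that scikit-learn docs section warns about and the usual mitigation (the numbers are illustrative):

```python
import os

# With 8 outer workers and an uncapped inner OpenMP pool (e.g. 64 threads
# each on a 64-core machine), up to 8 * 64 = 512 threads compete for 64
# cores; capping the inner pool to 1 keeps the total at the intended 8.
os.environ["OMP_NUM_THREADS"] = "1"  # set before importing numpy/sklearn

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, n_jobs=8).fit(X, y)
```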