@Pascal-H
As stated in that scikit-learn documentation, we can already do this at a low level via the `OMP_NUM_THREADS` environment variable, which is honored by OpenMP and should be available by default on modern Unix/Linux systems.
For instance, to use 4 threads:

```sh
$ OMP_NUM_THREADS=4 python3 -m nkululeko.nkululeko --config tests/exp_polish_bayes.ini
```
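The same can be done from inside Python, as long as the variable is set before the numerical libraries are imported (OpenMP reads it when its runtime initializes). A minimal sketch:

```python
import os

# OMP_NUM_THREADS is read when the OpenMP runtime initializes, so it must
# be set before numpy/scikit-learn (or anything else using OpenMP) is imported.
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np  # noqa: E402 -- deliberately imported after setting the variable
```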
Nonetheless, it's an easy and convenient addition, so I implemented it in v88.12.
I unified this with the already existing parameter `num_jobs`, so now only `n_jobs` exists.
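For illustration, a sketch of how such a config value is typically forwarded to an estimator; the section and option names follow the INI snippets in this thread, but the surrounding code is hypothetical and not nkululeko's actual implementation:

```python
import configparser

from xgboost import XGBClassifier

config = configparser.ConfigParser()
config.read("tests/exp_polish_bayes.ini")

# fall back to a single worker if the option is absent
n_jobs = config.getint("MODEL", "n_jobs", fallback=1)

# XGBoost, like most sklearn-compatible estimators, accepts n_jobs directly
clf = XGBClassifier(n_jobs=n_jobs)
```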
done
Ah perfect, yeah, this seems to work exactly as expected :sunglasses:
Interestingly, the step that consumes the most CPU cores in my test case seems to be the feature extraction with openSMILE. But there, too,

```ini
[MODEL]
n_jobs = 10
```

works as expected.
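In case any extraction step ever ignores the setting, a sketch using threadpoolctl (the helper scikit-learn itself relies on) to cap the native thread pools around a hot section; the matrix multiply here is just a stand-in for the real workload:

```python
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.default_rng(0).standard_normal((2000, 2000))

# Cap all native thread pools (BLAS, OpenMP) to 10 threads for this
# block only; they return to their previous size afterwards.
with threadpool_limits(limits=10):
    result = a @ a  # stand-in for any thread-parallel step such as feature extraction
```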
Would it potentially make sense to pull that `n_jobs` argument up into `[EXP]`, or is it only implemented for some scikit-learn handle that also takes care of the feature extraction?
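For what it's worth, one way an experiment-wide cap could be realized is joblib's `parallel_backend` context, which scikit-learn estimators respect for their joblib-based parallelism; a sketch under that assumption, not a claim about how nkululeko is wired internally:

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Everything inside this context that parallelizes through joblib is
# capped at 4 workers, overriding per-estimator defaults left at None.
with parallel_backend("loky", n_jobs=4):
    clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```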
When running 'classic' (= non-deep-learning) ML modelling approaches (XGBoost, SVM), the default seems to be that all available CPU cores are used. When running this on a workstation with e.g. 64 CPU cores/threads, this is a bit tricky, since other processes might be blocked.
scikit-learn offers `n_jobs` in some cases to control the number of threads/CPU cores used. In my experience, that was not always reliable either: even when trying to limit the number of cores/threads, all available cores were sometimes greedily consumed. Maybe "8.3.1.4. Oversubscription: spawning too many threads" in the scikit-learn documentation describes exactly that issue (see the sketch at the end of this comment). Ideally, the maximum number of cores/threads to be used could be passed in `[EXP]`, or separately for each `[MODEL]` section :smiley:
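To make the oversubscription point concrete, a sketch of the pattern that scikit-learn docs section warns about and the usual mitigation (the numbers are illustrative):

```python
import os

# With 8 outer workers and an uncapped inner OpenMP pool (e.g. 64 threads
# each on a 64-core machine), up to 8 * 64 = 512 threads compete for 64
# cores; capping the inner pool to 1 keeps the total at the intended 8.
os.environ["OMP_NUM_THREADS"] = "1"  # set before importing numpy/sklearn

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, n_jobs=8).fit(X, y)
```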