I've built a job that calculates recommendations (AlternatingLeastSquares, CPU version).
On my host machine it takes ~8 minutes to finish.
In my K8S cluster it takes ~100 minutes to finish.
The resources are comparable. In both cases native extensions are used.
The only problem I can see is that the job consumes all available cores in K8S, which triggers CPU throttling (cpu_limits=20).
How can I restrict the job to a specific number of cores? Do I have to play with OpenMP settings or something? Setting NUM_THREADS didn't work out; the job still consumes all available cores.
I believe that's the reason for such an enormous degradation.
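To make the question concrete, here is a minimal sketch of the kind of thread cap I have in mind. It assumes the native extension is built with OpenMP and honors the standard `OMP_NUM_THREADS` variable (rather than plain `NUM_THREADS`), and that the BLAS backend reads `OPENBLAS_NUM_THREADS` or `MKL_NUM_THREADS`. The helper name `cap_native_threads` is hypothetical, and the value 20 just mirrors the CPU limit above.

```python
import os

# Environment variables that common threading runtimes read when they load.
# Which of these matter depends on how the native extension and BLAS were built.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def cap_native_threads(n_threads: int) -> dict:
    """Set thread-count variables for the current process (hypothetical helper).

    This has to run before numpy/scipy/implicit are imported, because the
    runtimes usually read these variables only once, at load time.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    value = str(n_threads)
    for name in THREAD_ENV_VARS:
        os.environ[name] = value
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Match the container CPU limit (20 in the cluster described above).
applied = cap_native_threads(20)
print(applied)

# Only after the cap is in place would the heavy libraries be imported.
# Assumption: the installed implicit version accepts a num_threads argument.
# from implicit.als import AlternatingLeastSquares
# model = AlternatingLeastSquares(num_threads=20)
```

The same variables could instead be set in the pod spec's `env` section, so that they are in place before the Python process starts.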
The line that is taking so long is this:
There are a few million users.