In other chunks of work split into thread pools, the work is split up more finely than the number of CPUs. This makes it so that if a thread gets behind (in a big.little architecture, this is expected, or in the case of preemption), the threads that are ahead can pick up the extra work.
gradient_clusters was missing this split. This saved 5-10ms of processing time on my Rock Pi 4b and reduced the outliers.
In other chunks of work split into thread pools, the work is split up more finely than the number of CPUs. This makes it so that if a thread gets behind (in a big.little architecture, this is expected, or in the case of preemption), the threads that are ahead can pick up the extra work.
gradient_clusters was missing this split. This saved 5-10ms of processing time on my Rock Pi 4b and reduced the outliers.