BrownBiomechanics / Autoscoper

Autoscoper is a 2D-3D image registration software package.
https://autoscoper.readthedocs.io

How possible is it to run the particle optimization in parallel? #218

Open amymmorton opened 10 months ago

amymmorton commented 10 months ago

Right now, particles are evaluated iteratively.

PSO generates N particles (for us, 100 alterations to the initial 6-DOF pose). At each iteration, the NCC of each particle should be evaluated.

The loop is in PSO.cpp, lines 69-81:

float currentBest = host_fitness_function(gBest);

for (int i = 0; i < NUM_OF_PARTICLES * NUM_OF_DIMENSIONS; i++)
{
  // Random cognitive/social weights in [0, 1]
  float rp = getRandomClamped();
  float rg = getRandomClamped();

  // Standard PSO velocity update: inertia term + pull toward the particle's
  // own best (pBests) + pull toward the swarm's best (gBest)
  velocities[i] = OMEGA * velocities[i]
                + c1 * rp * (pBests[i] - positions[i])
                + c2 * rg * (gBest[i % NUM_OF_DIMENSIONS] - positions[i]);

  positions[i] += velocities[i];
}

// Decay the inertia weight each iteration
OMEGA = OMEGA * 0.9f;

According to https://machinelearningmastery.com/a-gentle-introduction-to-particle-swarm-optimization/

Each particle's best and the whole swarm's best are used to compute the velocity (** to keep in mind for #128).

Also, regarding our hyperparameters (c1 == c2 == 1.5): I'm reading Comparing_inertia_weights_and_constriction_factors_in_particle_swarm_optimization.pdf to see if I can figure out the rationale for those values.

NicerNewerCar commented 10 months ago

I think the bottleneck is the GPU. To the best of my knowledge, calling the NCC function on the GPU is a blocking call (i.e. we can only evaluate the NCC of one particle at a time), so I doubt we would see much speed-up from parallelizing this loop, but it could still be worth investigating.

NicerNewerCar commented 10 months ago

Did a little bit of digging, and it may be faster to let OpenCL's command queue handle this bottleneck. I can try the following: