Open amymmorton opened 10 months ago
I think the bottleneck is the GPU. To the best of my knowledge, calling the NCC function on the GPU is a blocking process (i.e., we can only evaluate the NCC of one particle at a time). So I doubt we would see much speed-up from parallelizing this loop, but it could still be worth investigating.
I did a little digging, and it may be faster to let OpenCL's command queue handle this bottleneck. So I can try the following:
Right now, particles are evaluated iteratively.
PSO generates N particles (for us, 100 alterations of the initial 6-DOF pose). At each iteration, the best NCC of each particle should be evaluated.
The loop is in PSO.cpp, lines 69-81.
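For reference, here is a minimal sketch of what that kind of sequential evaluation looks like. It is not the actual code in PSO.cpp. The `Particle` and `SwarmBest` structs and the `evaluate_swarm` function are hypothetical names, `cost` stands in for the blocking GPU NCC call, and I'm assuming a lower cost is better.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical particle layout; the real PSO.cpp may store this differently.
struct Particle {
    std::vector<double> position;       // candidate pose (6 DOF in our case)
    std::vector<double> best_position;  // best pose this particle has visited
    double best_cost = std::numeric_limits<double>::infinity();
};

// Best pose found by any particle so far.
struct SwarmBest {
    std::vector<double> position;
    double cost = std::numeric_limits<double>::infinity();
};

// Evaluate the particles one after another. `cost` is a stand-in for the
// blocking NCC call (assumption: lower is better).
inline void evaluate_swarm(
    std::vector<Particle>& swarm,
    const std::function<double(const std::vector<double>&)>& cost,
    SwarmBest& global_best) {
    for (Particle& p : swarm) {
        assert(!p.position.empty());
        const double c = cost(p.position);  // blocks until this particle is done
        if (c < p.best_cost) {              // update the particle's own best
            p.best_cost = c;
            p.best_position = p.position;
        }
        if (c < global_best.cost) {         // update the swarm's best
            global_best.cost = c;
            global_best.position = p.position;
        }
    }
}
```

Because each call to `cost` has to return before the next particle starts, the total time per iteration is roughly N times the cost of one NCC evaluation, which is why handing the scheduling to the command queue might help.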
According to https://machinelearningmastery.com/a-gentle-introduction-to-particle-swarm-optimization/
each particle's best and the whole swarm's best are used to compute the velocity. Something to keep in mind for #128.
Also, about our hyperparameters (c1 == c2 == 1.5): I'm reading Comparing_inertia_weights_and_constriction_factors_in_particle_swarm_optimization.pdf
to see if I can figure out the rationale.
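To make the role of c1 and c2 concrete, here is a rough sketch of the textbook inertia-weight velocity update, not necessarily what our code does. `PsoParams` and `update_particle` are hypothetical names, the inertia weight `w = 0.7` is a placeholder I picked for illustration, and `r1`, `r2` are passed in so the example is deterministic (normally they are drawn uniformly from [0, 1) at each update).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// c1 and c2 match the values mentioned in this thread; w is an assumed
// placeholder, not a value taken from our code.
struct PsoParams {
    double w  = 0.7;  // inertia weight (assumption)
    double c1 = 1.5;  // pull toward the particle's own best
    double c2 = 1.5;  // pull toward the swarm's best
};

// One velocity and position update for a single particle.
// v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v
inline void update_particle(std::vector<double>& x,
                            std::vector<double>& v,
                            const std::vector<double>& pbest,
                            const std::vector<double>& gbest,
                            double r1, double r2,
                            const PsoParams& p = PsoParams{}) {
    assert(x.size() == v.size());
    assert(x.size() == pbest.size());
    assert(x.size() == gbest.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        v[i] = p.w * v[i]
             + p.c1 * r1 * (pbest[i] - x[i])   // cognitive term
             + p.c2 * r2 * (gbest[i] - x[i]);  // social term
        x[i] += v[i];
    }
}
```

With c1 == c2 the particle is pulled equally hard toward its own best and the swarm's best. Larger values make the swarm converge more aggressively, and smaller values keep it exploring for longer.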