Update ParallelSyncPSO to use Shared Memory

utkarsh530 commented 7 months ago

Checklist

[ ] Appropriate tests were added
[ ] Any code changes were done in a way that does not break public API
[ ] All documentation related to code changes was updated
[ ] The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
[ ] Any new documentation only uses public API

utkarsh530 commented 7 months ago

Slight increase in performance; the time reported in sol.stats drops from about 0.0127 to 0.0068:

Before:

julia> sol = solve(prob,
           ParallelSyncPSOKernel(1024, backend = CUDA.CUDABackend()),
           maxiters = 100)
retcode: Default
u: 3-element SVector{3, Float32} with indices SOneTo(3):
 1.0000638
 1.0001391
 1.0002795

julia> sol.stats
Optimization.OptimizationStats(0, 0.0126565005244333, 0, 0, 0)

After:

julia> sol = solve(prob,
           ParallelSyncPSOKernel(1024, backend = CUDA.CUDABackend()),
           maxiters = 100)
retcode: Default
u: 3-element SVector{3, Float32} with indices SOneTo(3):
 1.0000638
 1.0001391
 1.0002795

julia> sol.stats
Optimization.OptimizationStats(0, 0.0067718994140625, 0, 0, 0)

It also scales better with the number of particles, since the global minimum is now computed over the per-block bests, and the work is distributed more evenly per thread.
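
To illustrate the two-stage minimum described above, here is a minimal CPU-only sketch in plain Julia. It is not the PSOGPU.jl kernel: the names `block_bests` and `global_best` and the `block_size` argument are hypothetical, and the sequential loop over blocks stands in for what would run as one GPU workgroup per block, with each slice staged in shared memory (e.g. via `@localmem` in KernelAbstractions.jl).

```julia
# CPU-only emulation of a two-stage minimum; no GPU packages required.
# `block_bests`, `global_best` and `block_size` are illustrative names,
# not part of the PSOGPU.jl API.

# Stage 1: each "block" reduces its own slice of particle costs.
function block_bests(costs::AbstractVector{T}, block_size::Integer) where {T}
    nblocks = cld(length(costs), block_size)
    best_cost = Vector{T}(undef, nblocks)
    best_idx = Vector{Int}(undef, nblocks)
    for b in 1:nblocks
        lo = (b - 1) * block_size + 1
        hi = min(b * block_size, length(costs))
        # On a GPU this slice would be staged in shared memory and reduced
        # cooperatively by the threads of one block.
        c, i = findmin(view(costs, lo:hi))
        best_cost[b] = c
        best_idx[b] = lo + i - 1   # convert slice-local index to global index
    end
    return best_cost, best_idx
end

# Stage 2: reduce over the (much shorter) vector of per-block bests.
function global_best(costs::AbstractVector, block_size::Integer)
    best_cost, best_idx = block_bests(costs, block_size)
    c, b = findmin(best_cost)
    return c, best_idx[b]
end

# 1024 synthetic particle costs, reduced in blocks of 256.
costs = Float32[abs(sin(0.37f0 * i)) + 0.01f0 * i for i in 1:1024]
cost, idx = global_best(costs, 256)

# The two-stage result matches a single pass over all particles.
@assert (cost, idx) == findmin(costs)
```

With 1024 particles and a block size of 256, the second stage only has to scan 4 candidates instead of 1024, which is the reason the per-block approach is expected to scale better as the particle count grows.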

utkarsh530 commented 7 months ago

@jpsamaroo Do you know how to get tuned launch parameters from KernelAbstractions.jl?

SciML / PSOGPU.jl

Update ParallelSyncPSO to use Shared Memory #39