SciML / PSOGPU.jl

GPU accelerated Particle Swarm Optimization
MIT License

Update ParallelSyncPSO to use Shared Memory #39

Closed utkarsh530 closed 7 months ago

utkarsh530 commented 7 months ago


utkarsh530 commented 7 months ago

Slight increase in perf (the time field in sol.stats drops from about 0.0127 to about 0.0068, roughly 1.9x faster):

Before:

julia> sol = solve(prob,
           ParallelSyncPSOKernel(1024, backend = CUDA.CUDABackend()),
           maxiters = 100)
retcode: Default
u: 3-element SVector{3, Float32} with indices SOneTo(3):
 1.0000638
 1.0001391
 1.0002795

julia> sol.stats
Optimization.OptimizationStats(0, 0.0126565005244333, 0, 0, 0)

After:

julia> sol = solve(prob,
           ParallelSyncPSOKernel(1024, backend = CUDA.CUDABackend()),
           maxiters = 100)
retcode: Default
u: 3-element SVector{3, Float32} with indices SOneTo(3):
 1.0000638
 1.0001391
 1.0002795

julia> sol.stats
Optimization.OptimizationStats(0, 0.0067718994140625, 0, 0, 0)

It also scales better with the number of particles, since the global minimum is now computed over the per-block bests, and work is distributed more evenly across threads.
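The "minimum over best in blocks" idea can be illustrated with a CPU-only sketch of a two-stage reduction. This is not the PSOGPU.jl kernel: the function name `blockwise_argmin`, the `block_size` argument, and the sample costs are all hypothetical, chosen only to show the shape of the computation. In a GPU version, stage 1 is the part that would presumably run per block in shared memory.

```julia
# CPU-side sketch of a two-stage (block-wise) minimum reduction.
# `blockwise_argmin` and `block_size` are hypothetical names, not PSOGPU.jl API.
function blockwise_argmin(costs::AbstractVector{<:Real}, block_size::Integer)
    n = length(costs)
    n > 0 || throw(ArgumentError("costs must be non-empty"))
    block_size > 0 || throw(ArgumentError("block_size must be positive"))
    nblocks = cld(n, block_size)

    # Stage 1: each block finds the index of its own best (lowest-cost) particle.
    block_best_idx = Vector{Int}(undef, nblocks)
    for b in 1:nblocks
        lo = (b - 1) * block_size + 1
        hi = min(b * block_size, n)
        best = lo
        for i in (lo + 1):hi
            if costs[i] < costs[best]
                best = i
            end
        end
        block_best_idx[b] = best
    end

    # Stage 2: reduce over the per-block winners only (nblocks candidates
    # instead of n), which is what keeps the final step cheap as n grows.
    gbest = block_best_idx[1]
    for b in 2:nblocks
        idx = block_best_idx[b]
        if costs[idx] < costs[gbest]
            gbest = idx
        end
    end
    return gbest, costs[gbest]
end

# Made-up particle costs, split into blocks of 4.
costs = Float32[4.0, 2.5, 9.0, 0.5, 7.0, 3.0, 8.0, 1.0, 6.0, 5.0]
idx, val = blockwise_argmin(costs, 4)
```

With 1024 particles and a block size of, say, 256, the second stage compares only 4 candidates instead of 1024, which is one plausible reason the approach scales better as the swarm grows.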

utkarsh530 commented 7 months ago

@jpsamaroo Do you know how to get tuned launch parameters from KernelAbstractions.jl?