SciML / PSOGPU.jl

GPU accelerated Particle Swarm Optimization
MIT License

Implement basic GPU Particle Swarm #1

Closed: utkarsh530 closed this 11 months ago

utkarsh530 commented 11 months ago

Implements a GPU Particle Swarm optimizer in which the particle states are updated asynchronously on the GPU. However, this introduces a race condition when updating the global optimum position, which is used to compute the updated velocity.
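For reference, the textbook PSO update shows where the shared global optimum enters. The symbols here are generic assumptions, not taken from this PR's code: $w$ is the inertia weight, $c_1, c_2$ are acceleration coefficients, $r_1, r_2$ are uniform random numbers in $[0, 1]$, $p_i$ is particle $i$'s best position so far, and $g$ is the global optimum position.

```math
v_i \leftarrow w\,v_i + c_1 r_1\,(p_i - x_i) + c_2 r_2\,(g - x_i), \qquad x_i \leftarrow x_i + v_i
```

Every particle reads $g$, and any particle that finds a better position writes it. When the particles are updated asynchronously, a read of $g$ by one thread can overlap a write by another, which is the race described above.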

There are three approaches to fix this, each with its own advantages and trade-offs:

  1. https://arxiv.org/pdf/2205.01313.pdf (Fuse all the kernels using atomics, shared memory, and a thread lock)
  2. https://arxiv.org/pdf/2205.01313.pdf (Completely asynchronous, but the number of particles in the swarm is limited by the fixed block size of the GPU)
  3. Parallel reduce: three kernels, one to update the particles, one to compute the local optima, and one to compute the global optimum with a parallel reduction. (More robust, but slower due to multiple kernel launches)
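The three-pass structure of the third approach can be sketched on the CPU in plain Julia. This is an illustrative sketch only, not the kernels in this PR: `pso_step!` and `pso_reduce` are hypothetical names, the coefficients `w`, `c1`, `c2` are conventional values picked for the example, and the loops stand in for GPU kernel launches. The point is that every particle reads the same frozen global optimum during the update pass, and the global optimum is recomputed only afterwards by a reduction, so no read overlaps a write.

```julia
using Random

# Objective from the sanity check in this thread.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

# One synchronous iteration split into three passes (hypothetical helper).
# w, c1, c2 are assumed illustrative coefficients, not the PR's values.
function pso_step!(x, v, pbest, pbest_cost, gbest, f, p, lb, ub, rng;
                   w = 0.7f0, c1 = 1.5f0, c2 = 1.5f0)
    n = length(x)
    # Pass 1: update velocity and position; gbest is read-only here.
    for i in 1:n
        r1, r2 = rand(rng, Float32), rand(rng, Float32)
        v[i] = w .* v[i] .+ c1 * r1 .* (pbest[i] .- x[i]) .+
               c2 * r2 .* (gbest .- x[i])
        x[i] = clamp.(x[i] .+ v[i], lb, ub)
    end
    # Pass 2: each particle updates its own local optimum.
    for i in 1:n
        c = f(x[i], p)
        if c < pbest_cost[i]
            pbest_cost[i] = c
            pbest[i] = copy(x[i])
        end
    end
    # Pass 3: reduce over the local optima to get the new global optimum.
    best_cost, best_idx = findmin(pbest_cost)
    return copy(pbest[best_idx]), best_cost
end

# Driver (hypothetical): random start inside the bounds, fixed iteration count.
function pso_reduce(f, p, lb, ub; n = 64, iters = 200, seed = 1)
    rng = MersenneTwister(seed)
    d = length(lb)
    x = [lb .+ rand(rng, Float32, d) .* (ub .- lb) for _ in 1:n]
    v = [zeros(Float32, d) for _ in 1:n]
    pbest = deepcopy(x)
    pbest_cost = [f(xi, p) for xi in x]
    gcost, idx = findmin(pbest_cost)
    gbest = copy(pbest[idx])
    for _ in 1:iters
        gbest, gcost = pso_step!(x, v, pbest, pbest_cost, gbest, f, p, lb, ub, rng)
    end
    return gbest, gcost
end

lb = Float32[-1, -1]
ub = Float32[1, 1]
p = Float32[2, 100]
gbest, gcost = pso_reduce(rosenbrock, p, lb, ub)
```

On a GPU each pass would be its own kernel launch, which is where the extra launch overhead mentioned in item 3 comes from.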

I also wrap ParallelPSOCPU(), a CPU-parallelized version based on the implementation here: https://stackoverflow.com/questions/65342388/why-my-code-in-julia-is-getting-slower-for-higher-iteration

utkarsh530 commented 11 months ago

A sanity check:

# lb = @SArray Float32[-1.0, -1.0]
# ub = @SArray Float32[1.0, 1.0]
# rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
# x0 = @SArray Float32[-1.0, -1.0]
# p = @SArray Float32[2.0, 100.0]

include("./pso_gpu.jl")

pso_solve_cpu(prob, gbest, particles) 
# PSOGBest{SVector{2, Float32}, Float32}(Float32[0.9981992, 0.9922749], 1.0053078f0)
pso_solve_gpu(prob, gbest, gpu_particles)
# PSOGBest{SVector{2, Float32}, Float32}(Float32[0.9981992, 0.9922749], 1.0053078f0)
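As a cross-check on the reported cost, the objective can be evaluated directly at the reported position, using the `rosenbrock` definition and `p` from the commented setup above. This is a minimal sketch in Base Julia; `x_reported` is a name introduced here for illustration.

```julia
# Objective and parameters as given in the commented setup.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
p = Float32[2.0, 100.0]

# Position reported by both pso_solve_cpu and pso_solve_gpu.
x_reported = Float32[0.9981992, 0.9922749]

cost = rosenbrock(x_reported, p)
println(cost)
```

The printed value should agree with the reported 1.0053078f0 up to Float32 rounding. With the bounds [-1, 1] in both coordinates, the constrained minimum of this objective is 1 at x = (1, 1), so a cost near 1.005 means both solvers ended close to it.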

Benchmarking:

julia> @benchmark pso_solve_cpu($prob, $gbest, $particles)
BenchmarkTools.Trial: 23 samples with 1 evaluation.
 Range (min … max):  221.738 ms … 228.807 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     221.782 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   222.327 ms ±   1.581 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                              
  █▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  222 ms           Histogram: frequency by time          229 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark pso_solve_gpu($prob, $gbest, $gpu_particles)
BenchmarkTools.Trial: 8007 samples with 1 evaluation.
 Range (min … max):  570.779 μs …  22.768 ms  ┊ GC (min … max): 0.00% … 96.91%
 Time  (median):     586.182 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   619.776 μs ± 540.303 μs  ┊ GC (mean ± σ):  4.05% ±  4.55%

       ▄██▅▃                                                     
  ▁▁▂▄███████▇▅▅▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  571 μs           Histogram: frequency by time          671 μs <

 Memory estimate: 108.12 KiB, allocs estimate: 2207.