SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

Add GPU Euler-Maruyama SDE solver #208

Closed utkarsh530 closed 1 year ago

utkarsh530 commented 1 year ago
using DiffEqGPU, SimpleDiffEq, Test, StaticArrays, StochasticDiffEq, BenchmarkTools

# dX_t = X_t dt + X_t dW_t  (f is the drift, g is the diffusion)
f(u, p, t) = u
g(u, p, t) = u
u0 = @SVector [0.5f0]

tspan = (0.0f0, 1.0f0)
prob = SDEProblem(f, g, u0, tspan)
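
For reference, here is a short sketch of what the Euler-Maruyama scheme does on this problem, and why 0.5*exp(t) is the natural reference curve for the ensemble mean. It reads f as the drift and g as the diffusion coefficient. The symbols X_n, Δt and ΔW_n are notation introduced here, not names from the PR.

```latex
% Euler-Maruyama update with step \Delta t and Brownian increment \Delta W_n
X_{n+1} = X_n + f(X_n, p, t_n)\,\Delta t + g(X_n, p, t_n)\,\Delta W_n,
\qquad \Delta W_n \sim \mathcal{N}(0, \Delta t)

% With f(u, p, t) = g(u, p, t) = u this becomes
X_{n+1} = X_n \left(1 + \Delta t + \Delta W_n\right)

% \Delta W_n has mean zero and is independent of X_n, so
\mathbb{E}[X_{n+1}] = (1 + \Delta t)\,\mathbb{E}[X_n]
\;\Longrightarrow\;
\mathbb{E}[X_N] = X_0 (1 + \Delta t)^N \;\to\; X_0\, e^{t}
\quad (\Delta t \to 0,\; N \Delta t = t)

% With X_0 = 0.5 the expected value is 0.5\, e^{t}
```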

Convergence test (red: 0.5*exp(t); dt = 1e-3, trajectories = 1000):
[Figure: test_sde]
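
A similar check can be sketched on the CPU with only Julia's standard library, which may help anyone who wants to sanity-check the mean without a GPU. This is a minimal sketch and not the PR's kernel: `em_final` is a hypothetical helper, the state is a scalar rather than an `SVector`, and the seed is arbitrary.

```julia
using Random, Statistics

# Drift and diffusion from the example above (scalar state for simplicity).
f(u, p, t) = u
g(u, p, t) = u

# Hypothetical helper (not part of DiffEqGPU): integrate one trajectory with
# fixed-step Euler-Maruyama and return only the final state.
function em_final(f, g, u0, tspan, dt, rng)
    t0, t1 = tspan
    n = round(Int, (t1 - t0) / dt)   # number of fixed steps
    u = u0
    t = t0
    sqdt = sqrt(dt)
    for _ in 1:n
        dW = sqdt * randn(rng, typeof(u0))   # Brownian increment ~ N(0, dt)
        u = u + f(u, nothing, t) * dt + g(u, nothing, t) * dW
        t += dt
    end
    return u
end

rng = MersenneTwister(1234)          # arbitrary seed, for reproducibility only
dt = 1.0f-3
finals = [em_final(f, g, 0.5f0, (0.0f0, 1.0f0), dt, rng) for _ in 1:1000]

ensemble_mean = mean(finals)         # Monte Carlo estimate of E[X_1]
expected = 0.5f0 * exp(1.0f0)        # reference value 0.5*exp(t) at t = 1
println("ensemble mean = ", ensemble_mean, ", expected = ", expected)
```

With 1000 trajectories the estimate still carries visible sampling noise, so agreement with the reference is only approximate.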

codecov[bot] commented 1 year ago

Codecov Report

Merging #208 (bae56f4) into master (dfb231f) will not change coverage. The diff coverage is 0.00%.

@@          Coverage Diff           @@
##           master    #208   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files           9      10    +1     
  Lines        1990    2041   +51     
======================================
- Misses       1990    2041   +51     
Impacted Files Coverage Δ
src/DiffEqGPU.jl 0.00% <0.00%> (ø)
src/perform_step/gpu_em_perform_step.jl 0.00% <0.00%> (ø)
src/solve.jl 0.00% <0.00%> (ø)
src/integrators/integrator_utils.jl 0.00% <0.00%> (ø)


utkarsh530 commented 1 year ago

The benchmarks look good, even on a relatively small problem with few time steps (laptop GPU + CPU):

using DiffEqGPU, CUDA, StaticArrays, StochasticDiffEq, BenchmarkTools

# dX_t = X_t dt + X_t dW_t  (f is the drift, g is the diffusion)
f(u, p, t) = u
g(u, p, t) = u
u0 = @SVector [0.5f0]

tspan = (0.0f0, 1.0f0)
prob = SDEProblem(f, g, u0, tspan)

prob_func = (prob, i, repeat) -> prob  # every trajectory reuses the same problem
monteprob = EnsembleProblem(prob, prob_func = prob_func)

dt = Float32(1//2^(10))

@benchmark sol = solve(monteprob,EM(),EnsembleCPUArray(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 338.686 ms

@benchmark sol = solve(monteprob,EM(),EnsembleThreads(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 139.268 ms

@benchmark @CUDA.sync sol = solve(monteprob,EM(),EnsembleGPUArray(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 330.102 ms

@benchmark @CUDA.sync sol = solve(monteprob,GPUEM(),EnsembleGPUKernel(), dt = dt, trajectories = 10_000, adaptive = false)
# 76.011 ms

Speed-up of GPUEM() with EnsembleGPUKernel over each alternative:

  1. EnsembleCPUArray: ~4.5x
  2. EnsembleThreads: ~1.8x
  3. EnsembleGPUArray: ~4.3x
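
The ratios can be recomputed from the timings quoted in the benchmark comments. This is a small sketch: the variable names are illustrative, and the numbers are copied from the single reported timings, not from new measurements.

```julia
# Timings in milliseconds, copied from the benchmark comments above.
timings_ms = (EnsembleCPUArray = 338.686,
              EnsembleThreads  = 139.268,
              EnsembleGPUArray = 330.102)
gpu_kernel_ms = 76.011   # GPUEM() with EnsembleGPUKernel()

# Speed-up of the GPU kernel run relative to each baseline, rounded to 2 digits.
speedups = map(t -> round(t / gpu_kernel_ms, digits = 2), timings_ms)

for (name, s) in pairs(speedups)
    println(name, ": ", s, "x")
end
```

Each figure is a ratio of single reported timings on one laptop, so treat them as indicative only.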

@ChrisRackauckas