Closed — utkarsh530 closed this 1 year ago
Merging #208 (bae56f4) into master (dfb231f) will not change coverage. The diff coverage is 0.00%.
```diff
@@           Coverage Diff           @@
##           master    #208    +/-  ##
=======================================
  Coverage    0.00%   0.00%
=======================================
  Files           9      10     +1
  Lines        1990    2041    +51
=======================================
- Misses       1990    2041    +51
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/DiffEqGPU.jl | 0.00% <0.00%> (ø) | |
| src/perform_step/gpu_em_perform_step.jl | 0.00% <0.00%> (ø) | |
| src/solve.jl | 0.00% <0.00%> (ø) | |
| src/integrators/integrator_utils.jl | 0.00% <0.00%> (ø) | |
The benchmarks look good even on a relatively small problem with few time steps (laptop GPU + CPU):
```julia
using CUDA, DiffEqGPU, StaticArrays, StochasticDiffEq, BenchmarkTools

# dX_t = X_t dt + X_t dW_t (geometric Brownian motion)
f(u, p, t) = u
g(u, p, t) = u
u0 = @SVector [0.5f0]
tspan = (0.0f0, 1.0f0)
prob = SDEProblem(f, g, u0, tspan)
prob_func = (prob, i, repeat) -> prob
monteprob = EnsembleProblem(prob)
dt = Float32(1 // 2^10)

@benchmark sol = solve(monteprob, EM(), EnsembleCPUArray(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 338.686 ms

@benchmark sol = solve(monteprob, EM(), EnsembleThreads(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 139.268 ms

@benchmark CUDA.@sync sol = solve(monteprob, EM(), EnsembleGPUArray(), dt = dt, trajectories = 10_000, adaptive = false, save_everystep = false)
# 330.102 ms

@benchmark CUDA.@sync sol = solve(monteprob, GPUEM(), EnsembleGPUKernel(), dt = dt, trajectories = 10_000, adaptive = false)
# 76.011 ms
```
Speed-up of `EnsembleGPUKernel` over:

- `EnsembleCPUArray`: 4.5x
- `EnsembleThreads`: ~2x
- `EnsembleGPUArray`: 4x

cc @ChrisRackauckas
Convergence test (red: `0.5*exp(t)`, with `dt = 1e-3`, `trajectories = 1000`):

![test_sde](https://user-images.githubusercontent.com/37050056/208218510-941b9632-41ce-4fdd-b39a-4f872f8dfd0b.png)
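For context on the reference curve: with `f(u, p, t) = u` and `g(u, p, t) = u` the SDE is geometric Brownian motion, whose exact mean is `E[X_t] = u0 * exp(t) = 0.5*exp(t)`. A minimal pure-Julia Euler–Maruyama sketch (CPU-only, no DiffEqGPU; the `em_ensemble` helper is hypothetical, written just for this check) that verifies the ensemble mean against that value:

```julia
using Random, Statistics

# Euler–Maruyama for dX = X dt + X dW, sampling an ensemble of trajectories
# and returning the final value of each one.
function em_ensemble(u0, tspan, dt, ntraj; rng = MersenneTwister(42))
    t0, t1 = tspan
    nsteps = round(Int, (t1 - t0) / dt)
    finals = Vector{Float64}(undef, ntraj)
    for i in 1:ntraj
        u = u0
        for _ in 1:nsteps
            dW = sqrt(dt) * randn(rng)      # Brownian increment ~ N(0, dt)
            u = u + u * dt + u * dW         # EM step with f(u) = u, g(u) = u
        end
        finals[i] = u
    end
    return finals
end

finals = em_ensemble(0.5, (0.0, 1.0), 1e-3, 10_000)
# Sample mean should be close to the exact mean E[X_1] = 0.5 * exp(1)
println(mean(finals))
```

With 10,000 trajectories the sample mean lands within a few percent of `0.5*exp(1) ≈ 1.359`, which is the same weak-convergence check the plot above makes against the red curve.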