SciML / DiffEqGPU.jl

GPU-acceleration routines for DifferentialEquations.jl and the broader SciML scientific machine learning ecosystem
https://docs.sciml.ai/DiffEqGPU/stable/
MIT License

Support for ODE with Matrix-Vector multiplication #183

Closed by albertomercurio 1 year ago

albertomercurio commented 1 year ago

Hello,

I need to use matrix-vector multiplication inside my ODE problem. This is a common pattern in quantum physics.

The simplest example would be:

using LinearAlgebra
using SparseArrays
using DifferentialEquations
using DiffEqGPU

N = 100 # system size (Hilbert-space dimension)

A = sprand(ComplexF32, N, N, 0.1) # dense would also be ok, but sparse is better
A += A'       # make A Hermitian
A = -1im * A  # Schrödinger-like generator

t_l = collect(LinRange{Float32}(0, 10, 100))
tspan = (t_l[1], t_l[end])

function dudt!(du, u, p, t)
    A = p[1]
    mul!(du, A, u) # in-place matrix-vector product
    nothing
end

u0 = normalize(rand(ComplexF32, N))
p = [A]
prob = ODEProblem(dudt!, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, u0 = normalize(rand(ComplexF32, N)))
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
@time sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = t_l)

But it doesn't work. The only way I can make it work is by writing the matrix-vector multiplication myself and using only the matrix A as the parameter:

function dudt!(du, u, p, t)
    A = p
    @inbounds begin
        # hand-written matrix-vector product, avoiding BLAS calls
        for i in 1:size(A, 1)
            acc = zero(eltype(du)) # typed accumulator
            for j in 1:size(A, 2)
                acc += A[i, j] * u[j]
            end
            du[i] = acc
        end
    end
    nothing
end

p = A

But it is very slow, and I need to introduce other parameters in the future.

I was told that using modelingtoolkitize could be the correct way, but it seems to treat the matrix A as a scalar parameter.

ChrisRackauckas commented 1 year ago

This seems a little bit confused in application. There are two forms of GPU acceleration: one is for big systems that are structured (PDEs), and one is for lots of trajectories. If you have a big system that is structured, then you make u0 a CuArray and just do a matrix-vector product. That is then simply a GPU-based matrix-vector product between two GPU arrays.
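
A minimal sketch of that first form (the size N here is illustrative; the system needs to be large enough that a GPU matrix-vector product pays off):

using CUDA, LinearAlgebra, DifferentialEquations

N = 10_000                          # illustrative size; small systems won't benefit
A = CuArray(rand(ComplexF32, N, N)) # matrix lives on the GPU
u0 = CuArray(rand(ComplexF32, N))   # state lives on the GPU

function dudt!(du, u, p, t)
    mul!(du, p, u) # dispatches to the GPU (CUBLAS) matrix-vector product
    nothing
end

prob = ODEProblem(dudt!, u0, (0.0f0, 10.0f0), A)
sol = solve(prob, Tsit5(), saveat = 0.1f0)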

The other form of GPU acceleration is the trajectory-based ensemble approach. This has limitations and thus needs to avoid BLAS (hence the issue here). As for:

But it is very slow, and I need to introduce other parameters in the future.

Of course it is: when doing a trajectory-based ensemble you want to fill the GPU with enough trajectories. 10 trajectories is far too small for this approach: that fits into a CPU SIMD vector 😅.

Closing, as this just seems to be confusion about what GPUs are useful for. We'll have new docs up pretty soon to explain this in more detail.

albertomercurio commented 1 year ago

I am aware. However, I don't get any benefit from making u0 a CuArray, since the system is too small to see any improvement on the GPU.

This is a widespread problem when studying open quantum systems with "Quantum Trajectories". This method is an alternative to the "Liouvillian dynamics" approach, which requires very large sparse matrices. Quantum trajectories thus have the advantage of using relatively small matrices (roughly 100x100), at the cost of needing many trajectories.

I thought that this package could improve the Quantum Trajectories approach by parallelizing the trajectories directly on the GPU, but maybe it is too early.

ChrisRackauckas commented 1 year ago

Try the newer kernel generation methods. They are over 100x faster than the EnsembleGPUArray approach, and the cutoff number of trajectories needed to get a speedup is much lower (since more time is spent in a single kernel).

albertomercurio commented 1 year ago

What kernel generation methods are you referring to?

ChrisRackauckas commented 1 year ago

EnsembleGPUKernel
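
For reference, a minimal sketch of the EnsembleGPUKernel workflow, assuming the backend-passing API of recent DiffEqGPU versions; the small real-valued system and the trajectory count are illustrative, since kernel generation requires isbits types such as StaticArrays:

using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA

# Kernel generation compiles the whole solve into a single GPU kernel, so the
# problem must be out-of-place and built from isbits types (StaticArrays).
A = @SMatrix rand(Float32, 4, 4)            # illustrative small system matrix
f(u, p, t) = p * u                          # out-of-place RHS: du = A*u
u0 = @SVector rand(Float32, 4)
prob = ODEProblem{false}(f, u0, (0.0f0, 10.0f0), A)

prob_func = (prob, i, repeat) -> remake(prob, u0 = @SVector(rand(Float32, 4)))
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

# GPUTsit5 is the kernel-compatible Tsit5; many trajectories amortize the launch.
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()),
            trajectories = 10_000, saveat = 0.1f0)

For a complex Schrödinger-type right-hand side, the state could presumably be split into real and imaginary parts if the kernel solvers do not accept complex numbers directly.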