darsnack / SpikingNN.jl

A multi-platform spiking neural network simulator

Running example population-test with GPU fails. #26

Open russelljjarvis opened 2 years ago

russelljjarvis commented 2 years ago

Hi there,

To test whether I could simulate spiking neural networks on the GPU, I modified the population-test.jl file (now called population-gpu-test.jl).

To make the example a bit less trivial, I set the neuron population to 100 by creating a square 100×100 weight matrix:

using Distributions  # provides Uniform
weights = rand(Uniform(-2, 1), 100, 100)

https://github.com/russelljjarvis/SpikingNN.jl/blob/master/examples/population-gpu-test.jl#L12

I also tried to make the neuron activity a bit more balanced by distributing the inputs so that only a third of them are strong:

third = voltage_array_size ÷ 3  # integer division, so the ranges have integer endpoints
input1 = [(t; dt) -> 0 for i in 1:third]
input2 = [n2synapse for i in third+1:2third]
input3 = [(t; dt) -> 0 for i in 2third+1:voltage_array_size]

https://github.com/russelljjarvis/SpikingNN.jl/blob/master/examples/population-gpu-test.jl#L35-L37

All of these modifications work if I use:

pop = cpu(pop)

on line 16, but they break if I use

pop = gpu(pop)

instead.

See the stack trace below.

include("population-test.jl")
[ Info: Precompiling SpikingNN [a2976702-bddd-11e9-29f3-e11e525b718e]
┌ Warning: Package SpikingNN does not have Distributed in its dependencies:
│ - If you have SpikingNN checked out for development and have
│   added Distributed as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with SpikingNN
└ Loading Distributed into SpikingNN from project dependency, future warnings for SpikingNN are suppressed.
hit GPU
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/Z5nPF/src/host/indexing.jl:64
ERROR: LoadError: GPU compilation of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Float32,1,1}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}}} which is not isbits.
    .2 is of type Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}} which is not isbits.
      .x is of type Array{Real,1} which is not isbits.

Stacktrace:
 [1] check_invocation(::GPUCompiler.CompilerJob, ::LLVM.Function) at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/validation.jl:68
 [2] macro expansion at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:238 [inlined]
 [3] macro expansion at /home/rudolph/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [4] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:237
 [5] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [6] compile at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [7] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/rudolph/.julia/packages/CUDA/mbPFj/src/compiler/execution.jl:302
 [8] cufunction_compile(::GPUCompiler.FunctionSpec) at /home/rudolph/.julia/packages/CUDA/mbPFj/src/compiler/execution.jl:297
 [9] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#16",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}}}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [10] broadcast_kernel at /home/rudolph/.julia/packages/GPUArrays/Z5nPF/src/host/broadcast.jl:57 [inlined]
 [11] cached_compilation at /home/rudolph/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65 [inlined]
 [12] cufunction(::GPUArrays.var"#broadcast_kernel#16", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Real,1},Tuple{Bool},Tuple{Int64}}}},Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/rudolph/.julia/packages/CUDA/mbPFj/src/compiler/execution.jl:289
 [13] cufunction at /home/rudolph/.julia/packages/CUDA/mbPFj/src/compiler/execution.jl:286 [inlined]
 [14] macro expansion at /home/rudolph/.julia/packages/CUDA/mbPFj/src/compiler/execution.jl:100 [inlined]
 [15] #launch_heuristic#857 at /home/rudolph/.julia/packages/CUDA/mbPFj/src/gpuarrays.jl:17 [inlined]
 [16] launch_heuristic at /home/rudolph/.julia/packages/CUDA/mbPFj/src/gpuarrays.jl:17 [inlined]
 [17] copyto! at /home/rudolph/.julia/packages/GPUArrays/Z5nPF/src/host/broadcast.jl:63 [inlined]
 [18] copyto! at ./broadcast.jl:886 [inlined]
 [19] materialize! at ./broadcast.jl:848 [inlined]
 [20] materialize! at ./broadcast.jl:845 [inlined]
 [21] excite!(::StructArrays.StructArray{LIF{Float32,Int64},1,NamedTuple{(:voltage, :current, :lastt, :τm, :vreset, :R),Tuple{CuArray{Float32,1},CuArray{Float32,1},CuArray{Int64,1},CuArray{Float32,1},CuArray{Float32,1},CuArray{Float32,1}}},Int64}, ::Array{Real,1}) at /home/rudolph/git/SpikingNN.jl/src/models/lif.jl:55
 [22] excite!(::StructArrays.StructArray{Soma{LIF{Float32,Int64},SpikingNN.Threshold.Ideal{Float64}},1,NamedTuple{(:body, :threshold),Tuple{StructArrays.StructArray{LIF{Float32,Int64},1,NamedTuple{(:voltage, :current, :lastt, :τm, :vreset, :R),Tuple{CuArray{Float32,1},CuArray{Float32,1},CuArray{Int64,1},CuArray{Float32,1},CuArray{Float32,1},CuArray{Float32,1}}},Int64},StructArrays.StructArray{SpikingNN.Threshold.Ideal{Float64},1,NamedTuple{(:vth,),Tuple{CuArray{Float64,1}}},Int64}}},Int64}, ::Array{Real,1}) at /home/rudolph/git/SpikingNN.jl/src/neuron.jl:27
 [23] evaluate!(::Population{Soma{LIF{Float32,Int64},SpikingNN.Threshold.Ideal{Float64}},StructArrays.StructArray{Soma{LIF{Float32,Int64},SpikingNN.Threshold.Ideal{Float64}},1,NamedTuple{(:body, :threshold),Tuple{StructArrays.StructArray{LIF{Float32,Int64},1,NamedTuple{(:voltage, :current, :lastt, :τm, :vreset, :R),Tuple{CuArray{Float32,1},CuArray{Float32,1},CuArray{Int64,1},CuArray{Float32,1},CuArray{Float32,1},CuArray{Float32,1}}},Int64},StructArrays.StructArray{SpikingNN.Threshold.Ideal{Float64},1,NamedTuple{(:vth,),Tuple{CuArray{Float64,1}}},Int64}}},Int64},CuArray{Float64,2},StructArrays.StructArray{SpikingNN.Synapse.Alpha{Int64,Float32},2,NamedTuple{(:lastspike, :q, :τ),Tuple{CuArray{Float32,2},CuArray{Float32,2},CuArray{Float32,2}}},Int64},George}, ::Int64; dt::Float64, dense::Bool, inputs::Array{Any,1}) at /home/rudolph/git/SpikingNN.jl/src/population.jl:92
 [24] simulate!(::Population{Soma{LIF{Float32,Int64},SpikingNN.Threshold.Ideal{Float64}},StructArrays.StructArray{Soma{LIF{Float32,Int64},SpikingNN.Threshold.Ideal{Float64}},1,NamedTuple{(:body, :threshold),Tuple{StructArrays.StructArray{LIF{Float32,Int64},1,NamedTuple{(:voltage, :current, :lastt, :τm, :vreset, :R),Tuple{CuArray{Float32,1},CuArray{Float32,1},CuArray{Int64,1},CuArray{Float32,1},CuArray{Float32,1},CuArray{Float32,1}}},Int64},StructArrays.StructArray{SpikingNN.Threshold.Ideal{Float64},1,NamedTuple{(:vth,),Tuple{CuArray{Float64,1}}},Int64}}},Int64},CuArray{Float64,2},StructArrays.StructArray{SpikingNN.Synapse.Alpha{Int64,Float32},2,NamedTuple{(:lastspike, :q, :τ),Tuple{CuArray{Float32,2},CuArray{Float32,2},CuArray{Float32,2}}},Int64},George}, ::Int64; dt::Float64, cb::var"#12#13", dense::Bool, inputs::Array{Any,1}) at /home/rudolph/git/SpikingNN.jl/src/population.jl:167
 [25] top-level scope at /home/rudolph/git/SpikingNN.jl/examples/population-test.jl:75
 [26] include(::String) at ./client.jl:457
 [27] top-level scope at REPL[1]:1
in expression starting at /home/rudolph/git/SpikingNN.jl/examples/population-test.jl:75
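
If I'm reading the KernelError correctly, the broadcast in excite! mixes a device array with a host Array{Real,1}, and a plain host Array is not isbits, so it cannot be passed into a CUDA kernel. A minimal repro outside SpikingNN.jl (assuming a functional CUDA.jl install) would be something like:

using CUDA

v = CUDA.zeros(Float32, 4)   # device array, like the LIF voltage/current fields
I = Real[0.5, 1, 1.5, 2]     # host Vector{Real}, matching the Array{Real,1} in the trace

# A host Array is not isbits, so broadcasting it against a CuArray tries to
# compile a kernel with a non-bitstype argument and fails:
# v .+= I    # -> KernelError: passing and using non-bitstype argument

# Uploading it as a concrete-element device array lets the broadcast compile:
v .+= CuArray{Float32}(I)

So presumably excite! needs to upload the input to the device before broadcasting.
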
darsnack commented 2 years ago

I'll take a look at this sometime this week. Did you use a released version of the package or the master branch? This repo is mid-refactor so I haven't done the due diligence of making sure every component is working.

russelljjarvis commented 2 years ago

Hi @darsnack, sorry I didn't see your reply straight away. I installed via the master branch. Should I try the released package instead?

I also noticed that the method signature for the LIF neuron model that my example calls looks like this:

function lif!(t::CuVector{<:Real}, I::CuVector{<:Real}, V::CuVector{<:Real}; vrest::CuVector{<:Real}, R::CuVector{<:Real}, tau::CuVector{<:Real})

https://github.com/darsnack/SpikingNN.jl/blob/master/src/models/lif.jl#L100

In my example I have not actually supplied I and V as CuVectors; perhaps if I did, it would help. I am not sure about all the other parameters, however. It would be nice if they were all converted automatically, and perhaps that is the intention.

https://github.com/russelljjarvis/SpikingNN.jl/blob/master/examples/population-gpu-test.jl#L12
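
For illustration, this is roughly the automatic conversion I have in mind (a sketch only; the to_device helper is hypothetical and not part of SpikingNN.jl):

using CUDA

# Hypothetical helper: promote a host input vector to a concrete-element
# CuVector so it matches the CuVector{<:Real} arguments in the lif! signature.
to_device(x::AbstractVector{<:Real}) = CuVector{Float32}(x)

I = to_device(Real[0.1, 0.2, 0.3])   # Vector{Real} -> CuVector{Float32}
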

Also, I should have clarified that CUDA.functional() == true on the machine I am using.

While using master I noticed a branch called refactor/gpu-support and another called benchmarking. I also noticed the benchmarking branch has the classic Brunel model, which is cool.

It's a great repository, by the way; it potentially has a good balance of features and examples. I am also using WaspNet.jl and SpikingNeuralNetworks.jl, and I can't yet figure out which is the best spiking neural network package. I am busy optimizing SNNs with genetic algorithms in Julia, using the ISI spike distance of raster plots, and I might end up making example optimizations that involve any or all of the three simulators.

darsnack commented 2 years ago

Cool, glad to see someone trying this code out in a different use case than mine.

The branch that I'm currently using for my research is kd/refactor. Unfortunately, it isn't cleaned up, and I have unpushed commits. I've lost track of this repo as I've been pulled away to non-SNN stuff. Your timing is pretty good, though, since I'm resuming my SNN project. I plan on cleaning up this repo this week. I'll ping this issue when I have it done.

russelljjarvis commented 2 years ago

Srm0-test.jl runs on the GPU if you make the following modifications. The start of src/gpu.jl needs to import from the CUDA module (compatible with the CUDA 11 driver) rather than CuArrays; the CUDA module has a CuArray constructor, which presumably functions the same as the old CuArrays equivalent. First:

using CUDA
using Adapt  # provides adapt for the cpu() conversion below

CUDA.allowscalar(false)
cpu(x) = x
gpu(x) = x
cpu(x::CuArray) = adapt(Array, x)
gpu(x::Array) = CuArray(x)
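
With those definitions in place, the round trip can be sanity-checked (assuming CUDA.functional() is true on your machine):

x = rand(Float32, 4)
@assert cpu(gpu(x)) == x   # Array -> CuArray -> Array round trip
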

Next, in the file tests/Srm0-test.jl, wrap every SpikingNN element with the gpu function as appropriate:

using SpikingNN
using Plots

# SRM0 params
η₀ = 5.0
τᵣ = 1.0
vth = 0.5

# Input spike train params
rate = 0.01
T = 15
∂t = 0.01
n = convert(Int, ceil(T / ∂t))

srm = gpu(Neuron(QueuedSynapse(Synapse.Alpha()), SRM0(η₀, τᵣ), Threshold.Ideal(vth)))
input = gpu(ConstantRate(rate))
spikes = excite!(srm, input, n)

# callback to record voltages
voltages = gpu(Float64[])
record = function ()
    push!(voltages, getvoltage(srm))
end

# simulate
@time output = simulate!(srm, n; dt = ∂t, cb = record, dense = true)

# plot raster plot
raster_plot = rasterplot(∂t .* spikes, ∂t .* output, label = ["Input", "Output"], xlabel = "Time (sec)",
                title = "Raster Plot (\\alpha response)")
xlims!(0, T)

# plot dense voltage recording
plot(∂t .* collect(1:n), voltages,
    title = "SRM Membrane Potential with Varying Presynaptic Responses", xlabel = "Time (sec)", ylabel = "Potential (V)", label = "\\alpha response")

# resimulate using presynaptic response
voltages = gpu(Float64[])
srm = gpu(Neuron(QueuedSynapse(Synapse.EPSP(ϵ₀ = 2, τm = 0.5, τs = 1)), SRM0(η₀, τᵣ), Threshold.Ideal(vth)))
excite!(srm, spikes)
@time simulate!(srm, n; dt = ∂t, cb = record, dense = true)

# plot voltages with response function
voltage_plot = plot!(∂t .* collect(1:n), voltages, label = "EPSP response")
xlims!(0, T)

plot(raster_plot, voltage_plot, layout = grid(2, 1))
xticks!(0:T)

This code executes using CuArrays. Note that no networks are simulated here; network simulation breaks due to an array broadcasting error that I don't understand.
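
If that broadcasting error is the same Array{Real,1} KernelError as above, the fix presumably belongs inside excite!, along these lines (an unverified sketch; excite_device! is a hypothetical name, not the library's actual method):

using CUDA

# Hypothetical sketch: upload the host current vector to the device before
# broadcasting it into the neuron state, so every broadcast argument is isbits.
function excite_device!(current_field::CuVector{T}, current::AbstractVector) where {T}
    current_field .+= CuVector{T}(current)
    return current_field
end
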