Scalar Indexing is not allowed on GPU array (KH instability)

sangeethasankar01 commented 7 months ago

@glwagner I am attempting to run the Kelvin-Helmholtz instability example on a GPU, but the model fails and throws errors. Can someone help me to sort out this error? Please find the attached error below:

using Random, Statistics

mean_perturbation_kinetic_energy = Field(Average(1/2 * (u^2 + w^2)))
noise(x, z) = randn()
set!(model, u=noise, w=noise, b=noise)
rescale!(simulation.model, mean_perturbation_kinetic_energy, target_kinetic_energy=1e-6)
growth_rates, power_method_data = estimate_growth_rate(simulation, mean_perturbation_kinetic_energy, perturbation_vorticity, b)

@info "Power iterations converged! Estimated growth rate: $(growth_rates[end])"

Error: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.

glwagner commented 7 months ago

What line triggers the error?

navidcy commented 7 months ago

@sangeethasankar01 if you post the whole error message that you get with the stack trace we will understand more which line triggered this error.

sangeethasankar01 commented 7 months ago

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:116
  [5] getindex(A::CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
    @ GPUArrays C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:48
  [6] scalar_getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:34 [inlined]
  [7] _getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:17 [inlined]
  [8] getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:15 [inlined]
  [9] getindex
    @ C:\Users\ADMIN\.julia\packages\OffsetArrays\rMTtC\src\OffsetArrays.jl:422 [inlined]
 [10] getindex
    @ C:\Users\ADMIN\.julia\packages\Oceananigans\E4XVr\src\Fields\field.jl:540 [inlined]
 [11] rescale!(model::NonhydrostaticModel{Oceananigans.TimeSteppers.RungeKutta3TimeStepper{Float64, @NamedTuple{u::Field{Face, Center, Center, Nothing, RectilinearGrid{Float64, Periodic, Flat, Bounded, Float64, Float64, Float64, OffsetArrays.OffsetVector{Fl... (repeats 1 time)
 [12] top-level scope
    @ In[10]:9

This is an error message I am getting from my code. Please share your comments on how to resolve this issue.

navidcy commented 7 months ago

thanks!

ps: enclosing code/error in triple backticks

```
code
```

makes it much more readable. (You can edit the post and enclose the error msg in triple backticks)

navidcy commented 7 months ago

seems that the issue is coming from rescale!(mode) and if I had to guess from the [1, 1, 1] at

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L220

I'll try to reproduce this.

glwagner commented 7 months ago

Try

rescale_factor = CUDA.@allowscalar √(target_kinetic_energy / energy[1, 1, 1])

navidcy commented 7 months ago

You might have to call

using CUDA

at the top of the script.

sangeethasankar01 commented 7 months ago

@glwagner As you mentioned, I attempted to follow your suggestion by specifying the CUDA.@allowscalar. However, I still encountered the same error on a different line.

Code:

function rescale!(model, energy; target_kinetic_energy = 1e-6)
    compute!(energy)
    rescale_factor = CUDA.@allowscalar √(target_kinetic_energy / energy[1, 1, 1]) 
    #rescale_factor = √(target_kinetic_energy / energy[1, 1, 1])

    for f in merge(model.velocities, model.tracers)
        f .*= rescale_factor
    end

    return nothing
end

Error:

Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore C:\Users\ADMIN\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:116
  [5] getindex(A::CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
    @ GPUArrays C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:48
  [6] scalar_getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:34 [inlined]
  [7] _getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:17 [inlined]
  [8] getindex
    @ C:\Users\ADMIN\.julia\packages\GPUArrays\Hd5Sk\src\host\indexing.jl:15 [inlined]
  [9] getindex
    @ .\subarray.jl:290 [inlined]
 [10] macro expansion
    @ .\multidimensional.jl:917 [inlined]
 [11] macro expansion
    @ .\cartesian.jl:64 [inlined]
 [12] macro expansion
    @ .\multidimensional.jl:912 [inlined]
 [13] _unsafe_getindex!
    @ .\multidimensional.jl:925 [inlined]
 [14] _unsafe_getindex(::IndexCartesian, ::SubArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64, ::Base.Slice{Base.OneTo{Int64}})
    @ Base .\multidimensional.jl:903
 [15] _getindex
    @ .\multidimensional.jl:889 [inlined]
 [16] getindex(::SubArray{Float64, 3, CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, false}, ::Function, ::Int64, ::Function)
    @ Base .\abstractarray.jl:1291
 [17] estimate_growth_rate(simulation::Simulation{NonhydrostaticModel{Oceananigans.TimeSteppers.RungeKutta3TimeStepper{Float64, @NamedTuple{u::Field{Face, Center, Center, Nothing....

To solve this error, I tried to add the same comment CUDA.@allowscalar to the last line of the code below. Now, the code is running without any errors.

using Random, Statistics

mean_perturbation_kinetic_energy = Field(Average(1/2 * (u^2 + w^2)))
noise(x, z) = randn()
#CUDA.allowscalar()
set!(model, u=noise, w=noise, b=noise)
rescale!(simulation.model, mean_perturbation_kinetic_energy, target_kinetic_energy=1e-6)
growth_rates, power_method_data = CUDA.@allowscalar estimate_growth_rate(simulation, mean_perturbation_kinetic_energy, perturbation_vorticity, b)

@info "Power iterations converged! Estimated growth rate: $(growth_rates[end])"

Thank you. I genuinely appreciate your support. Please let me know about your suggestion about the added comment line (whether it is correct or not).

sangeethasankar01 commented 7 months ago

You might have to call
using CUDA
at the top of the script.

@navidcy I already called this comment at the top of the script. However, I faced the same issue that I mentioned.

navidcy commented 7 months ago

Let me try to run the script on GPU. Out of curiosity, which version of Oceananigans and CUDA you use? Could you print out the output of using Pkg; Pkg.status()?

sangeethasankar01 commented 7 months ago

Let me try to run the script on GPU. Out of curiosity, which version of Oceananigans and CUDA you use? Could you print out the output of using Pkg; Pkg.status()?

Please find the details below.


julia> Pkg.status()
Status `C:\Users\ADMIN\.julia\environments\v1.10\Project.toml`
  [052768ef] CUDA v5.2.0
  [13f3f980] CairoMakie v0.11.9
  [7073ff75] IJulia v1.24.2
  [033835bb] JLD2 v0.4.46
  [bdcacae8] LoopVectorization v0.12.166
  [510215fc] Observables v0.5.5
  [9e8cae18] Oceananigans v0.90.11
  [91a5bcdd] Plots v1.40.2
  [438e738f] PyCall v1.96.4

glwagner commented 7 months ago

Can you show the estimate_growth_rate function that you are using?

glwagner commented 7 months ago

Writing @allowscalar will of course always fix a scalar indexing error. But you may end up with slow code, which defeats the purpose of using the GPU.

Taking a look at the script: https://github.com/CliMA/Oceananigans.jl/blob/main/examples/kelvin_helmholtz_instability.jl

we see that there are calls to getindex in a few places:

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L190

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L197

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L256C1-L257C1

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L266

https://github.com/CliMA/Oceananigans.jl/blob/fb9455bc10721a880d9911a39735e37f1c4961e0/examples/kelvin_helmholtz_instability.jl#L270

Wherever we write energy[1, 1, 1], it is safe to use @allowscalar. This is just a single indexing operation and will be fast.

However, where we write collect, its probably better to rewrite this in a different way.

But note that collect is only used for plotting. So what we do here depends on the application.

@sangeethasankar01 are you trying to get this to run on the GPU for some specific purpose? Or is this merely an educational exercise?

I also think we should convert this issue to a discussion. I don't think we want to make these changes to the example in the source code.

sangeethasankar01 commented 6 months ago

@sangeethasankar01 are you trying to get this to run on the GPU for some specific purpose? Or is this merely an educational exercise?

I am a research scholar at IIT Madras, working under the supervision of Dr. Arjun Jagannathan. Currently, I'm exploring various problems related to flow instabilities. However, my current focus on this Kelvin-Helmholtz instability problem is purely for educational practice. I aim to understand the architecture of the code and optimize it for GPU execution.

sangeethasankar01 commented 6 months ago

Can you show the estimate_growth_rate function that you are using?

I made the below changes only to enable this code to run on the GPU.

Change:1

function rescale!(model, energy; target_kinetic_energy = 1e-6)
    compute!(energy)
    rescale_factor = CUDA.@allowscalar √(target_kinetic_energy / energy[1, 1, 1]) 
    #rescale_factor = √(target_kinetic_energy / energy[1, 1, 1])

    for f in merge(model.velocities, model.tracers)
        f .*= rescale_factor
    end

    return nothing
end

using Printf

## b) Check for convergence
convergence(σ) = length(σ) > 1 ? abs((σ[end] - σ[end-1]) / σ[end]) : 9.1e18  # pretty big (not Inf tho)

Change:2

using Random, Statistics

mean_perturbation_kinetic_energy = Field(Average(1/2 * (u^2 + w^2)))
noise(x, z) = randn()
#CUDA.allowscalar()
set!(model, u=noise, w=noise, b=noise)
rescale!(simulation.model, mean_perturbation_kinetic_energy, target_kinetic_energy=1e-6)
growth_rates, power_method_data = CUDA.@allowscalar estimate_growth_rate(simulation, mean_perturbation_kinetic_energy, perturbation_vorticity, b)

@info "Power iterations converged! Estimated growth rate: $(growth_rates[end])"

Could you please review this and provide your suggestions?

navidcy commented 6 months ago

Ideally you don't want CUDA.@allowscalar in front of estimate_growth_rate but rather you wanna go and modify estimate_growth_rate to ensure that you only put CUDA.@allowscalar in places that does not affect performance. Scalar operations on the GPU could diminish all the speedup you get from the GPU, so CUDA.@allowscalar must be used with great caution!! There is a relevant discussion in the docs.

(I'll convert this Issue into a Discussion btw.)

CliMA / Oceananigans.jl

Scalar Indexing is not allowed on GPU array (KH instability) #3522