JuliaGPU / KernelAbstractions.jl

Heterogeneous programming in Julia
MIT License

Dynamic parallelism #442

Open simone-silvestri opened 6 months ago

simone-silvestri commented 6 months ago

I am trying to set up a dynamic kernel in which a KA kernel launches a CUDA kernel. The final objective would be to have dynamic parallelism using only KernelAbstractions. This is an MWE comparing launching the parent kernel with CUDA versus with KA.

The child kernel:

using CUDA
using KernelAbstractions

function child!(a)
    i = threadIdx().x    # device-side thread index within the block
    @inbounds a[i] = i
    return nothing
end

CUDA implementation (runs)

function parent!(a)
    @cuda dynamic=true threads=10 blocks=1 child!(a)
    return nothing
end

a = CuArray(zeros(10))

kernel! = @cuda launch=false maxthreads=10 always_inline=true parent!(a)

kernel!(a; threads=1, blocks=1)
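
(Not part of the original snippet, but a quick way to confirm that the dynamic child launch actually wrote the array:)

Array(a)   # expected to read back 1.0, 2.0, ..., 10.0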

KA implementation

@kernel function parent!(a)
    # device-side launch of the child kernel from inside a KA kernel
    @cuda dynamic=true threads=10 blocks=1 child!(a)
end

a = CuArray(zeros(10))

kernel! = parent!(CUDA.CUDABackend(), 1, 1)

kernel!(a)

returns

JIT session error: Symbols not found: [ cudaGetErrorString ]
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_throw_device_cuerror_3299 }) }
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_#_#14_3295 }) }
JIT session error: Symbols not found: [ cudaGetErrorString ]
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_throw_device_cuerror_3306 }) }
ERROR: a CUDA error was thrown during kernel execution: invalid configuration argument (code 9, cudaErrorInvalidConfiguration)
ERROR: a exception was thrown during kernel execution.
Stacktrace:
 [1] throw_device_cuerror at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:20
 [2] #launch#950 at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:27
 [3] launch at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:65
 [4] #868 at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:136
 [5] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:95
 [6] macro expansion at ./none:0
 [7] convert_arguments at ./none:0
 [8] #cudacall#867 at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:135
 [9] cudacall at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:134
 [10] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:219
 [11] macro expansion at ./none:0
 [12] #call#1045 at ./none:0
 [13] call at ./none:0
 [14] #_#1061 at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:371
 [15] DeviceKernel at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:371
 [16] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:88
 [17] macro expansion at /home/ssilvest/test.jl:46
 [18] gpu_parent! at /home/ssilvest/.julia/packages/KernelAbstractions/WoCk1/src/macros.jl:90
 [19] gpu_parent! at ./none:0

Is this expected? I guess it might be a problem of KA setting maxthreads=1 in the kernel call.
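
One way to probe that guess (an untested sketch; it assumes KA only constrains maxthreads when the kernel is instantiated with a static workgroup size, which I have not verified, and parent_dynamic! is just a renamed copy of the parent above):

@kernel function parent_dynamic!(a)
    @cuda dynamic=true threads=10 blocks=1 child!(a)
end

kernel! = parent_dynamic!(CUDA.CUDABackend())   # no static workgroupsize/ndrange
kernel!(a; ndrange=1, workgroupsize=1)
KernelAbstractions.synchronize(CUDA.CUDABackend())

If the configuration error disappears, that would point at the maxthreads heuristic rather than at the device-side launch itself.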

vchuravy commented 6 months ago

Slightly confusing, so not expected.

In my experience, dynamic parallelism doesn't have the best performance, and of course we will need to figure out what it means for at least one other backend.
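
For reference, the pattern that avoids dynamic parallelism entirely is to launch the child-sized work from the host as a flat KA kernel, which also stays backend-agnostic. A rough sketch (not code from this issue):

@kernel function flat_child!(a)
    i = @index(Global)       # global work-item index replaces the device-side launch
    @inbounds a[i] = i
end

a = CuArray(zeros(10))
flat_child!(CUDA.CUDABackend(), 10)(a; ndrange=10)
KernelAbstractions.synchronize(CUDA.CUDABackend())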