JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Constant-folding of reinterpret #2424

maleadt commented 1 week ago

MWE:

using CUDA

function vecdot_q4_kernel_MVP(scales)
    kmask1, kmask2, kmask3 = 0x3f3f3f3f, 0x0f0f0f0f, 0x03030303
    # view the 12 scale bytes as three UInt32 words; this reinterpret is what fails to compile
    scales_uint32 = reinterpret(NTuple{3, UInt32}, scales)
    utmp0, utmp1, utmp2 = scales_uint32[1], scales_uint32[2], scales_uint32[3]

    return
end

function main()
    scales = UInt8.((1,2,3,4, 2,3,4,5, 3,4,5,6)) # NTuple{12, UInt8}
    @cuda threads=1 blocks=1 vecdot_q4_kernel_MVP(scales)
end

main()

Compiling the kernel fails with:

Reason: unsupported dynamic function invocation (call to -)
Stacktrace:
 [1] packedsize
   @ ./reinterpretarray.jl:763
 [2] _reinterpret
   @ ./reinterpretarray.jl:805
 [3] reinterpret
   @ ./essentials.jl:584
 [4] vecdot_q4_kernel_MVP
   @ ~/Julia/pkg/CUDA/wip2.jl:9
Reason: unsupported dynamic function invocation (call to padding(T::DataType, baseoffset::Int64) @ Base reinterpretarray.jl:701)
Stacktrace:
 [1] padding
   @ ./reinterpretarray.jl:702
 [2] packedsize
   @ ./reinterpretarray.jl:762
 [3] _reinterpret
   @ ./reinterpretarray.jl:804
 [4] reinterpret
   @ ./essentials.jl:584
 [5] vecdot_q4_kernel_MVP
   @ ~/Julia/pkg/CUDA/wip2.jl:9
Reason: unsupported dynamic function invocation (call to -)
Stacktrace:
 [1] packedsize
   @ ./reinterpretarray.jl:763
 [2] _reinterpret
   @ ./reinterpretarray.jl:804
 [3] reinterpret
   @ ./essentials.jl:584
 [4] vecdot_q4_kernel_MVP
   @ ~/Julia/pkg/CUDA/wip2.jl:9

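Apparently packedsize and padding are expected to be constant-folded. Here they are not, so reinterpret leaves dynamic calls (to padding and to -) in the kernel, which the GPU compiler rejects.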
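For reference, on a little-endian target the reinterpret in the MWE should produce the same three words as assembling them from the bytes with shifts and ORs. The sketch below does that by hand, only to make the expected result concrete. The helper name bytes_to_u32x3 is hypothetical and is not part of Base or CUDA.jl. Because it uses only integer operations on tuples, it might also serve as a stopgap that avoids the dynamic calls, but that is an untested assumption.

```julia
# Hypothetical helper (not part of Base or CUDA.jl): assemble the three
# little-endian UInt32 words that reinterpret(NTuple{3, UInt32}, s) denotes,
# using only shifts and ORs instead of reinterpret.
@inline function bytes_to_u32x3(s::NTuple{12, UInt8})
    # byte i is the least significant byte of its word (little-endian assumption)
    word(i) = UInt32(s[i]) |
              (UInt32(s[i + 1]) << 8) |
              (UInt32(s[i + 2]) << 16) |
              (UInt32(s[i + 3]) << 24)
    return (word(1), word(5), word(9))
end

scales = UInt8.((1,2,3,4, 2,3,4,5, 3,4,5,6))  # same input as the MWE
bytes_to_u32x3(scales)  # (0x04030201, 0x05040302, 0x06050403)
```

On a big-endian target the byte order inside each word would be reversed, so this only matches reinterpret under the little-endian assumption.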