JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Non-contiguous inputs to GEMM result in wrong results #2408

Open maleadt opened 3 weeks ago

maleadt commented 3 weeks ago

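As reported on Slack; I haven't looked into this closely. Multiplying a non-contiguous view by a matrix gives the expected result on the CPU, but on the GPU only the first row matches: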

julia> using CUDA

julia> W = rand(Float32, 3, 3, 2);

julia> x = rand(Float32, 2, 10);

julia> w1 = view(W, 1, :, :);

julia> h = w1 * x
3×10 Matrix{Float32}:
 0.606262  0.42194   0.62708   0.522596  0.326441  0.310775  0.350313  0.583037  0.579725  0.236126
 0.828729  0.762325  0.987844  0.819135  0.432217  0.52377   0.526026  0.852888  0.890183  0.471741
 0.469378  0.852466  0.855728  0.701493  0.213036  0.521012  0.404868  0.609816  0.72576   0.604935

julia> Wgpu = cu(W);

julia> xgpu = cu(x);

julia> w1gpu = view(Wgpu, 1, :, :);

julia> hgpu = w1gpu * xgpu
3×10 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.606262  0.42194   0.62708   0.522596  0.326441  0.310775  0.350313  0.583037  0.579725  0.236126
 0.163821  0.194273  0.225959  0.186531  0.082149  0.126778  0.11506   0.181726  0.198921  0.128239
 0.407537  0.551057  0.609835  0.502297  0.199246  0.351524  0.30346   0.472498  0.530546  0.373423
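To make "non-contiguous" concrete, here is a minimal CPU-only sketch (plain Julia, no GPU required) that inspects the strides of the view from the report and checks that a materialized copy gives the same product. The names `w1_dense` and `w_misread` are made up for illustration. The last part is only a guess at the failure mode: it shows that reading the slice as `W[:, 1, :]` instead of `W[1, :, :]` would reproduce the first row and nothing else, which is consistent with the output above but is not confirmed anywhere in the report.

```julia
using LinearAlgebra

W = rand(Float32, 3, 3, 2)
x = rand(Float32, 2, 10)

w1 = view(W, 1, :, :)   # 3×2 slice, same construction as in the report

# W is column-major with strides (1, 3, 9). Fixing the first index leaves
# strides (3, 9), so the view has no unit-stride dimension.
@assert strides(W) == (1, 3, 9)
@assert strides(w1) == (3, 9)

# Materializing the view yields a dense matrix with unit stride in dim 1.
w1_dense = copy(w1)
@assert strides(w1_dense) == (1, 3)

# On the CPU both forms agree.
h = w1 * x
h_dense = w1_dense * x
@assert h ≈ h_dense

# ASSUMPTION (not from the report): if a GEMM call ignored the stride of the
# first dimension, it would effectively read W[:, 1, :]. That slice shares
# its first row with w1, so only row 1 of the product would be correct.
w_misread = W[:, 1, :]
h_misread = w_misread * x
@assert h_misread[1, :] ≈ h[1, :]
```

If the same reasoning carries over to the GPU, materializing the slice first (for example `copy(w1gpu) * xgpu`) would presumably sidestep the problem until it is fixed. That workaround is an untested assumption here, not something stated in the report.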