JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Inconsistency between CUDA.jl and Base for sum! #2506

Open · AaronGhost opened this issue 1 month ago

AaronGhost commented 1 month ago

Hi, thanks again for putting CUDA.jl together!

I found that the shape of the value returned by sum! differs between Array and CuArray. For an Array, sum! returns an array with the same shape as its first (destination) argument, whereas for a CuArray the returned array keeps the trailing singleton dimension.

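```julia
using CUDA

# Base: reduce a (50, 50, 5) array over its third dimension into a (50, 50) destination
X = rand(Float32, (50, 50, 5));
Y = similar(X, (50, 50));
res = sum!(Y, X);
size(res) # (50, 50): Base returns the destination Y itself

# CUDA.jl: the same reduction on the GPU
X_d = CuArray(X);
Y_d = CuArray(Y);
res_d = sum!(Y_d, X_d);
size(res_d) # (50, 50, 1): the trailing singleton dimension is kept
```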

I would expect res_d to have the same type and shape as Y_d, as it does in the Array case.
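Until this is fixed, one caller-side workaround is to reshape whatever sum! returns back to the destination's shape. This is a sketch under assumptions: the helper name sum_as_dest! is hypothetical (not part of Base or CUDA.jl), and it assumes that CUDA.jl's sum! writes its result into the memory of Y_d, as Base does for Y. Because reshape does not copy, the result still shares memory with the destination.

```julia
# Hypothetical helper, not part of Base or CUDA.jl: run sum! and reshape its
# return value to the destination's shape. reshape does not copy, so the
# result shares memory with `dest`.
sum_as_dest!(dest::AbstractArray, src::AbstractArray) =
    reshape(sum!(dest, src), size(dest))

# Usage with Base arrays (CuArrays would be passed the same way)
X = rand(Float32, (50, 50, 5))
Y = similar(X, (50, 50))
res = sum_as_dest!(Y, X)
size(res)  # (50, 50)
```

On a GPU this would be called as res_d = sum_as_dest!(Y_d, X_d); with Base arrays it behaves like plain sum!.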

Version info

Julia Version 1.10.5
Commit 6f3fdf7b36 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, rocketlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 551.61.0

CUDA libraries:
- CUBLAS: 12.6.1
- CURAND: 10.3.7
- CUFFT: 11.2.6
- CUSOLVER: 11.6.4
- CUSPARSE: 12.5.3
- CUPTI: 2024.3.1 (API 24.0.0)
- NVML: 12.0.0+551.61

Julia packages:
- CUDA: 5.5.1
- CUDA_Driver_jll: 0.10.2+0
- CUDA_Runtime_jll: 0.15.2+0

Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A5000 (sm_86, 21.757 GiB / 23.988 GiB available)

maleadt commented 1 month ago

Good catch! IIRC, we introduce these singleton dimensions to simplify the kernel implementation: https://github.com/JuliaGPU/CUDA.jl/blob/a0aa8b8c142f5eab9db0889802fba9636bdb454b/src/mapreduce.jl#L183-L187
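To illustrate why equal dimensionality can simplify a reduction, here is a CPU sketch under our own assumptions; it is not the CUDA.jl kernel linked above, and the name reference_sum! is hypothetical. When the destination R and the source A have the same number of dimensions, every dimension follows one rule: dimension d is reduced exactly when size(R, d) == 1, so each source index maps to a destination index by collapsing those dimensions to 1.

```julia
# Hypothetical CPU reference for a dims-reduction (not CUDA.jl's kernel).
# R and A must have the same number of dimensions N (enforced by dispatch).
function reference_sum!(R::AbstractArray{T,N}, A::AbstractArray{<:Any,N}) where {T,N}
    # each destination dimension is either reduced (size 1) or kept (same size)
    for d in 1:N
        size(R, d) == 1 || size(R, d) == size(A, d) ||
            throw(DimensionMismatch("dimension $d: R has size $(size(R, d)), A has $(size(A, d))"))
    end
    fill!(R, zero(T))
    for I in CartesianIndices(A)
        # collapse every reduced dimension to index 1
        J = CartesianIndex(ntuple(d -> size(R, d) == 1 ? 1 : I[d], N))
        R[J] += A[I]
    end
    return R
end
```

Because both arguments share the type parameter N, calling this with a (50, 50) destination and a (50, 50, 5) source fails with a MethodError; the destination first has to be reshaped to (50, 50, 1), which mirrors the padding described in the comment above.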

We probably need to reshape them away before returning, or keep the original destination argument around and return that.
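The second option (keep the original argument and return it) can be sketched as a wrapper that pads the destination with trailing singleton dimensions, reduces into the padded reshape, and then returns the caller's array. This is a hypothetical sketch, not CUDA.jl's actual fix: sum_keep_shape! is an illustrative name, and Base's sum! stands in for a kernel that expects equal dimensionality. The first option would instead end with reshape(Rpad, size(R)).

```julia
# Hypothetical wrapper (not CUDA.jl's API): pad, reduce, return the original.
function sum_keep_shape!(R::AbstractArray, A::AbstractArray)
    extra = ndims(A) - ndims(R)
    # reshape does not copy, so writes through Rpad land in R's memory
    Rpad = extra > 0 ? reshape(R, size(R)..., ntuple(_ -> 1, extra)...) : R
    sum!(Rpad, A)  # stand-in for a kernel that wants ndims(Rpad) == ndims(A)
    return R       # return the caller's array, as Base.sum! does
end

X = rand(Float32, (50, 50, 5))
Y = similar(X, (50, 50))
res = sum_keep_shape!(Y, X)
size(res)  # (50, 50)
res === Y  # true
```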