Open Olli1080 opened 1 month ago
After some testing, I get the same results with 12.5 Patch 1 and 12.3 Patch 2 (I didn't check 12.4). The same behaviour can also be triggered with -ewp instead of -rdc=true.
The issue does not appear if I set SM >= 60 and compute >= 60; if left empty, these default to SM_52, compute_52. The issue is also present for SM_53, compute_53. I've tested all configurations with 52 <= SM <= 89 and 52 <= compute <= 89.
Is this a duplicate?
Type of Bug
Silent Failure
Component
Thrust
Describe the bug
thrust::inclusive_scan does not work properly for certain vector sizes and seems to stop midway, e.g. [1,2,3,x,x,...,x] (x being the value the result vector was initialized with). After some experiments, it seems that -rdc=true causes this behaviour. (Switching from Debug to Release gives exactly the same results.)
Visual Studio Community 2022, Version 17.11.4
CUDA 12.6 Update 1
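To make the "stops midway" pattern described above easier to spot, the device result can be copied back to the host and compared against a sequential reference. The following is only a minimal host-side sketch: the helper name first_scan_mismatch is hypothetical and not taken from the linked repository, it assumes int elements and the default plus operator, and it uses std::partial_sum as the sequential equivalent of an inclusive scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrust or the linked repository).
// Returns the index of the first element where `result` differs from a
// sequential inclusive scan of `input`. Returns input.size() if every
// element matches. `result` is assumed to be the data copied back from
// the device after thrust::inclusive_scan.
std::size_t first_scan_mismatch(const std::vector<int>& input,
                                const std::vector<int>& result)
{
    // Host reference: std::partial_sum computes the same running sums
    // as an inclusive scan with operator+.
    std::vector<int> expected(input.size());
    std::partial_sum(input.begin(), input.end(), expected.begin());

    // Only compare the range that both vectors cover.
    const std::size_t n = std::min(expected.size(), result.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (expected[i] != result[i]) {
            return i;  // the scan output diverges here
        }
    }
    // No differing element found; a shorter `result` counts as a
    // mismatch at its end.
    return n == input.size() ? input.size() : n;
}
```

For an input of eight ones and a result of [1,2,3,x,x,x,x,x] with x = -1, this helper reports index 3, which is where the output still holds the initial value. A return value equal to the input size means the whole scan matches the reference.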
How to Reproduce
Log generated on my system:
Expected behavior
inclusive_scan should work regardless of the GenerateRelocatableDeviceCode setting.
Reproduction link
https://github.com/Olli1080/Cuda-Bug.git
Operating System
Windows 11 Pro 10.0.22631
nvidia-smi output
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0  On |                  Off |
|  0%   38C    P8             22W /  450W |    1460MiB /  24564MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
NVCC version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:26:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0