Closed by gzhang 4 years ago.
julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.0.3, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.43.0

Libraries:
- CUBLAS: 11.2.0
- CURAND: 10.2.1
- CUFFT: 10.2.1
- CUSOLVER: 10.6.0
- CUSPARSE: 11.1.1
- CUPTI: 13.0.0
- NVML: 11.0.0+456.43
- CUDNN: 8.0.2 (for CUDA 11.0.0)
- CUTENSOR: 1.2.0 (for CUDA 11.0.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Quadro P2000 (sm_61, 3.708 GiB / 4.000 GiB available)

(@v1.5) pkg> test CUDA
    Testing CUDA
Status `C:\Users\gzhang\AppData\Local\Temp\jl_E27BAs\Project.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v2.3.0
  [b99e7846] BinaryProvider v0.5.10
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v1.3.3
  [864edb3b] DataStructures v0.17.20
  [e2ba6199] ExprTools v0.1.3
  [7a1cc6ca] FFTW v1.2.4
  [1a297f60] FillArrays v0.8.14
  [f6369f11] ForwardDiff v0.10.12
  [0c68f7d7] GPUArrays v5.2.1
  [61eb1bfa] GPUCompiler v0.6.1
  [929cbde3] LLVM v2.0.0
  [1914dd2f] MacroTools v0.5.5
  [872c559c] NNlib v0.7.5
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.1.0
  [a759f4b9] TimerOutputs v0.5.6
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [44cfe95a] Pkg
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [8dfed614] Test
Status `C:\Users\gzhang\AppData\Local\Temp\jl_E27BAs\Manifest.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v2.3.0
  [b99e7846] BinaryProvider v0.5.10
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v1.3.3
  [bbf7d656] CommonSubexpressions v0.3.0
  [e66e0078] CompilerSupportLibraries_jll v0.3.3+0
  [864edb3b] DataStructures v0.17.20
  [163ba53b] DiffResults v1.0.2
  [b552c78f] DiffRules v1.0.1
  [e2ba6199] ExprTools v0.1.3
  [7a1cc6ca] FFTW v1.2.4
  [f5851436] FFTW_jll v3.3.9+5
  [1a297f60] FillArrays v0.8.14
  [f6369f11] ForwardDiff v0.10.12
  [0c68f7d7] GPUArrays v5.2.1
  [61eb1bfa] GPUCompiler v0.6.1
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+0
  [929cbde3] LLVM v2.0.0
  [856f044c] MKL_jll v2020.2.254+0
  [1914dd2f] MacroTools v0.5.5
  [872c559c] NNlib v0.7.5
  [77ba4419] NaNMath v0.3.4
  [efe28fd5] OpenSpecFun_jll v0.5.3+3
  [bac558e1] OrderedCollections v1.3.1
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.1.0
  [276daf66] SpecialFunctions v0.10.3
  [90137ffa] StaticArrays v0.12.4
  [a759f4b9] TimerOutputs v0.5.6
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [b77e0a4c] InteractiveUtils
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [44cfe95a] Pkg
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
┌ Info: System information:
│ CUDA toolkit 11.0.3, artifact installation
│ CUDA driver 11.1.0
│ NVIDIA driver 456.43.0
│
│ Libraries:
│ - CUBLAS: 11.2.0
│ - CURAND: 10.2.1
│ - CUFFT: 10.2.1
│ - CUSOLVER: 10.6.0
│ - CUSPARSE: 11.1.1
│ - CUPTI: 13.0.0
│ - NVML: 11.0.0+456.43
│ - CUDNN: 8.0.2 (for CUDA 11.0.0)
│ - CUTENSOR: 1.2.0 (for CUDA 11.0.0)
│
│ Toolchain:
│ - Julia: 1.5.2
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│
│ 1 device:
└   0: Quadro P2000 (sm_61, 3.719 GiB / 4.000 GiB available)
[ Info: Testing using 1 device(s): 1. Quadro P2000 (UUID 9b0b39dd-2ad4-66d0-d456-01bb0741d565)
[ Info: Skipping the following tests: cutensor, device\wmma

                                         |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test (Worker)                            | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization (2) | 15.22 | 0.00 | 0.0 | 0.00 | N/A | 0.15 | 1.0 | 222.34 | 542.11 |
apiutils (2) | 1.09 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 5.37 | 542.11 |
array (2) | 250.97 | 0.60 | 0.2 | 5.20 | N/A | 6.77 | 2.7 | 6322.69 | 769.18 |
broadcast (2) | 96.12 | 0.00 | 0.0 | 0.00 | N/A | 1.43 | 1.5 | 1495.83 | 811.03 |
codegen (2) | 18.79 | 0.00 | 0.0 | 0.00 | N/A | 0.28 | 1.5 | 287.59 | 811.03 |
cublas (2) | 291.38 | 0.10 | 0.0 | 11.12 | N/A | 6.73 | 2.3 | 6636.25 | 1231.46 |
cudnn (2) | 241.04 | 0.01 | 0.0 | 0.60 | N/A | 4.94 | 2.1 | 5065.72 | 1597.04 |
cufft (2) | 81.18 | 0.07 | 0.1 | 133.23 | N/A | 1.79 | 2.2 | 1934.60 | 1597.04 |
curand (2) | 0.41 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 5.49 | 1597.04 |
cusolver (2) | 190.30 | 0.23 | 0.1 | 1229.85 | N/A | 3.90 | 2.0 | 4426.17 | 1950.82 |
cusparse (2) | 114.74 | 0.03 | 0.0 | 4.46 | N/A | 1.74 | 1.5 | 2043.30 | 1950.82 |
examples (2) | 501.67 | 0.00 | 0.0 | 0.00 | N/A | 0.08 | 0.0 | 23.86 | 1950.82 |
exceptions (2) | 375.83 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 23.78 | 1950.82 |
execution (2) | failed at 2020-10-16T15:25:31.225
forwarddiff (3) | 311.40 | 0.23 | 0.1 | 0.00 | N/A | 4.52 | 1.5 | 3907.68 | 657.59 |
iterator (3) | 9.98 | 0.00 | 0.0 | 1.07 | N/A | 0.29 | 2.9 | 257.28 | 657.59 |
nnlib (3) | 11.04 | 0.00 | 0.0 | 0.00 | N/A | 0.20 | 1.8 | 199.02 | 878.17 |
nvml (3) | 2.45 | 0.00 | 0.0 | 0.00 | N/A | 0.06 | 2.5 | 44.01 | 878.17 |
nvtx (3) | 4.71 | 0.00 | 0.0 | 0.00 | N/A | 0.07 | 1.4 | 72.40 | 878.17 |
pointer (3) | 0.63 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 7.71 | 878.17 |
pool (3) | 9.16 | 0.00 | 0.0 | 0.00 | N/A | 0.89 | 9.7 | 157.98 | 878.17 |
random (3) | 35.16 | 0.01 | 0.0 | 0.02 | N/A | 0.58 | 1.7 | 803.79 | 878.17 |
statistics (3) | 54.18 | 0.00 | 0.0 | 0.00 | N/A | 1.18 | 2.2 | 1324.56 | 878.17 |
texture (3) | 86.61 | 0.00 | 0.0 | 0.08 | N/A | 2.41 | 2.8 | 2392.62 | 878.17 |
threading (3) | 17.60 | 0.01 | 0.1 | 10.94 | N/A | 0.44 | 2.5 | 366.53 | 979.98 |
utils (3) | 4.11 | 0.00 | 0.0 | 0.00 | N/A | 0.11 | 2.7 | 117.91 | 979.98 |
cudadrv\context (3) | 2.37 | 0.00 | 0.0 | 0.00 | N/A | 0.05 | 2.2 | 62.36 | 979.98 |
cudadrv\devices (3) | 1.15 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 29.99 | 979.98 |
cudadrv\errors (3) | 0.74 | 0.00 | 0.0 | 0.00 | N/A | 0.05 | 7.1 | 28.14 | 979.98 |
cudadrv\events (3) | 0.75 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 31.26 | 979.98 |
cudadrv\execution (3) | 3.88 | 0.00 | 0.0 | 0.00 | N/A | 0.10 | 2.7 | 102.47 | 979.98 |
cudadrv\memory (3) | 9.49 | 0.00 | 0.0 | 0.00 | N/A | 0.21 | 2.2 | 234.28 | 979.98 |
cudadrv\module (3) | 1.77 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 29.90 | 979.98 |
cudadrv\occupancy (3) | 0.51 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 12.57 | 979.98 |
cudadrv\profile (3) | 1.48 | 0.00 | 0.0 | 0.00 | N/A | 0.10 | 7.1 | 60.59 | 979.98 |
cudadrv\stream (3) | 0.93 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 41.43 | 979.98 |
cudadrv\version (3) | 0.04 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 0.07 | 979.98 |
cusolver\cusparse (3) | 73.94 | 0.01 | 0.0 | 0.19 | N/A | 2.58 | 3.5 | 1705.97 | 1297.63 |
device\array (3) | 11.65 | 0.00 | 0.0 | 0.00 | N/A | 0.19 | 1.6 | 206.23 | 1297.63 |
device\intrinsics (3) | 365.43 | 0.01 | 0.0 | 0.01 | N/A | 3.85 | 1.1 | 4281.75 | 1297.63 |
device\pointer (3) | 24.92 | 0.00 | 0.0 | 0.00 | N/A | 0.37 | 1.5 | 471.82 | 1297.63 |
gpuarrays/math (3) | 9.07 | 0.00 | 0.0 | 0.00 | N/A | 0.15 | 1.6 | 192.30 | 1297.63 |
gpuarrays/indexing scalar (3) | 21.53 | 0.00 | 0.0 | 0.00 | N/A | 0.45 | 2.1 | 417.78 | 1297.63 |
gpuarrays/input output (3) | 5.41 | 0.00 | 0.0 | 0.00 | N/A | 0.13 | 2.4 | 133.80 | 1297.63 |
gpuarrays/value constructors (3) | 24.64 | 0.00 | 0.0 | 0.00 | N/A | 0.36 | 1.5 | 449.16 | 1297.63 |
gpuarrays/indexing multidimensional (3) | 93.87 | 0.01 | 0.0 | 0.71 | N/A | 1.95 | 2.1 | 2098.38 | 1297.63 |
gpuarrays/interface (3) | 10.15 | 0.00 | 0.0 | 0.00 | N/A | 0.25 | 2.4 | 195.66 | 1297.63 |
gpuarrays/iterator constructors (3) | 24.23 | 0.00 | 0.0 | 0.02 | N/A | 0.36 | 1.5 | 452.58 | 1297.63 |
gpuarrays/uniformscaling (3) | 28.36 | 0.01 | 0.0 | 0.01 | N/A | 0.55 | 1.9 | 498.21 | 1297.63 |
gpuarrays/linear algebra (3) | 253.48 | 0.02 | 0.0 | 1.42 | N/A | 4.31 | 1.7 | 4325.65 | 1561.33 |
gpuarrays/conversions (3) | 11.34 | 0.00 | 0.0 | 0.01 | N/A | 0.28 | 2.4 | 344.24 | 1561.33 |
gpuarrays/fft (3) | 22.77 | 0.01 | 0.0 | 6.01 | N/A | 0.63 | 2.8 | 558.29 | 1657.55 |
gpuarrays/constructors (3) | 4.81 | 0.01 | 0.2 | 0.03 | N/A | 0.07 | 1.5 | 75.54 | 1657.55 |
gpuarrays/random (3) | 94.39 | 0.01 | 0.0 | 0.03 | N/A | 1.39 | 1.5 | 1610.47 | 1657.55 |
gpuarrays/base (3) | 56.48 | 0.01 | 0.0 | 17.44 | N/A | 1.46 | 2.6 | 1509.64 | 1657.55 |
gpuarrays/mapreduce essentials (3) | 487.60 | 0.02 | 0.0 | 3.19 | N/A | 10.14 | 2.1 | 10843.78 | 1793.80 |
gpuarrays/broadcasting (3) | 255.73 | 0.01 | 0.0 | 1.19 | N/A | 6.67 | 2.6 | 5621.06 | 1793.80 |
gpuarrays/mapreduce derivatives (3) | 704.05 | 0.04 | 0.0 | 3.06 | N/A | 12.11 | 1.7 | 12897.46 | 1851.68 |
Worker 2 failed running test execution:
Some tests did not pass: 65 passed, 0 failed, 1 errored, 0 broken.
execution: Error During Test at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:981
  Got exception outside of a @test
  CUDA error: too many blocks in cooperative launch (code 720, ERROR_COOPERATIVE_LAUNCH_TOO_LARGE)
  Stacktrace:
   [1] throw_api_error(::CUDA.cudaError_enum) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:103
   [2] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:110 [inlined]
   [3] cuLaunchCooperativeKernel(::CuFunction, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::Int64, ::CuStream, ::Array{Ptr{Nothing},1}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:93
   [4] #594 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:64 [inlined]
   [5] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:33 [inlined]
   [6] pack_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:9 [inlined]
   [7] launch(::CuFunction, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; blocks::Int64, threads::Int64, cooperative::Bool, shmem::Int64, stream::CuStream) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:62
   [8] #599 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:138 [inlined]
   [9] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:97 [inlined]
   [10] convert_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:79 [inlined]
   [11] #cudacall#598 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:137 [inlined]
   [12] #cudacall#777 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:218 [inlined]
   [13] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:199 [inlined]
   [14] call(::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; call_kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:170
   [15] (::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}})(::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDA.AS.Global},N} where N; kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:347
   [16] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:110
   [17] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:997
   [18] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
   [19] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:982
   [20] include(::String) at .\client.jl:457
   [21] #9 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:79 [inlined]
   [22] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [23] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115 [inlined]
   [24] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [25] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\utilities.jl:35 [inlined]
   [26] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\pool.jl:537 [inlined]
   [27] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:43
   [28] eval at .\boot.jl:331 [inlined]
   [29] runtests(::Function, ::String, ::Symbol, ::Nothing) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:55
   [30] (::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}})() at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
   [31] run_work_thunk(::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
   [32] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
   [33] (::Distributed.var"#105#107"{Distributed.CallMsg{:call_fetch},Distributed.MsgHeader,Sockets.TCPSocket})() at .\task.jl:356

Test Summary:                        | Pass | Error | Broken | Total
Overall                              | 8147 |     1 |      2 |  8150
  initialization                     |   25 |       |        |    25
  apiutils                           |   15 |       |        |    15
  array                              |  156 |       |        |   156
  broadcast                          |   29 |       |        |    29
  codegen                            |   18 |       |        |    18
  cublas                             | 1881 |       |        |  1881
  cudnn                              |  141 |       |        |   141
  cufft                              |  151 |       |        |   151
  curand                             |    1 |       |        |     1
  cusolver                           | 1492 |       |        |  1492
  cusparse                           |  453 |       |        |   453
  examples                           |    7 |       |        |     7
  exceptions                         |   17 |       |        |    17
  execution                          |   65 |     1 |        |    66
  forwarddiff                        |  107 |       |        |   107
  iterator                           |   30 |       |        |    30
  nnlib                              |    4 |       |        |     4
  nvml                               |    7 |       |        |     7
  nvtx                               | No tests
  pointer                            |   13 |       |        |    13
  pool                               |   10 |       |        |    10
  random                             |  101 |       |        |   101
  statistics                         |   14 |       |        |    14
  texture                            |   26 |       |      1 |    27
  threading                          | No tests
  utils                              |    5 |       |        |     5
  cudadrv\context                    |   12 |       |        |    12
  cudadrv\devices                    |    6 |       |        |     6
  cudadrv\errors                     |    6 |       |        |     6
  cudadrv\events                     |    6 |       |        |     6
  cudadrv\execution                  |   15 |       |        |    15
  cudadrv\memory                     |   49 |       |      1 |    50
  cudadrv\module                     |   11 |       |        |    11
  cudadrv\occupancy                  |    1 |       |        |     1
  cudadrv\profile                    |    2 |       |        |     2
  cudadrv\stream                     |    7 |       |        |     7
  cudadrv\version                    |    3 |       |        |     3
  cusolver\cusparse                  |   84 |       |        |    84
  device\array                       |   20 |       |        |    20
  device\intrinsics                  |  265 |       |        |   265
  device\pointer                     |   57 |       |        |    57
  gpuarrays/math                     |    8 |       |        |     8
  gpuarrays/indexing scalar          |  249 |       |        |   249
  gpuarrays/input output             |    5 |       |        |     5
  gpuarrays/value constructors       |   36 |       |        |    36
  gpuarrays/indexing multidimensional |  33 |       |        |    33
  gpuarrays/interface                |    7 |       |        |     7
  gpuarrays/iterator constructors    |   24 |       |        |    24
  gpuarrays/uniformscaling           |   56 |       |        |    56
  gpuarrays/linear algebra           |  393 |       |        |   393
  gpuarrays/conversions              |   72 |       |        |    72
  gpuarrays/fft                      |   12 |       |        |    12
  gpuarrays/constructors             |  335 |       |        |   335
  gpuarrays/random                   |   46 |       |        |    46
  gpuarrays/base                     |   41 |       |        |    41
  gpuarrays/mapreduce essentials     |  522 |       |        |   522
  gpuarrays/broadcasting             |  155 |       |        |   155
  gpuarrays/mapreduce derivatives    |  841 |       |        |   841
FAILURE

Error in testset execution:
Error During Test at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:981
  Got exception outside of a @test
  CUDA error: too many blocks in cooperative launch (code 720, ERROR_COOPERATIVE_LAUNCH_TOO_LARGE)
  Stacktrace:
   [1] throw_api_error(::CUDA.cudaError_enum) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:103
   [2] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:110 [inlined]
   [3] cuLaunchCooperativeKernel(::CuFunction, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::Int64, ::CuStream, ::Array{Ptr{Nothing},1}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:93
   [4] #594 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:64 [inlined]
   [5] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:33 [inlined]
   [6] pack_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:9 [inlined]
   [7] launch(::CuFunction, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; blocks::Int64, threads::Int64, cooperative::Bool, shmem::Int64, stream::CuStream) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:62
   [8] #599 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:138 [inlined]
   [9] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:97 [inlined]
   [10] convert_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:79 [inlined]
   [11] #cudacall#598 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:137 [inlined]
   [12] #cudacall#777 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:218 [inlined]
   [13] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:199 [inlined]
   [14] call(::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; call_kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:170
   [15] (::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}})(::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDA.AS.Global},N} where N; kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:347
   [16] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:110
   [17] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:997
   [18] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
   [19] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:982
   [20] include(::String) at .\client.jl:457
   [21] #9 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:79 [inlined]
   [22] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [23] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115 [inlined]
   [24] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [25] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\utilities.jl:35 [inlined]
   [26] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\pool.jl:537 [inlined]
   [27] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:43
   [28] eval at .\boot.jl:331 [inlined]
   [29] runtests(::Function, ::String, ::Symbol, ::Nothing) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:55
   [30] (::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}})() at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
   [31] run_work_thunk(::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
   [32] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
   [33] (::Distributed.var"#105#107"{Distributed.CallMsg{:call_fetch},Distributed.MsgHeader,Sockets.TCPSocket})() at .\task.jl:356

ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:482
ERROR: Package CUDA errored during testing
https://github.com/JuliaGPU/CUDA.jl/issues/247
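For context on the single error above: code 720 (ERROR_COOPERATIVE_LAUNCH_TOO_LARGE) is returned by cuLaunchCooperativeKernel when the requested grid has more blocks than the device can keep resident at the same time. A cooperative launch requires every block to be co-scheduled, so the grid is capped at roughly (active blocks per SM) x (number of SMs), which is a small number on a GPU like the Quadro P2000. Below is a minimal sketch, not the test's actual code, of sizing a cooperative launch under that limit with CUDA.jl's launch_configuration occupancy helper; the kernel_vadd name is borrowed from the stack trace, but the kernel body, array sizes, and block size are illustrative assumptions.

using CUDA

# Grid-stride kernel, so a grid capped below "one block per element" still
# covers the whole array (name borrowed from the failing test; body is illustrative).
function kernel_vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    while i <= length(c)
        @inbounds c[i] = a[i] + b[i]
        i += stride
    end
    return
end

a = CUDA.rand(Float32, 1024, 1024)
b = CUDA.rand(Float32, 1024, 1024)
c = similar(a)

# Compile without launching so the occupancy API can inspect the compiled kernel.
kernel = @cuda launch=false kernel_vadd(a, b, c)

# launch_configuration reports a block size and how many blocks can be resident
# simultaneously at that size; assumption: that count is also the ceiling for a
# cooperative launch on this device.
config  = launch_configuration(kernel.fun)
threads = min(length(c), config.threads)
blocks  = min(cld(length(c), threads), config.blocks)

kernel(a, b, c; threads=threads, blocks=blocks, cooperative=true)

Whether the test itself should clamp its grid this way, or be skipped on devices with few SMs, is a call for the maintainers; the sketch is only meant to show why the error appears on this hardware.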