JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

test cuda FAILURE #496

Closed: gzhang closed this issue 4 years ago

gzhang commented 4 years ago
julia> using CUDA

julia>  CUDA.versioninfo()
CUDA toolkit 11.0.3, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.43.0

Libraries:
- CUBLAS: 11.2.0
- CURAND: 10.2.1
- CUFFT: 10.2.1
- CUSOLVER: 10.6.0
- CUSPARSE: 11.1.1
- CUPTI: 13.0.0
- NVML: 11.0.0+456.43
- CUDNN: 8.0.2 (for CUDA 11.0.0)
- CUTENSOR: 1.2.0 (for CUDA 11.0.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Quadro P2000 (sm_61, 3.708 GiB / 4.000 GiB available)

(@v1.5) pkg> test CUDA
    Testing CUDA
Status `C:\Users\gzhang\AppData\Local\Temp\jl_E27BAs\Project.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v2.3.0
  [b99e7846] BinaryProvider v0.5.10
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v1.3.3
  [864edb3b] DataStructures v0.17.20
  [e2ba6199] ExprTools v0.1.3
  [7a1cc6ca] FFTW v1.2.4
  [1a297f60] FillArrays v0.8.14
  [f6369f11] ForwardDiff v0.10.12
  [0c68f7d7] GPUArrays v5.2.1
  [61eb1bfa] GPUCompiler v0.6.1
  [929cbde3] LLVM v2.0.0
  [1914dd2f] MacroTools v0.5.5
  [872c559c] NNlib v0.7.5
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.1.0
  [a759f4b9] TimerOutputs v0.5.6
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [44cfe95a] Pkg
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [8dfed614] Test
Status `C:\Users\gzhang\AppData\Local\Temp\jl_E27BAs\Manifest.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v2.3.0
  [b99e7846] BinaryProvider v0.5.10
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v1.3.3
  [bbf7d656] CommonSubexpressions v0.3.0
  [e66e0078] CompilerSupportLibraries_jll v0.3.3+0
  [864edb3b] DataStructures v0.17.20
  [163ba53b] DiffResults v1.0.2
  [b552c78f] DiffRules v1.0.1
  [e2ba6199] ExprTools v0.1.3
  [7a1cc6ca] FFTW v1.2.4
  [f5851436] FFTW_jll v3.3.9+5
  [1a297f60] FillArrays v0.8.14
  [f6369f11] ForwardDiff v0.10.12
  [0c68f7d7] GPUArrays v5.2.1
  [61eb1bfa] GPUCompiler v0.6.1
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+0
  [929cbde3] LLVM v2.0.0
  [856f044c] MKL_jll v2020.2.254+0
  [1914dd2f] MacroTools v0.5.5
  [872c559c] NNlib v0.7.5
  [77ba4419] NaNMath v0.3.4
  [efe28fd5] OpenSpecFun_jll v0.5.3+3
  [bac558e1] OrderedCollections v1.3.1
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.1.0
  [276daf66] SpecialFunctions v0.10.3
  [90137ffa] StaticArrays v0.12.4
  [a759f4b9] TimerOutputs v0.5.6
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [b77e0a4c] InteractiveUtils
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [44cfe95a] Pkg
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
┌ Info: System information:
│ CUDA toolkit 11.0.3, artifact installation
│ CUDA driver 11.1.0
│ NVIDIA driver 456.43.0
│
│ Libraries:
│ - CUBLAS: 11.2.0
│ - CURAND: 10.2.1
│ - CUFFT: 10.2.1
│ - CUSOLVER: 10.6.0
│ - CUSPARSE: 11.1.1
│ - CUPTI: 13.0.0
│ - NVML: 11.0.0+456.43
│ - CUDNN: 8.0.2 (for CUDA 11.0.0)
│ - CUTENSOR: 1.2.0 (for CUDA 11.0.0)
│
│ Toolchain:
│ - Julia: 1.5.2
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│
│ 1 device:
└   0: Quadro P2000 (sm_61, 3.719 GiB / 4.000 GiB available)
[ Info: Testing using 1 device(s): 1. Quadro P2000 (UUID 9b0b39dd-2ad4-66d0-d456-01bb0741d565)
[ Info: Skipping the following tests: cutensor, device\wmma
                                         |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                            (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization                       (2) |    15.22 |   0.00 |  0.0 |       0.00 |      N/A |   0.15 |  1.0 |     222.34 |   542.11 |
apiutils                             (2) |     1.09 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       5.37 |   542.11 |
array                                (2) |   250.97 |   0.60 |  0.2 |       5.20 |      N/A |   6.77 |  2.7 |    6322.69 |   769.18 |
broadcast                            (2) |    96.12 |   0.00 |  0.0 |       0.00 |      N/A |   1.43 |  1.5 |    1495.83 |   811.03 |
codegen                              (2) |    18.79 |   0.00 |  0.0 |       0.00 |      N/A |   0.28 |  1.5 |     287.59 |   811.03 |
cublas                               (2) |   291.38 |   0.10 |  0.0 |      11.12 |      N/A |   6.73 |  2.3 |    6636.25 |  1231.46 |
cudnn                                (2) |   241.04 |   0.01 |  0.0 |       0.60 |      N/A |   4.94 |  2.1 |    5065.72 |  1597.04 |
cufft                                (2) |    81.18 |   0.07 |  0.1 |     133.23 |      N/A |   1.79 |  2.2 |    1934.60 |  1597.04 |
curand                               (2) |     0.41 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       5.49 |  1597.04 |
cusolver                             (2) |   190.30 |   0.23 |  0.1 |    1229.85 |      N/A |   3.90 |  2.0 |    4426.17 |  1950.82 |
cusparse                             (2) |   114.74 |   0.03 |  0.0 |       4.46 |      N/A |   1.74 |  1.5 |    2043.30 |  1950.82 |
examples                             (2) |   501.67 |   0.00 |  0.0 |       0.00 |      N/A |   0.08 |  0.0 |      23.86 |  1950.82 |
exceptions                           (2) |   375.83 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      23.78 |  1950.82 |
execution                            (2) |         failed at 2020-10-16T15:25:31.225
forwarddiff                          (3) |   311.40 |   0.23 |  0.1 |       0.00 |      N/A |   4.52 |  1.5 |    3907.68 |   657.59 |
iterator                             (3) |     9.98 |   0.00 |  0.0 |       1.07 |      N/A |   0.29 |  2.9 |     257.28 |   657.59 |
nnlib                                (3) |    11.04 |   0.00 |  0.0 |       0.00 |      N/A |   0.20 |  1.8 |     199.02 |   878.17 |
nvml                                 (3) |     2.45 |   0.00 |  0.0 |       0.00 |      N/A |   0.06 |  2.5 |      44.01 |   878.17 |
nvtx                                 (3) |     4.71 |   0.00 |  0.0 |       0.00 |      N/A |   0.07 |  1.4 |      72.40 |   878.17 |
pointer                              (3) |     0.63 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       7.71 |   878.17 |
pool                                 (3) |     9.16 |   0.00 |  0.0 |       0.00 |      N/A |   0.89 |  9.7 |     157.98 |   878.17 |
random                               (3) |    35.16 |   0.01 |  0.0 |       0.02 |      N/A |   0.58 |  1.7 |     803.79 |   878.17 |
statistics                           (3) |    54.18 |   0.00 |  0.0 |       0.00 |      N/A |   1.18 |  2.2 |    1324.56 |   878.17 |
texture                              (3) |    86.61 |   0.00 |  0.0 |       0.08 |      N/A |   2.41 |  2.8 |    2392.62 |   878.17 |
threading                            (3) |    17.60 |   0.01 |  0.1 |      10.94 |      N/A |   0.44 |  2.5 |     366.53 |   979.98 |
utils                                (3) |     4.11 |   0.00 |  0.0 |       0.00 |      N/A |   0.11 |  2.7 |     117.91 |   979.98 |
cudadrv\context                      (3) |     2.37 |   0.00 |  0.0 |       0.00 |      N/A |   0.05 |  2.2 |      62.36 |   979.98 |
cudadrv\devices                      (3) |     1.15 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      29.99 |   979.98 |
cudadrv\errors                       (3) |     0.74 |   0.00 |  0.0 |       0.00 |      N/A |   0.05 |  7.1 |      28.14 |   979.98 |
cudadrv\events                       (3) |     0.75 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      31.26 |   979.98 |
cudadrv\execution                    (3) |     3.88 |   0.00 |  0.0 |       0.00 |      N/A |   0.10 |  2.7 |     102.47 |   979.98 |
cudadrv\memory                       (3) |     9.49 |   0.00 |  0.0 |       0.00 |      N/A |   0.21 |  2.2 |     234.28 |   979.98 |
cudadrv\module                       (3) |     1.77 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      29.90 |   979.98 |
cudadrv\occupancy                    (3) |     0.51 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      12.57 |   979.98 |
cudadrv\profile                      (3) |     1.48 |   0.00 |  0.0 |       0.00 |      N/A |   0.10 |  7.1 |      60.59 |   979.98 |
cudadrv\stream                       (3) |     0.93 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      41.43 |   979.98 |
cudadrv\version                      (3) |     0.04 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       0.07 |   979.98 |
cusolver\cusparse                    (3) |    73.94 |   0.01 |  0.0 |       0.19 |      N/A |   2.58 |  3.5 |    1705.97 |  1297.63 |
device\array                         (3) |    11.65 |   0.00 |  0.0 |       0.00 |      N/A |   0.19 |  1.6 |     206.23 |  1297.63 |
device\intrinsics                    (3) |   365.43 |   0.01 |  0.0 |       0.01 |      N/A |   3.85 |  1.1 |    4281.75 |  1297.63 |
device\pointer                       (3) |    24.92 |   0.00 |  0.0 |       0.00 |      N/A |   0.37 |  1.5 |     471.82 |  1297.63 |
gpuarrays/math                       (3) |     9.07 |   0.00 |  0.0 |       0.00 |      N/A |   0.15 |  1.6 |     192.30 |  1297.63 |
gpuarrays/indexing scalar            (3) |    21.53 |   0.00 |  0.0 |       0.00 |      N/A |   0.45 |  2.1 |     417.78 |  1297.63 |
gpuarrays/input output               (3) |     5.41 |   0.00 |  0.0 |       0.00 |      N/A |   0.13 |  2.4 |     133.80 |  1297.63 |
gpuarrays/value constructors         (3) |    24.64 |   0.00 |  0.0 |       0.00 |      N/A |   0.36 |  1.5 |     449.16 |  1297.63 |
gpuarrays/indexing multidimensional  (3) |    93.87 |   0.01 |  0.0 |       0.71 |      N/A |   1.95 |  2.1 |    2098.38 |  1297.63 |
gpuarrays/interface                  (3) |    10.15 |   0.00 |  0.0 |       0.00 |      N/A |   0.25 |  2.4 |     195.66 |  1297.63 |
gpuarrays/iterator constructors      (3) |    24.23 |   0.00 |  0.0 |       0.02 |      N/A |   0.36 |  1.5 |     452.58 |  1297.63 |
gpuarrays/uniformscaling             (3) |    28.36 |   0.01 |  0.0 |       0.01 |      N/A |   0.55 |  1.9 |     498.21 |  1297.63 |
gpuarrays/linear algebra             (3) |   253.48 |   0.02 |  0.0 |       1.42 |      N/A |   4.31 |  1.7 |    4325.65 |  1561.33 |
gpuarrays/conversions                (3) |    11.34 |   0.00 |  0.0 |       0.01 |      N/A |   0.28 |  2.4 |     344.24 |  1561.33 |
gpuarrays/fft                        (3) |    22.77 |   0.01 |  0.0 |       6.01 |      N/A |   0.63 |  2.8 |     558.29 |  1657.55 |
gpuarrays/constructors               (3) |     4.81 |   0.01 |  0.2 |       0.03 |      N/A |   0.07 |  1.5 |      75.54 |  1657.55 |
gpuarrays/random                     (3) |    94.39 |   0.01 |  0.0 |       0.03 |      N/A |   1.39 |  1.5 |    1610.47 |  1657.55 |
gpuarrays/base                       (3) |    56.48 |   0.01 |  0.0 |      17.44 |      N/A |   1.46 |  2.6 |    1509.64 |  1657.55 |
gpuarrays/mapreduce essentials       (3) |   487.60 |   0.02 |  0.0 |       3.19 |      N/A |  10.14 |  2.1 |   10843.78 |  1793.80 |
gpuarrays/broadcasting               (3) |   255.73 |   0.01 |  0.0 |       1.19 |      N/A |   6.67 |  2.6 |    5621.06 |  1793.80 |
gpuarrays/mapreduce derivatives      (3) |   704.05 |   0.04 |  0.0 |       3.06 |      N/A |  12.11 |  1.7 |   12897.46 |  1851.68 |
Worker 2 failed running test execution:
Some tests did not pass: 65 passed, 0 failed, 1 errored, 0 broken.
execution: Error During Test at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:981
  Got exception outside of a @test
  CUDA error: too many blocks in cooperative launch (code 720, ERROR_COOPERATIVE_LAUNCH_TOO_LARGE)
  Stacktrace:
   [1] throw_api_error(::CUDA.cudaError_enum) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:103
   [2] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:110 [inlined]
   [3] cuLaunchCooperativeKernel(::CuFunction, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::Int64, ::CuStream, ::Array{Ptr{Nothing},1}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:93
   [4] #594 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:64 [inlined]
   [5] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:33 [inlined]
   [6] pack_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:9 [inlined]
   [7] launch(::CuFunction, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; blocks::Int64, threads::Int64, cooperative::Bool, shmem::Int64, stream::CuStream) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:62
   [8] #599 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:138 [inlined]
   [9] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:97 [inlined]
   [10] convert_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:79 [inlined]
   [11] #cudacall#598 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:137 [inlined]
   [12] #cudacall#777 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:218 [inlined]
   [13] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:199 [inlined]
   [14] call(::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; call_kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:170
   [15] (::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}})(::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDA.AS.Global},N} where N; kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:347
   [16] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:110
   [17] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:997
   [18] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
   [19] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:982
   [20] include(::String) at .\client.jl:457
   [21] #9 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:79 [inlined]
   [22] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [23] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115 [inlined]
   [24] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [25] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\utilities.jl:35 [inlined]
   [26] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\pool.jl:537 [inlined]
   [27] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:43
   [28] eval at .\boot.jl:331 [inlined]
   [29] runtests(::Function, ::String, ::Symbol, ::Nothing) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:55
   [30] (::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}})() at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
   [31] run_work_thunk(::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
   [32] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
   [33] (::Distributed.var"#105#107"{Distributed.CallMsg{:call_fetch},Distributed.MsgHeader,Sockets.TCPSocket})() at .\task.jl:356

Test Summary:                         | Pass  Error  Broken  Total
  Overall                             | 8147      1       2   8150
    initialization                    |   25                    25
    apiutils                          |   15                    15
    array                             |  156                   156
    broadcast                         |   29                    29
    codegen                           |   18                    18
    cublas                            | 1881                  1881
    cudnn                             |  141                   141
    cufft                             |  151                   151
    curand                            |    1                     1
    cusolver                          | 1492                  1492
    cusparse                          |  453                   453
    examples                          |    7                     7
    exceptions                        |   17                    17
    execution                         |   65      1             66
    forwarddiff                       |  107                   107
    iterator                          |   30                    30
    nnlib                             |    4                     4
    nvml                              |    7                     7
    nvtx                              |                      No tests
    pointer                           |   13                    13
    pool                              |   10                    10
    random                            |  101                   101
    statistics                        |   14                    14
    texture                           |   26              1     27
    threading                         |                      No tests
    utils                             |    5                     5
    cudadrv\context                   |   12                    12
    cudadrv\devices                   |    6                     6
    cudadrv\errors                    |    6                     6
    cudadrv\events                    |    6                     6
    cudadrv\execution                 |   15                    15
    cudadrv\memory                    |   49              1     50
    cudadrv\module                    |   11                    11
    cudadrv\occupancy                 |    1                     1
    cudadrv\profile                   |    2                     2
    cudadrv\stream                    |    7                     7
    cudadrv\version                   |    3                     3
    cusolver\cusparse                 |   84                    84
    device\array                      |   20                    20
    device\intrinsics                 |  265                   265
    device\pointer                    |   57                    57
    gpuarrays/math                    |    8                     8
    gpuarrays/indexing scalar         |  249                   249
    gpuarrays/input output            |    5                     5
    gpuarrays/value constructors      |   36                    36
    gpuarrays/indexing multidimensional |   33                    33
    gpuarrays/interface               |    7                     7
    gpuarrays/iterator constructors   |   24                    24
    gpuarrays/uniformscaling          |   56                    56
    gpuarrays/linear algebra          |  393                   393
    gpuarrays/conversions             |   72                    72
    gpuarrays/fft                     |   12                    12
    gpuarrays/constructors            |  335                   335
    gpuarrays/random                  |   46                    46
    gpuarrays/base                    |   41                    41
    gpuarrays/mapreduce essentials    |  522                   522
    gpuarrays/broadcasting            |  155                   155
    gpuarrays/mapreduce derivatives   |  841                   841
    FAILURE

Error in testset execution:
Error During Test at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:981
  Got exception outside of a @test
  CUDA error: too many blocks in cooperative launch (code 720, ERROR_COOPERATIVE_LAUNCH_TOO_LARGE)
  Stacktrace:
   [1] throw_api_error(::CUDA.cudaError_enum) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:103
   [2] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\error.jl:110 [inlined]
   [3] cuLaunchCooperativeKernel(::CuFunction, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::Int64, ::CuStream, ::Array{Ptr{Nothing},1}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:93
   [4] #594 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:64 [inlined]
   [5] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:33 [inlined]
   [6] pack_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:9 [inlined]
   [7] launch(::CuFunction, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; blocks::Int64, threads::Int64, cooperative::Bool, shmem::Int64, stream::CuStream) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:62
   [8] #599 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:138 [inlined]
   [9] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:97 [inlined]
   [10] convert_arguments at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:79 [inlined]
   [11] #cudacall#598 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\lib\cudadrv\execution.jl:137 [inlined]
   [12] #cudacall#777 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:218 [inlined]
   [13] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:199 [inlined]
   [14] call(::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::CuDeviceArray{Float32,2,CUDA.AS.Global}; call_kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:170
   [15] (::CUDA.HostKernel{var"#kernel_vadd#482"(),Tuple{CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global},CuDeviceArray{Float32,2,CUDA.AS.Global}}})(::CuDeviceArray{Float32,2,CUDA.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDA.AS.Global},N} where N; kwargs::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:347
   [16] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\compiler\execution.jl:110
   [17] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:997
   [18] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
   [19] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\execution.jl:982
   [20] include(::String) at .\client.jl:457
   [21] #9 at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:79 [inlined]
   [22] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [23] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115 [inlined]
   [24] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:44 [inlined]
   [25] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\utilities.jl:35 [inlined]
   [26] macro expansion at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\src\pool.jl:537 [inlined]
   [27] top-level scope at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:43
   [28] eval at .\boot.jl:331 [inlined]
   [29] runtests(::Function, ::String, ::Symbol, ::Nothing) at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\setup.jl:55
   [30] (::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}})() at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
   [31] run_work_thunk(::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
   [32] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
   [33] (::Distributed.var"#105#107"{Distributed.CallMsg{:call_fetch},Distributed.MsgHeader,Sockets.TCPSocket})() at .\task.jl:356

ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\gzhang\.julia\packages\CUDA\dZvbp\test\runtests.jl:482
ERROR: Package CUDA errored during testing
maleadt commented 4 years ago

https://github.com/JuliaGPU/CUDA.jl/issues/247