nvprof runs without error and CUDA.jl behaves as expected, but nvprof reports no kernels and no API activity.
Julia environment:
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
[c52e3926] Atom v0.12.19
[052768ef] CUDA v1.2.1
[e5e0dc1b] Juno v0.8.3
[14b8a8f1] PkgTemplates v0.7.8
[295af30f] Revise v2.7.3
nvprof version:
(base) au@a1:~$ nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2019 NVIDIA Corporation
Release version 10.1.243 (21)
script tested: scratch.jl (part of the CUDA.jl/test for mapreduce)
using Pkg
Pkg.activate("./")
using CUDA
function mapreduce_gpu(f::Function, op::Function, A::CuArray{T, N}) where {T, N}
    OT = Int
    v0 = 0
    out = CuArray{OT}(undef, (1,))
    @cuda threads=64 reduce_kernel(f, op, v0, A, out)
    Array(out)[1]
end

function reduce_kernel(f, op, v0::T, A, result) where {T}
    tmp_local = @cuStaticSharedMem(T, 64)
    acc = v0
    # Loop sequentially over chunks of input vector
    i = threadIdx().x
    while i <= length(A)
        element = f(A[i])
        acc = op(acc, element)
        i += blockDim().x
    end
    return
end

A = rand(1:10, 100)
dA = CuArray(A)
mapreduce(identity, +, A)
result of running scratch.jl in repl:
julia> include("/mnt/evo512/insync/Software_a1/testCUDA/scratch.jl")
Activating new environment at `~/Project.toml`
502
result of running nvprof on scratch.jl:
(base) au@a1:~$ nvprof --profile-from-start off julia /mnt/evo512/insync/Software_a1/testCUDA/scratch.jl
Activating new environment at `~/~/Project.toml`
==275468== NVPROF is profiling process 275468, command: julia /mnt/evo512/insync/Software_a1/testCUDA/scratch.jl
==275468== Profiling application: julia /mnt/evo512/insync/Software_a1/testCUDA/scratch.jl
==275468== Profiling result:
No kernels were profiled.
No API activities were profiled.
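A possible explanation, offered as an assumption rather than a confirmed diagnosis: with `--profile-from-start off`, nvprof records nothing until the application itself starts the profiler through the CUDA profiler API, which CUDA.jl exposes as the `CUDA.@profile` macro. The script above never calls it. Separately, its last line calls Base `mapreduce` on the host array `A`, so `mapreduce_gpu` is never launched on `dA`. The sketch below shows the profiled-region pattern on a small kernel; `add_one_kernel` and the array size are hypothetical names and values chosen for illustration and are not part of the original script.

```julia
using CUDA

# Hypothetical kernel for illustration only (not from the original script):
# each thread adds 1 to one element of `a`.
function add_one_kernel(a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] += 1f0
    end
    return nothing
end

dA = CUDA.zeros(Float32, 128)

# Warm-up launch outside the profiled region, so that compilation
# of the kernel is not part of what the profiler records.
@cuda threads=64 blocks=2 add_one_kernel(dA)
synchronize()

# With `nvprof --profile-from-start off`, only work issued inside
# CUDA.@profile should be recorded.
CUDA.@profile begin
    @cuda threads=64 blocks=2 add_one_kernel(dA)
    synchronize()
end

# Two launches, each adding 1 to every element.
@assert all(Array(dA) .== 2f0)
```

Saved under a file name of your choice and run with the same `nvprof --profile-from-start off julia <file>` invocation as above, this should make nvprof list `add_one_kernel` in its profiling result, provided the assumption about the missing `CUDA.@profile` call is what causes the empty report.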
hardware and library support:
expected result: output along the lines of what the CUDA.jl "Introduction to profiling" documentation shows.