Closed CharlesRSmith44 closed 3 years ago
Alternatively, I can convert the arrays to CuArrays after drawing the random variables, but the time cost of doing so seems prohibitive. It is much faster to skip the GPU and use the CPU exclusively. Is there any way to decrease the time of the GPU option? For instance, could I draw the random variables on the GPU from the start rather than having to transfer them over? I think the transfers between the CPU and the GPU are what make the GPU option so slow.
For example:
import Pkg
Pkg.activate("joint_timing")
Pkg.instantiate()
using Cuba, Distributions
using BenchmarkTools, Test, CUDA
using FLoops, FoldsCUDA
using SpecialFunctions
@test Threads.nthreads()>1
# User Inputs
M= 5 # number of independent uniform random variables
atol=1e-6
rtol=1e-3
nvec=1000000
maxevals=100000000
# Initializing Functions
function int_cpu(x, f)
f[1] = pdf(Product(Beta.(1.0,2.0*ones(M))),x)
end
function int_cpu2(x, f)
f[1] = vec(prod(x'.^(1.0-1.0) .* (1.0 .- x').^(2.0-1.0)./(gamma(1.0)*gamma(2.0)/gamma(3.0)),dims=2))[1]
end
function beta_pdf_gpu(x, a, b)
prod(x.^(a-1.0f0) .* (1.0f0 .- x).^(b-1.0f0)./(gamma(a)*gamma(b)/gamma(a+b)),dims=1)
end
function int_gpu(x, f)
f[1] = vec(beta_pdf_gpu(CuArray(x),1.0f0,2.0f0))[1]
end
display(@benchmark cuhre($int_gpu, $M, 1, atol=$atol, rtol=$rtol)) # 70 ms for M = 5, 11.7 s for M = 15
display(@benchmark cuhre($int_cpu, $M, 1, atol=$atol, rtol=$rtol)) # 2.0 ms for M = 5, 650 ms for M = 15
display(@benchmark cuhre($int_cpu2, $M, 1, atol=$atol, rtol=$rtol)) # 500 μs for M = 5, 100 ms for M = 15, 38 s for M = 25
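On the question of drawing the random variables on the GPU in the first place: CUDA.jl can generate uniform samples directly in device memory with CUDA.rand, so nothing has to be copied over from the host. The sketch below is a plain Monte Carlo estimate, not the cuhre algorithm, and it assumes the Beta(1, 2) integrand from the example above, whose density simplifies to 2(1 - x). The helper name mc_estimate and the sample count are made up for illustration.

```julia
using CUDA

# Hypothetical helper: plain Monte Carlo mean of the product of Beta(1, 2)
# densities, 2(1 - x), taken over the columns of x (one sample per column).
# Written generically so it accepts either an Array or a CuArray.
mc_estimate(x) = sum(prod(2f0 .* (1f0 .- x); dims=1)) / size(x, 2)

M = 5          # number of dimensions, as in the example above
n = 1_000_000  # number of samples (illustrative value)

if CUDA.functional()
    x_gpu = CUDA.rand(Float32, M, n)   # samples are created in GPU memory
    println("GPU estimate: ", mc_estimate(x_gpu))
else
    x_cpu = rand(Float32, M, n)        # CPU fallback when no GPU is available
    println("CPU estimate: ", mc_estimate(x_cpu))
end
```

Because the density integrates to 1 over the unit hypercube, the estimate should land close to 1. Only the final scalar is copied back to the host, which is the opposite of the per-call transfer pattern in int_gpu.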
I'm not really sure what you want to do here. Cuba.jl calls into a C shared library which runs on the CPU, so you're bound to be constantly moving data between the GPU and the CPU. There is really no way around it. Whether this is worth it depends on your specific problem, but in the vast majority of (or all?) cases I'd assume it isn't. I'm going to close this ticket because I don't think there is any action to take here.
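If the per-call overhead is the concern, one thing that may help is evaluating many points per integrand call through the nvec keyword (the example above defines nvec but never passes it). The sketch below stays on the CPU and assumes, per my reading of the Cuba.jl documentation, that with nvec > 1 the integrand receives x with one point per column and f with one result per column, and that fewer than nvec points may be passed in a given call. The function name int_cpu_vec is hypothetical.

```julia
using Cuba

M = 5  # number of dimensions, as in the example above

# Beta(1, 2) density is 2(1 - x); take the product over the M coordinates.
# Assumption: x has size (ndim, npoints) and f has size (ncomp, npoints).
function int_cpu_vec(x, f)
    f[1, :] .= vec(prod(2 .* (1 .- x); dims=1))
    return nothing
end

result = cuhre(int_cpu_vec, M, 1; atol=1e-6, rtol=1e-3, nvec=1000)
println(result.integral[1])
```

The same batching would also be the natural place to try a single GPU transfer per batch instead of one per point, although whether that pays off would need to be benchmarked.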
It may be interesting to see whether HCubature.jl works out-of-the-box with CuArrays: that's a pure-Julia package which implements the same algorithm as the cuhre function here.
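As a starting point, here is a minimal CPU-only sketch of the same Beta(1, 2) integrand using HCubature.jl. It does not test the CuArray question; it only shows the call shape. The function name beta12_pdf is made up for illustration, and the tolerances are copied from the example above.

```julia
using HCubature

M = 5  # number of dimensions, as in the example above

# Product of M independent Beta(1, 2) densities; each density is 2(1 - x_i).
beta12_pdf(x) = prod(2 .* (1 .- x))

# Integrate over the unit hypercube [0, 1]^M.
# hcubature returns the estimate and an error bound.
I, err = hcubature(beta12_pdf, zeros(M), ones(M); rtol=1e-3, atol=1e-6)
println("integral ≈ ", I, " (estimated error ", err, ")")
```

Since the integrand is a probability density on the unit hypercube, the result should be close to 1.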
Thank you for the quick response! I will look into that.
Hello,
I'm trying to use GPU computation via CUDA.jl to speed up calculating integrals. Is it possible to do so? If so, how? Here is my example code:
If I include the CUDAEx() call, the code errors. If I leave it out, the code works fine but doesn't exploit the GPU effectively.
If I include the CUDAEx() call, the error message is