I am currently running Julia with Threads.nthreads()=10, however @tturbo does not use more than one thread in the code below.
The code gets a nice speed up using @turbo, however it is still faster to use Threads.@threads, I was hoping to get the best of both worlds with @tturbo.
using LoopVectorization,Interpolations
NK = 150
NB = 200
nA = 100
nZ = 30
BM = range(-5.0,stop=5,length = NB);
KM = range(0.0,stop=10,length = NK);
Z = range(-1,stop = 1,length= nZ);
A = range(0.0,stop=10,length = nA);
pdfZ = rand(nZ,nZ) ;
AK1 = rand(1,nA,NK);
DD = zeros(nZ,nA,NK,NB);
EE = zeros(nZ,nA,NK,NB);
O = zeros(nZ,nA,NK,NB);
π1 = rand(nZ,NK,NB) .+1;
Q = rand(nZ,NK,NB).+1;
W = rand(nZ,nA) .+1;
qinterp = extrapolate(Interpolations.scale(interpolate(Q, BSpline(Linear())), Z, KM,BM), Interpolations.Line());
πinterp = extrapolate(Interpolations.scale(interpolate(π1, BSpline(Linear())), Z, KM,BM), Interpolations.Line());
winterp = extrapolate(Interpolations.scale(interpolate(W, BSpline(Linear())), Z, A), Interpolations.Line());
@tturbo for ib1 ∈ eachindex(BM) , ik1 ∈ eachindex(KM),ia ∈ eachindex(A),iz ∈ eachindex(Z)
DD[iz,ia,ik1,ib1] = AK1[1,ia,ik1] + qinterp(Z[iz],KM[ik1],BM[ib1]) * BM[ib1]
for iz1 ∈ eachindex(Z)
EE[iz,ia,ik1,ib1] += pdfZ[iz, iz1] * winterp(Z[iz1], πinterp(Z[iz1],KM[ik1],BM[ib1]))
end
O[iz,ia,ik1,ib1] = -1/max(DD[iz,ia,ik1,ib1], eps()) +(1.0 / (1.0 + 0.1)) * EE[iz,ia,ik1,ib1]
end
I am currently running Julia with Threads.nthreads()=10, however @tturbo does not use more than one thread in the code below. The code gets a nice speed up using @turbo, however it is still faster to use Threads.@threads, I was hoping to get the best of both worlds with @tturbo.