gasagna closed this issue 2 years ago.
Replaced the views with `@inbounds` as follows:
```julia
@inbounds begin
    # ny is the innermost iterator, matching Julia's column-major layout
    for nt in 1:Nt, nz in 1:((Nz >> 1) + 1), ny in 1:Ny
        dudz[ny, nz, nt] = (1im*(nz - 1)*β)*u[ny, nz, nt]
    end
end
```
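A minimal, self-contained sketch of the loop above wrapped in an in-place function, with a check against an allocating broadcast and a check that the call does not allocate. `ddz!` and `count_allocs` are hypothetical names, not the Fields.jl API; the sketch assumes dimension 2 of `u` holds the `(Nz >> 1) + 1` non-negative spanwise Fourier modes and that `β` is the fundamental spanwise wavenumber.

```julia
# Sketch only: ddz! is a hypothetical name, not the Fields.jl API.
# Assumes dimension 2 of u holds the (Nz >> 1) + 1 non-negative spanwise
# Fourier modes and β is the fundamental spanwise wavenumber.
function ddz!(dudz::AbstractArray{<:Complex, 3}, u::AbstractArray{<:Complex, 3}, β::Real)
    size(dudz) == size(u) || throw(DimensionMismatch("dudz and u must have the same size"))
    Ny, Nzh, Nt = size(u)
    @inbounds begin
        # ny is the last (innermost) iterator, so memory is traversed
        # contiguously in Julia's column-major layout
        for nt in 1:Nt, nz in 1:Nzh, ny in 1:Ny
            dudz[ny, nz, nt] = (1im*(nz - 1)*β)*u[ny, nz, nt]
        end
    end
    return dudz
end

# Usage: check against an allocating broadcast reference
Ny, Nz, Nt, β = 8, 16, 10, 2.0
u = rand(ComplexF64, Ny, (Nz >> 1) + 1, Nt)
dudz = similar(u)
ddz!(dudz, u, β)
kz = reshape(0:(Nz >> 1), 1, :, 1)   # spanwise mode numbers 0, 1, ..., Nz >> 1
@assert dudz ≈ (1im .* kz .* β) .* u

# The in-place call should not allocate once compiled
count_allocs(dudz, u, β) = @allocated ddz!(dudz, u, β)
count_allocs(dudz, u, β)              # warm-up (compilation)
@assert count_allocs(dudz, u, β) == 0
```

Passing the arrays as arguments to `count_allocs` keeps the measurement inside a compiled function, so untyped global variables do not skew the allocation count.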
This gives a speed-up, from
206.303 μs (0 allocations: 0 bytes)
140.275 μs (0 allocations: 0 bytes)
190.458 μs (0 allocations: 0 bytes)
to
149.189 μs (0 allocations: 0 bytes)
122.353 μs (0 allocations: 0 bytes)
149.763 μs (0 allocations: 0 bytes)
I'll keep this issue open for now to deal with `@turbo` at a later date.
I have tried `LoopVectorization.@turbo`, but it cannot parse the `for` loop. I suppose this issue can be closed.
https://github.com/The-ReSolver/Fields.jl/blob/70e78c98be6fb942e7cefb6b75d85d47981708e9/src/derivatives.jl#L61
Use three nested loops and add `@inbounds` to the outermost loop. Same for the time derivatives. Might also consider using `@turbo` from LoopVectorization.jl.
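To illustrate the suggestion for the time derivatives, here is a sketch with three explicitly nested loops and `@inbounds` on the outermost one (it applies to the whole loop body, including the inner loops). `ddt!` and `ω` are hypothetical names, and the storage layout is an assumption not taken from Fields.jl: the `Nt` temporal Fourier modes along dimension 3 are assumed to follow the standard FFT ordering (0, 1, ..., -2, -1), `ω = 2π/T` is the fundamental frequency, and the unpaired Nyquist mode for even `Nt` is zeroed.

```julia
# Sketch only: ddt! and ω are hypothetical names, not the Fields.jl API.
# Assumes standard FFT ordering (0, 1, ..., -2, -1) of the Nt temporal modes
# along dimension 3, with ω = 2π/T the fundamental angular frequency.
function ddt!(dudt::AbstractArray{<:Complex, 3}, u::AbstractArray{<:Complex, 3}, ω::Real)
    size(dudt) == size(u) || throw(DimensionMismatch("dudt and u must have the same size"))
    Ny, Nzh, Nt = size(u)
    # @inbounds on the outermost loop also covers the two loops nested inside it
    @inbounds for nt in 1:Nt
        k = nt - 1
        k > Nt ÷ 2 && (k -= Nt)               # wrap to negative frequencies
        iseven(Nt) && k == Nt ÷ 2 && (k = 0)  # zero the unpaired Nyquist mode
        fac = 1im * k * ω                     # hoisted out of the inner loops
        for nz in 1:Nzh
            for ny in 1:Ny                    # innermost loop: contiguous in memory
                dudt[ny, nz, nt] = fac * u[ny, nz, nt]
            end
        end
    end
    return dudt
end

# Usage: with Nt = 4 the mode numbers are 0, 1, Nyquist (zeroed), -1
Ny, Nzh, Nt, ω = 3, 5, 4, 0.5
u = ones(ComplexF64, Ny, Nzh, Nt)
dudt = ddt!(similar(u), u, ω)
@assert all(dudt[:, :, 1] .== 0)
@assert all(dudt[:, :, 2] .== 0.5im)
@assert all(dudt[:, :, 3] .== 0)
@assert all(dudt[:, :, 4] .== -0.5im)
```

Computing the factor once per `nt` in the outermost loop keeps the innermost loop a plain scaled copy, which is the part that benefits most from bounds-check removal.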