Open platawiec opened 4 years ago
Hi, thanks for the issue.
Currently, it doesn't support multiple return values at all. That is, sincos
won't work either, where the return value is Tuple{Float64,Float64}
.
Supporting cis
and sincos
just in vmap
would require a less changes than with the @avx
macro.
On LoopVectorization 0.8.9, you can now do:
using StructArrays, LoopVectorization
function vcis!(y::StructArray{<:Complex}, x::AbstractVector)
re = y.re; im = y.im;
@avx for i in eachindex(re,im,x)
im[i], re[i] = sincos(x[i])
end
y
end
function vcis(x::AbstractArray)
T = float(eltype(x))
vcis!(StructArray{Complex{T}}((similar(x, T), similar(x, T))), x)
end
On my computer, I get:
julia> x = rand(1000);
julia> vcis(x) ≈ map(cis, x)
true
julia> @benchmark vcis($x)
BenchmarkTools.Trial:
memory estimate: 15.88 KiB
allocs estimate: 2
--------------
minimum time: 1.617 μs (0.00% GC)
median time: 1.812 μs (0.00% GC)
mean time: 2.231 μs (12.72% GC)
maximum time: 119.669 μs (94.97% GC)
--------------
samples: 10000
evals/sample: 10
julia> @benchmark map(cis, $x)
BenchmarkTools.Trial:
memory estimate: 15.75 KiB
allocs estimate: 1
--------------
minimum time: 8.442 μs (0.00% GC)
median time: 8.738 μs (0.00% GC)
mean time: 9.076 μs (2.44% GC)
maximum time: 307.998 μs (93.81% GC)
--------------
samples: 10000
evals/sample: 3
Thanks, this is great! I had a look through the changes and I wanted to check my understanding. Is it accurate to say that LoopVectorization now supports multiple return values? And is this a generic change, or just in the case of sincos
? Is it the same story for complex values? If I understand a bit better I'd be happy to contribute documentation
Thanks, this is great! I had a look through the changes and I wanted to check my understanding. Is it accurate to say that LoopVectorization now supports multiple return values? And is this a generic change, or just in the case of
sincos
?
Yes and yes, for example:
julia> using LoopVectorization
julia> foo(x) = (x,2x,x*x)
foo (generic function with 1 method)
julia> function sumfoo(y)
s = zero(eltype(y))
@avx for i ∈ eachindex(y)
a, b, c = foo(y[i])
s += (c + c) / (a * b) # 2x^2 / 2x^2 = 1
end
s
end
sumfoo (generic function with 1 method)
julia> y = rand(99);
julia> sumfoo(y)
99.0
There's still no support for complex values.
The recommended approach for handling them is still to use StructArrays.jl, and then load from/store into the constituent re
and im
arrays, like in the sincos
/vcis
example.
Contributions to the documentation, examples, etc would all be greatly appreciated, and I'd be more than happy to answer any questions. Others are likely to have those same questions, meaning it'd be great if the answers were documented.
Some vectorized math libraries (i. e. Intel VML) have a specialized
cis
routine, although I notice SLEEF doesn't.Is there any possible benefit to adding
cis
support? In the "Future Work" section of the docs, plans are mentioned to support heterogeneous types in the loop and non-primitive types better. Would that be a prerequisite for including a fastcis
implementation?I also noted that if I wrote out the loop, things would at least run, but using the convenience functions gave me an error:
throws