Open aminya opened 4 years ago
For some reason I had restricted vmaterialize!
to only accept T<: Union{Float32,Float64}
.
Locally (on master), I've now extended it to T <: Union{Int32,Int64,Float32,Float64}
.
I'll add tests with integers before pushing to master.
However, I now get a different error:
julia> @avx a[indices] .= items
ERROR: conversion to pointer not defined for SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] unsafe_convert(::Type{Ptr{Int64}}, ::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./pointer.jl:67
[3] pointer(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./abstractarray.jl:933
[4] stridedpointer at /home/c285497/.julia/dev/VectorizationBase/src/vectorizable.jl:438 [inlined]
[5] macro expansion at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194 [inlined]
[6] vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Float64,1}}}, ::Val{:Main}) at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194
[7] top-level scope at REPL[5]:1
If indices were 1:20
, this should work (ie, conversion to pointer is defined for those subarrays).
However, collect(1:20)
is much more difficult, because the elements aren't necessarily going to hold any particular pattern.
It would be good to add support, so I'll leave this issue open. Support will be added in VectorizationBase and SIMDPirates.
stridedpointer
, which is a function we can overload for special behavior on different array types. It already has two different special overloads for SubArray
s, based on whether the first index is an Int
or a UnitRange
(SubArray
s whose first index is an Int
's elements will be discontiguous, so we treat them differently).
I'll have to add another method and type to handle subarrays indexed by arrays.indices
is a unitrange. Currently, loading from unitranges produces Vec
s, which forces your stores to be gathers. I'll make loading from them produce _MM
s, which "know" that the elements are sequential, so stores/loads can be fast.Note that the loop is likely to give you wrong answers (or do weird things) if indices
contains repeats of the same number.
Eg, if
indices = [1,2,3,1]
a[indices] = 10:13
What should a
equal if we're assigning 10:13 in parallel above?
how is it supposed to store into the first element twice in parallel (eg, with AVX2)?
Some architectures have "conflict detection" instructions that can be used to look for this.
You (or anyone else) can file an issue if you want some sort of special handling around that.
It may be best to just call it "undefined behavior". Doing it sequentially as in the above code of course yields, a[1] == 13
.
Now if I run the following I get an error:
That expression is equal to the following that works