JuliaSIMD / LoopVectorization.jl

Macro(s) for vectorizing loops.
MIT License
742 stars 68 forks source link

Assignment vectorization #43

Open aminya opened 4 years ago

aminya commented 4 years ago

Now if I run the following I get an error:

a = collect(1:100)
items = rand(20)
indices = collect(1:20)
@avx a[indices] .= items
ERROR: MethodError: no method matching vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Int64,1}}}, ::Val{:Main})Closest candidates are:
  vmaterialize!(::Union{LinearAlgebra.Adjoint{T,A}, LinearAlgebra.Transpose{T,A}}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, A<:AbstractArray{T,N}, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:218
  vmaterialize!(::AbstractArray{T,N}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:194

That expression is equal to the following that works

indiceslen = length(indices)
@avx for k = 1:indiceslen
        a[indices[k]] = items[k]
    end
chriselrod commented 4 years ago

For some reason I had restricted vmaterialize! to only accept T<: Union{Float32,Float64}. Locally (on master), I've now extended it to T <: Union{Int32,Int64,Float32,Float64}.

I'll add tests with integers before pushing to master.

However, I now get a different error:

julia> @avx a[indices] .= items
ERROR: conversion to pointer not defined for SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Int64}}, ::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./pointer.jl:67
 [3] pointer(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./abstractarray.jl:933
 [4] stridedpointer at /home/c285497/.julia/dev/VectorizationBase/src/vectorizable.jl:438 [inlined]
 [5] macro expansion at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194 [inlined]
 [6] vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Float64,1}}}, ::Val{:Main}) at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194
 [7] top-level scope at REPL[5]:1

If indices were 1:20, this should work (ie, conversion to pointer is defined for those subarrays). However, collect(1:20) is much more difficult, because the elements aren't necessarily going to hold any particular pattern.

It would be good to add support, so I'll leave this issue open. Support will be added in VectorizationBase and SIMDPirates.

Note that the loop is likely to give you wrong answers (or do weird things) if indices contains repeats of the same number. Eg, if

indices = [1,2,3,1]
a[indices] = 10:13

What should a equal if we're assigning 10:13 in parallel above? how is it supposed to store into the first element twice in parallel (eg, with AVX2)? Some architectures have "conflict detection" instructions that can be used to look for this. You (or anyone else) can file an issue if you want some sort of special handling around that. It may be best to just call it "undefined behavior". Doing it sequentially as in the above code of course yields, a[1] == 13.