Slow in-place broadcasted assignment (.=) of mutable FieldVectors

stillyslalom commented 4 years ago

As noted on Discourse (https://discourse.julialang.org/t/julia-style-arrays-matrix-vs-structs/32424/10?u=stillyslalom), broadcasted assignment seems to be many times slower for mutable FieldVectors than for MVectors.

using StaticArrays

struct Point3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

mutable struct MPoint3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

function f1!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] = Point3D(2x, x*y, x*z)
    end
end

function f2!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] .= (2x, x*y, x*z)
    end
end

using BenchmarkTools
v1 = [@SVector(rand(3)) for i = 1:1000]
v2 = [@MVector(rand(3)) for i = 1:1000]
v3 = [Point3D(rand(3))  for i = 1:1000]
v4 = [MPoint3D(rand(3)) for i = 1:1000]

@btime f1!($v1) # 853.985 ns (0 allocations: 0 bytes)
@btime f2!($v2) # 1.775 μs (0 allocations: 0 bytes)

@btime f1!($v3) # 852.917 ns (0 allocations: 0 bytes)
@btime f2!($v4) # 60.091 μs (3000 allocations: 46.88 KiB)

KristofferC commented 4 years ago

How is it for a normal MVector?

stillyslalom commented 4 years ago

That's v2 in the benchmark above:

@btime f2!($v2) # 1.775 μs (0 allocations: 0 bytes)

KristofferC commented 4 years ago

Ah, sorry, missed that.

I think it would be interesting to also add this style of assignment to see how that compares with broadcasting for the different types, https://discourse.julialang.org/t/julia-style-arrays-matrix-vs-structs/32424/10?u=kristoffer.carlsson.

stillyslalom commented 4 years ago

Your field-assignment function is still faster than broadcasting over MVectors, though:

function f3!(v)
   for vv in v
       x, y, z = vv
       vv.x = 2x
       vv.y = x*y
       vv.z = x*z
   end
end

@btime f3!($v4) # 1.158 μs (0 allocations: 0 bytes)

mateuszbaran commented 4 years ago

As far as I can tell indices used in broadcasting over a FieldVector are not constant-propagated to setfield! somehow. We can fix that by passing the index in type of an argument using Val but it's quite ugly. Do you have a better idea?

c42f commented 4 years ago

mutable struct MPoint3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

It's true that this is a performance bug, but I'm also tempted to say "don't do that" :-) Even though we do allow mutable field vectors I personally can't think of a use for them in numerical work. In practice I find that the immutable versions encourage a more functional style in geometric computation which is much more naturally mathematical, far more memory efficient when stored in a large array, and faster.

@mateuszbaran thanks for investigating. Did you have a more minimal piece of code you were looking at in coming to the conclusion about setfield!?

mateuszbaran commented 4 years ago

@mateuszbaran thanks for investigating. Did you have a more minimal piece of code you were looking at in coming to the conclusion about setfield!?

I've inspected the code generated for the problematic case and compared it with this:

julia> using StaticArrays

julia> mutable struct MPoint3D{T} <: FieldVector{3, T}
           x::T
           y::T
           z::T
       end

julia> f(a, i) = a[i] = 100
f (generic function with 1 method)

julia> a = MPoint3D(1,2,3)
3-element MPoint3D{Int64} with indices SOneTo(3):
 1
 2
 3

julia> @code_native f(a, 2)
    .text
; ┌ @ REPL[3]:1 within `f'
    pushq   %r14
    pushq   %rbx
    subq    $56, %rsp
    vxorps  %xmm0, %xmm0, %xmm0
    vmovaps %xmm0, (%rsp)
    movq    $0, 16(%rsp)
    movq    %fs:0, %rbx
; │┌ @ FieldArray.jl:107 within `setindex!'
    movq    $2, (%rsp)
    movq    -15712(%rbx), %rax
    movq    %rax, 8(%rsp)
    movq    %rdi, %r14
    movq    %rsp, %rax
    movq    %rax, -15712(%rbx)
    movabsq $jl_box_int64, %rax
    movq    %rsi, %rdi
    callq   *%rax
    movq    %rax, 16(%rsp)
    movq    %r14, 32(%rsp)
    movq    %rax, 40(%rsp)
    movabsq $139887313344800, %rax  # imm = 0x7F3A0D9ED920
    movq    %rax, 48(%rsp)
    movabsq $jl_f_setfield, %rax
    leaq    32(%rsp), %rsi
    xorl    %edi, %edi
    movl    $3, %edx
    callq   *%rax
    movq    8(%rsp), %rax
    movq    %rax, -15712(%rbx)
; │└
    movl    $100, %eax
    addq    $56, %rsp
    popq    %rbx
    popq    %r14
    retq
    nopw    %cs:(%rax,%rax)
; └

Although:

julia> g(a) = f(a, 2)
g (generic function with 1 method)

julia> @code_native g(a)
    .text
; ┌ @ REPL[6]:1 within `g'
; │┌ @ REPL[3]:1 within `f'
; ││┌ @ REPL[6]:1 within `setindex!'
    movq    $100, 8(%rdi)
; │└└
    movl    $100, %eax
    retq
    nop
; └

so the constant propagation sometimes works and sometimes it doesn't.

kylebeggs commented 2 years ago

It's true that this is a performance bug, but I'm also tempted to say "don't do that" :-) Even though we do allow mutable field vectors I personally can't think of a use for them in numerical work. In practice I find that the immutable versions encourage a more functional style in geometric computation which is much more naturally mathematical, far more memory efficient when stored in a large array, and faster.

@c42f Solving PDEs with moving boundaries is a use case for me.

JuliaArrays / StaticArrays.jl

Slow in-place broadcasted assignment (.=) of mutable FieldVectors #699