AntonOresten / DynamicStructs.jl

Structs and dictionaries had a baby
MIT License

Type-stable fields vs dynamic properties #8

Open AntonOresten opened 2 weeks ago

AntonOresten commented 2 weeks ago

From some preliminary tests, I found that functions accessing the regular typed fields of dynamic structs compile to more or less the same LLVM code as they would for regular structs:

julia> struct A
           x::Int
       end

julia> @dynamic struct B
           x::Int
       end

julia> a, b = A(1), B(1)
(A(1), B(1))

julia> f(arg) = arg.x^2
f (generic function with 1 method)

julia> @code_llvm f(a)
;  @ REPL[6]:1 within `f`
; Function Attrs: uwtable
define i64 @julia_f_371([1 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(8) %0) #0 {
top:
; ┌ @ Base.jl:37 within `getproperty`
   %1 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
; └
; ┌ @ intfuncs.jl:332 within `literal_pow`
; │┌ @ int.jl:88 within `*`
    %unbox = load i64, i64* %1, align 8
    %2 = mul i64 %unbox, %unbox
    ret i64 %2
; └└
}

julia> @code_llvm f(b)
;  @ REPL[6]:1 within `f`
; Function Attrs: uwtable
define i64 @julia_f_384({ i64, {}* }* nocapture noundef nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
; ┌ @ C:\Users\anton\.julia\packages\DynamicStructs\UIdae\src\dynamic.jl:124 within `getproperty`
   %1 = getelementptr inbounds { i64, {}* }, { i64, {}* }* %0, i64 0, i32 0
; └
; ┌ @ intfuncs.jl:332 within `literal_pow`
; │┌ @ int.jl:88 within `*`
    %unbox = load i64, i64* %1, align 8
    %2 = mul i64 %unbox, %unbox
    ret i64 %2
; └└
}

For dynamic properties, however, Julia has to go through a dictionary whose value type is unknown at compile time, and do considerably more work to execute the function:

julia> struct C
           x::Int
           y::Int
       end

julia> @dynamic struct D
           x::Int
       end

julia> c, d = C(1, 2), D(1)
(C(1, 2), D(1))

julia> d.y = 2
2

julia> g(arg) = arg.y^2
g (generic function with 1 method)

julia> @code_llvm g(c)
;  @ REPL[17]:1 within `g`
; Function Attrs: uwtable
define i64 @julia_g_428([2 x i64]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
; ┌ @ Base.jl:37 within `getproperty`
   %1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
; └
; ┌ @ intfuncs.jl:332 within `literal_pow`
; │┌ @ int.jl:88 within `*`
    %unbox = load i64, i64* %1, align 8
    %2 = mul i64 %unbox, %unbox
    ret i64 %2
; └└
}

julia> @code_llvm g(d)
;  @ REPL[17]:1 within `g`
; Function Attrs: uwtable
define nonnull {}* @julia_g_430({ i64, {}* }* nocapture noundef nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
  %1 = alloca [3 x {}*], align 8
  %gcframe2 = alloca [3 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %2, i8 0, i64 24, i1 true)
  %3 = call {}*** inttoptr (i64 140717574380224 to {}*** ()*)() #6
  %4 = bitcast [3 x {}*]* %gcframe2 to i64*
  store i64 4, i64* %4, align 16
  %5 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
  %6 = bitcast {}** %5 to {}***
  %7 = load {}**, {}*** %3, align 8
  store {}** %7, {}*** %6, align 8
  %8 = bitcast {}*** %3 to {}***
  store {}** %gcframe2.sub, {}*** %8, align 8
  %9 = call nonnull {}* @j_getproperty_432({ i64, {}* }* nocapture nonnull readonly %0, {}* inttoptr (i64 2643369227288 to {}*))
  %10 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
  store {}* %9, {}** %10, align 16
  store {}* inttoptr (i64 140716902583312 to {}*), {}** %.sub, align 8
  %11 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 1
  store {}* %9, {}** %11, align 8
  %12 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 2
  store {}* inttoptr (i64 140716903315504 to {}*), {}** %12, align 8
  %13 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 140716917914160 to {}*), {}** nonnull %.sub, i32 3)
  %14 = load {}*, {}** %5, align 8
  %15 = bitcast {}*** %3 to {}**
  store {}* %14, {}** %15, align 8
  ret {}* %13
}

This overhead is arguably warranted since dynamic properties offer complete freedom.

My thoughts are: any task that could be bottlenecked by the type instability of dynamic property access should not be using dynamic properties in the first place.

This should still be tested and benchmarked more robustly, and the benchmarks should be automated in CI.
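For cases where a dynamic property does end up in a hot loop, the usual Julia workaround is a function barrier: do the type-unstable lookup once, then hand the value to a separate kernel function that the compiler can specialize. The sketch below uses a hypothetical `Dict`-backed stand-in for a dynamic struct (so it runs without the package); the names `FakeDynamic`, `kernel`, and `g_barrier` are illustrative, not part of DynamicStructs.jl:

```julia
# Hypothetical stand-in for a dynamic struct: properties live in a
# Dict{Symbol,Any}, mirroring the type-unstable lookup discussed above.
struct FakeDynamic
    props::Dict{Symbol,Any}
end

Base.getproperty(d::FakeDynamic, name::Symbol) =
    name === :props ? getfield(d, :props) : getfield(d, :props)[name]

# Type-stable kernel: once `y` arrives with a concrete type, this compiles
# just like access to an ordinary typed field.
kernel(y::Int) = y^2

# Function barrier: the unstable dictionary lookup happens once, then
# dynamic dispatch selects the specialized kernel for the hot work.
g_barrier(d) = kernel(d.y)

d = FakeDynamic(Dict{Symbol,Any}(:y => 2))
g_barrier(d)  # == 4
```

The dispatch at the barrier still costs something, but it is paid once per call rather than once per inner-loop iteration.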

Tortar commented 2 weeks ago

In any case, I think a good benchmark comparing static and dynamic properties could be:

julia> using DynamicStructs

julia> @dynamic mutable struct A x::Int end

julia> v = [A(i, y=i) for i in 1:1000];

julia> using BenchmarkTools

julia> @benchmark sum(a.x for a in $v)
BenchmarkTools.Trial: 10000 samples with 181 evaluations.
 Range (min … max):  585.221 ns … 697.099 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     593.862 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   596.373 ns ±   6.762 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▁   ▂▇█▇▄                     ▁▃▄▃        ▁      ▂▂ ▂
  ▄▅▇▅▃▁▁▆███▆▅▄█████▆▄▅▃▄▃▄▄▄▅▄▄▆▄▄▄▆▆▄▄▇█████▆▆▆▆▄▃███▅▅▆▃███ █
  585 ns        Histogram: log(frequency) by time        617 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum(a.y for a in $v)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  16.832 μs …  1.579 ms  ┊ GC (min … max): 0.00% … 96.63%
 Time  (median):     18.025 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.872 μs ± 24.117 μs  ┊ GC (mean ± σ):  2.12% ±  1.66%

    ▄▄▅▇██▆▅▄▂               ▁                    ▁▁▁▁        ▂
  ▆▇███████████▆▃▃▁▄▃▃▅▆▃▆▇▇███▇▅▆▆▄▅▄▇▇▇▆▆▆▆▅▄▅▇███████▆▆▅▇▇ █
  16.8 μs      Histogram: log(frequency) by time      26.9 μs <

 Memory estimate: 15.14 KiB, allocs estimate: 969.

Do you have any ideas for improving the dynamic benchmark? It seems possible if we assume that y will always be an Int for all instances; we could enforce that by keeping track of the type of each newly added property. At the same time, this restricts the functioning of the library a bit... and I still don't know how to tell the compiler that this can be assumed, but a manual type assertion does seem to work:

julia> @benchmark sum(a.y::Int for a in $v)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.734 μs …  5.116 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.805 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.817 μs ± 78.998 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃▃▃▃▃▃▄▇██▆▃▂▁       ▁ ▁▁▁                           ▁▁  ▂
  ▄████████████████▅▆▆▄▅▇█████▇▇▇▅▄▄▃▃▁▄▃▁▃▁▃▄▄▁▃▅▄▆▄▆▇▆████ █
  2.73 μs      Histogram: log(frequency) by time     3.12 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Tortar commented 2 weeks ago

We could probably have two different versions if we manage to exploit the constrained one for performance; I'm still unsure how, though.

AntonOresten commented 2 weeks ago

julia> @benchmark sum(a.y::Int for a in $v)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.833 μs …  40.378 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.711 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.944 μs ± 978.648 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▂█▁
  ▁▁▁▁▁▃▂▃███▄▂▂▃▂▃▅▆▄▂▂▂▂▃▄▃▂▃▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.83 μs         Histogram: frequency by time        6.13 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark for a in $v
           a.y::Int
       end
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.533 μs … 50.122 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.622 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.884 μs ±  1.256 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  █▁▁▁▁▁▁▃▂▃▄▃▆▃▄▄▃▂▂▂▁▂▂▂▁▂▂▂▃▃▄▄▃▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  2.53 μs        Histogram: frequency by time        6.57 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark for a in $v
           a.y
       end
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.544 μs …  5.556 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.567 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.592 μs ± 84.690 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█ ▇ ▃ ▂ ▂▁ ▁ ▂ ▃ ▃▂ ▂ ▂ ▁ ▂▃ ▂ ▁     ▁                    ▂
  ██▁█▁█▁█▁██▁█▁█▁█▁██▁█▁█▁█▁██▁█▁█▁█▁█▁█▇▁▇▁▇▁▄▁▆▇▁▆▁▆▁▅▁▆▅ █
  2.54 μs      Histogram: log(frequency) by time      2.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

We're bottlenecked by accessing the properties through the dictionary. The type assertion helps a lot in the sum example when we need to dispatch on the type, but we can't escape the inefficiencies of dictionary look-ups, regardless of implementation, especially with Any values.

It might be possible to use generated functions to route getproperty to some special function that dispatches on Val{property_name} with a consistent return type. I'm not sure though, and it might restrict type freedom.
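A rough sketch of what that Val-based routing might look like, using hypothetical names (`TypedDynamic`, `getprop`) and a plain Dict backend rather than the package's actual internals. The idea is that properties with a known, tracked type get a dedicated Val method whose return type the compiler can see:

```julia
# Hypothetical sketch: route getproperty through methods dispatching on
# Val{property_name}, so constrained properties get a concrete return type.
# This is NOT the actual DynamicStructs.jl implementation.
struct TypedDynamic
    props::Dict{Symbol,Any}
end

# Fallback for unconstrained properties: the return type stays Any.
getprop(d::TypedDynamic, ::Val{name}) where {name} = getfield(d, :props)[name]

# A "constrained" property: the assertion lets inference conclude Int.
getprop(d::TypedDynamic, ::Val{:y}) = getfield(d, :props)[:y]::Int

# For a literal access like d.y, constant propagation turns Val(name)
# into Val{:y}() and the typed method is selected statically.
Base.getproperty(d::TypedDynamic, name::Symbol) = getprop(d, Val(name))

d = TypedDynamic(Dict{Symbol,Any}(:y => 2))
d.y  # == 2, inferred as Int via the Val{:y} method
```

This removes the dispatch on the looked-up value, but the dictionary access itself remains, and as noted it restricts type freedom: reassigning y to a non-Int would now throw a TypeError at the assertion.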