JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.64k stars 5.48k forks source link

Confusing allocation of boxed Int in small union of `Union{UInt, String}` #44611

Open NHDaly opened 2 years ago

NHDaly commented 2 years ago

I've boiled down this surprising behavior into an MRE as follows.

I'm surprised that this function is boxing and allocating the UInt for this small union. A simpler function, which I'll show later in this post, doesn't allocate as expected. Similarly, a non-union struct doesn't allocate. So i'm not sure what it is about the logic in this function that forces the Int to get boxed?

I think this seems like an area for improvement, which is why i'm posting this as an issue. Thanks! :)

The MRE:

julia> VERSION
v"1.8.0-DEV.1491"

julia> using BenchmarkTools

julia> struct UnionUIntOrString
           v::Union{UInt64,String}
       end

julia> function make_str(x)
           # For small strings, stick the data into an inline UInt (the goal is to avoid
           #     allocations, but it surprisingly still allocates).
           if sizeof(x) < 8
               v = UInt64(0)
               for (i,c) in enumerate(codeunits(x))
                   v |= (UInt64(c) << (i*sizeof(c)*8))
               end
               return UnionUIntOrString(v)
           else
               return UnionUIntOrString(string(x))
           end
       end
make_str (generic function with 1 method)

julia> @btime make_str("hi")  # I would've expected no allocation.
  9.608 ns (1 allocation: 16 bytes)
UnionUIntOrString(0x0000000000696800)

julia> @btime make_str("this is a really long string, and it gets passed-in directly, so no allocs i guess")
  3.378 ns (0 allocations: 0 bytes)
UnionUIntOrString("this is a really long string, and it gets passed-in directly, so no allocs i guess")

julia> "In the next two examples, we use a @view to require allocating a new string in the else-case";

julia> @btime make_str(@view("hi"[1:2])) 
  19.044 ns (1 allocation: 16 bytes)
UnionUIntOrString(0x0000000000696800)

julia> @btime make_str(@view("but like, what is going on my dude"[1:20]))
  22.114 ns (1 allocation: 40 bytes)
UnionUIntOrString("but like, what is go")

@code_llvm says that it's boxing the UInt for some reason:

L137:                                             ; preds = %L118, %middle.block, %L48, %L5
  %value_phi25 = phi i64 [ 0, %L5 ], [ %9, %L48 ], [ %27, %middle.block ], [ %44, %L118 ]
;  @ REPL[20]:7 within `make_str`
; ┌ @ REPL[8]:2 within `UnionUIntOrString`
   %45 = call nonnull {}* @ijl_box_uint64(i64 zeroext %value_phi25)
   br label %common.ret
; └
}

Full LLVM IR here: https://gist.github.com/NHDaly/84b685d302dd77e67dbbe56cf3f2fe06

...

As you can see, with a simpler function, there is no allocation for the UInt case (although weirdly now there's two allocs for the string case? 🤔):

julia> function foo2(x)
           if x < 0
               return UnionUIntOrString(UInt64(x+1000)*50)
           else
               return UnionUIntOrString(string(x))
           end
       end
foo2 (generic function with 3 methods)

julia> @btime foo2(-1)
  0.044 ns (0 allocations: 0 bytes)
UnionUIntOrString(0x000000000000c31e)

julia> @btime foo2(+1)
  30.771 ns (2 allocations: 88 bytes)
UnionUIntOrString("1")

Finally, when there's no Union, we don't need to box the UInt for the returned struct:

julia> struct DefinitelyAnInt
           v::UInt64
       end

julia> function make_str_uint(x)
           @assert sizeof(x) < 8
               v = UInt64(0)
               for (i,c) in enumerate(codeunits(x))
                   v |= (UInt64(c) << (i*sizeof(c)*8))
               end
               return DefinitelyAnInt(v)
       end
make_str_uint (generic function with 1 method)

julia> @btime make_str_uint(@view("hi"[1:2]))
  12.933 ns (0 allocations: 0 bytes)
DefinitelyAnInt(0x0000000000696800)

Can anyone help me understand what's happening here, and whether this could be improved?

My understanding was that we essentially need to store 8 bytes either way: for the UInt or for the String's pointer, and so my expectation was that we wouldn't need to allocate the UInt, and the compiler would essentially include a boolean alongside the value in the struct's definition which acts like a tag and indicates how to interpret the value.

Is there a reason that isn't working in this case? Thank you!

NHDaly commented 2 years ago

For more exposition - this is more or less the code that I expected the original example to generate (except that I would have expected the type tag to be multiplexed into the String's pointer, so the total size would be 16 bytes not 24). As you can see, it does not allocate when constructing the returned struct:

julia> struct TaggedUIntOrString
           i::UInt64
           s::String
           isint::Bool  # true if `i`, false if `s`
           TaggedUIntOrString(i::Number) = new(i, "", true)
           TaggedUIntOrString(s::AbstractString) = new(0, s, false)
       end

julia> TaggedUIntOrString(5)
TaggedUIntOrString(0x0000000000000005, "", true)

julia> TaggedUIntOrString("Hello")
TaggedUIntOrString(0x0000000000000000, "Hello", false)

julia> function make_str(x)
           # For small strings, stick the data into an inline UInt (the goal is to avoid
           #     allocations, but it surprisingly still allocates).
           if sizeof(x) < 8
               v = UInt64(0)
               for (i,c) in enumerate(codeunits(x))
                   v |= (UInt64(c) << (i*sizeof(c)*8))
               end
               return TaggedUIntOrString(v)
           else
               return TaggedUIntOrString(string(x))
           end
       end
make_str (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime make_str("hi")  # Hooray, no allocation!
  10.307 ns (0 allocations: 0 bytes)
TaggedUIntOrString(0x0000000000696800, "", true)

julia> @btime make_str("this is a really long string, and it gets passed-in directly, so no allocs i guess")
  10.959 ns (0 allocations: 0 bytes)
TaggedUIntOrString(0x0000000000000000, "this is a really long string, and it gets passed-in directly, so no allocs i guess", false)

julia> @btime make_str(@view("hi"[1:2]))
  28.511 ns (0 allocations: 0 bytes)
TaggedUIntOrString(0x0000000000696800, "", true)

julia> @btime make_str(@view("but like, what is going on my dude"[1:20]))
  36.557 ns (1 allocation: 48 bytes)
TaggedUIntOrString(0x0000000000000000, "but like, what is go", false)

Also, weirdly, it seems that the original struct definition has a reported size of 8 bytes, which I think is consistent with the fact that julia is planning to box the UInt or String:

julia> struct UnionUIntOrString
           v::Union{UInt64,String}
       end

julia> sizeof(UnionUIntOrString)
8

I expected it to be more similar to this Union{UInt64,Nothing}, which is 16 bytes because it needs to keep a separate boolean tag bit because every bit of the UInt64 is used so there's no where to multiplex (whereas a Union{String,Nothing} can be only 8 bytes since it can represent the nothing via a sentinal value in the string pointer):

julia> struct UnionUIntOrNothing
           v::Union{UInt64,Nothing}
       end

julia> sizeof(UnionUIntOrNothing)
16

julia> struct UnionStringOrNothing
           v::Union{String,Nothing}
       end

julia> sizeof(UnionStringOrNothing)
8