Open psiha opened 2 months ago
Please reconsider the possible overuse of forceinlining. For example for unordered_flat_set insert() is fully inlined at each callsite (even with rather complex non-default hashers) except for the unchecked_emplace_with_rehash() part. Considering that unordered containers are complex objects that themselves can never reside in registers I'm failing to see how this forceinlining helps anything.
The fact that the containers are complex objects much larger than a register does not have anything to do with inlining their member functions (such as insert
). The container itself (i.e. *this
) will be passed/used by reference, it's the implementation code of insert
that gets inlined inside the caller's code. We forced inlining because it actually improved performance for some compilers that didn't inline automatically in some cases.
The only thing I see is the common case when the objects contained are trivial objects that can be passed and returned in registers (as is my use case) - then the generic Args&&... signature would unnecessarily cause them to be passed through memory (ignoring the lucky case you get SRA optimization to kick in) - for this I'd rather some kind of modernized boost::call_traits-like mechanism be used.
Usually, the compiler is smart enough to decide whether the arguments of an inlined function are treated as references or passed around as values, regardless of what the function signature says --this is one of the benefits of inlining. Consider this example: when no optimizations are applied, foo
and bar
are indeed different (the former accepts its arg by reference), but with -O3
all differences disappear.
On a related note - consider skipping double key construction from ...args - rather move the key (constructed with key_from at the beginning of emplace_impl) and peel of ...args used for key construction before forwarding them to unchecked_emplace*.
The key is not constructed twice from args ever, although an examination of emplace_impl
in isolation may suggest otherwise. The critical part is in the implementation of emplace
, the entry function that later calls emplace_impl
:
In the first case, an object x
is constructed from args that holds the key, and this key is moved to its final destination upon successful insertion. If emplace
is passed a key directly (second case), then we omit any kind of construction and pass the key along.
Also could you not extract the construction from unchecked_emplace_at() and unchecked_emplace_with_rehash() - make them only allocate the space/positon and simply do an inplace construction at one place at the end of emplace_impl()? (that way you wouldn't have to forward Args&& around at all)
This is in principle doable, but:
unchecked_emplace_at
and unchecked_emplace_with_rehash
call nosize_unchecked_emplace_at
, which only makes permanent changes to the data structure after construction is successful:
Please reconsider the possible overuse of forceinlining. For example for unordered_flat_set insert() is fully inlined at each callsite (even with rather complex non-default hashers) except for the unchecked_emplace_with_rehash() part. Considering that unordered containers are complex objects that themselves can never reside in registers I'm failing to see how this forceinlining helps anything. The only thing I see is the common case when the objects contained are trivial objects that can be passed and returned in registers (as is my use case) - then the generic Args&&... signature would unnecessarily cause them to be passed through memory (ignoring the lucky case you get SRA optimization to kick in) - for this I'd rather some kind of modernized boost::call_traits-like mechanism be used.
On a related note - consider skipping double key construction from ...args - rather move the key (constructed with key_from at the beginning of emplace_impl) and peel of ...args used for key construction before forwarding them to unchecked_emplace*.
Also could you not extract the construction from unchecked_emplace_at() and unchecked_emplace_with_rehash() - make them only allocate the space/positon and simply do an inplace construction at one place at the end of emplace_impl()? (that way you wouldn't have to forward Args&& around at all)