Open cpud36 opened 1 year ago
Do you have any ideas for benchmark-able workloads?
So far it seems like `Vec` is a bit faster in linear iteration, but `FuseBox` is a bit faster in random access. Which is kinda weird. It seems your malloc is pretty good.
Essentially, `Header::offset` is not that different from a pointer. To get to the data, one has to first index into the `headers` vector and then index into the `inner` pointer/array. And the array index operation consists of a ptr offset and a ptr deref. So `FuseBox` is still prone to the double indirection problem.
In the linear iteration bench, both `FuseBox` and `Box`es seem to have allocated the data contiguously. `FuseBox` uses a custom arena allocator, so this is to be expected. For `Box`es, malloc just did a pretty good job in this case (for allocations of similar size, some malloc impls use essentially a global arena).
The performance penalty `FuseBox` took is simply due to the `Header` being slightly larger than a fat ptr. A `Box<dyn Trait>` stores a pointer to the data and a pointer to the vtable, 16 bytes total. The `Header` is 24 bytes total (an extra size field). So for `FuseBox` there are more cache misses, simply due to the `headers` vec being larger.
For a start, you could remove the `Header::size` field and instead store only the size of the last element. This should improve linear iteration performance.
P.S. I am puzzled by the random access performance.

> It seems your malloc is pretty good.

You could try the musl target; it provides absolutely the worst allocator performance out-of-the-box, and it is also the default on Alpine.
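Assuming the benchmarks are driven by cargo on x86-64 Linux, switching to musl (and its allocator) is just a target change:

```shell
rustup target add x86_64-unknown-linux-musl
cargo bench --target x86_64-unknown-linux-musl
```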
Also, maybe it's a good idea to mix different trait objects?
> For a start, you could remove the `Header::size` field and instead store only the size of the last element. This should improve linear iteration performance.
The benchmarks I did were done after I removed `size` from `Header`.
Here's a plot of performance in random access as it is now:
(inlinemeta fusebox is a version of fusebox with the layout `{ offsets: Vec<usize>, inner: *mut (T, Metadata) }`)
For completeness' sake here's linear access performance:
And musl just makes everyone perform slightly worse.
Musl random access:
Musl linear iteration:
Tried `Vec<&'bumpalo mut dyn Trait>`. Confusingly, it is even worse than the naive `Vec<Box>` in random access, but ever so slightly faster in linear access.
Random:
Linear:
It is unclear when this crate performs better than the `Vec<Box<dyn Trait>>` approach. Having benchmarks would outline possible use cases. From what I can see, it makes sense to compare the following approaches:

- `Vec<Box<dyn Trait>>`
- `FuseBox` itself
- `Vec<&dyn Trait>` with `bumpalo` as the allocator (or any other slab/arena allocator)

My hunch is that: