Open Ralith opened 9 months ago
Work out why this machinery is 10x slower in the iterate_mut_100k benchmark than query_mut in master. Nothing's jumping out at me in the disassembly; input would be welcome.
This is pure speculation, but could this be cache effects of the increased working set due to the effect table?
Surprisingly, that slowdown is for in-place/effect-free queries, for which effects are ZSTs and the effect table should allocate no memory.
These benchmarks do very little work inside the iteration, which sometimes allows LLVM to optimize the loop far better... or not!
This seems to be very fragile. Adding a `continue` in the iteration loop or a `black_box` around the query values can make or break it.
For example, on #341, adding `black_box` around `vel.0` in the benchmark code canceled the optimization.
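To make the fragility concrete, here is a minimal std-only sketch of the two loop shapes being compared. It is not the actual hecs benchmark: `Position`, `Velocity`, `integrate_plain`, `integrate_opaque`, `make_data`, and `time_variant` are hypothetical stand-ins, and the data lives in a plain `Vec` rather than a `World`. Both variants compute the same result; only the optimizer's view of `vel.0` differs, so any timing gap between them says more about codegen than about the iteration machinery.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-ins for the benchmark's components; these are not hecs types.
#[derive(Clone, Copy)]
struct Position(f32);
#[derive(Clone, Copy)]
struct Velocity(f32);

// Plain loop body: LLVM may vectorize or otherwise restructure this freely.
fn integrate_plain(data: &mut [(Position, Velocity)]) {
    for (pos, vel) in data.iter_mut() {
        pos.0 += vel.0;
    }
}

// Same arithmetic, but each velocity read goes through `black_box`, which asks
// the compiler to treat the value as opaque (best effort, not a guarantee).
fn integrate_opaque(data: &mut [(Position, Velocity)]) {
    for (pos, vel) in data.iter_mut() {
        pos.0 += black_box(vel.0);
    }
}

// Builds `n` rows with position 0 and velocity equal to the row index.
fn make_data(n: usize) -> Vec<(Position, Velocity)> {
    (0..n).map(|i| (Position(0.0), Velocity(i as f32))).collect()
}

// Runs one variant 100 times over 100k rows and prints the elapsed time.
fn time_variant(label: &str, f: impl Fn(&mut [(Position, Velocity)])) {
    let mut data = make_data(100_000);
    let start = Instant::now();
    for _ in 0..100 {
        f(data.as_mut_slice());
    }
    // Keep the result observable so the whole loop cannot be discarded.
    black_box(&data);
    println!("{label}: {:?}", start.elapsed());
}

fn main() {
    time_variant("plain", integrate_plain);
    time_variant("black_box(vel.0)", integrate_opaque);
}
```

Build with `--release` when trying this; the numbers will vary by machine and compiler version, so treat them as a hint about what the optimizer did rather than as a measurement of hecs itself.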
A 10-fold difference is a lot to hand-wave, even so. Is that not sustained with `black_box` involved? Maybe a more computationally intensive benchmark would be interesting?
This would be useful for https://github.com/Ralith/hecs/pull/366, which presently must make two passes and perform a lot of redundant lookups to add/remove components to/from entities satisfying a query.
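For readers unfamiliar with the two-pass pattern mentioned above, here is a toy std-only sketch of its shape. It is not hecs code and makes no claim about how that PR is implemented: `ToyWorld`, its two `Vec` "archetypes", and `tag_dead_two_pass` are all hypothetical. The assumption being modeled is that adding a component moves an entity to a different storage, which cannot be done while that storage is being iterated.

```rust
type Entity = u32;

// Toy model: each Vec stands in for one archetype's storage.
#[derive(Default)]
struct ToyWorld {
    alive: Vec<(Entity, i32)>, // entities without the hypothetical "dead" tag
    dead: Vec<(Entity, i32)>,  // entities with it
}

// Moves every entity with health <= 0 into the `dead` storage.
// Returns how many entities matched the "query".
fn tag_dead_two_pass(world: &mut ToyWorld) -> usize {
    // Pass 1: run the query and only remember which entities matched,
    // because `alive` cannot be restructured while it is being iterated.
    let matched: Vec<Entity> = world
        .alive
        .iter()
        .filter(|(_, hp)| *hp <= 0)
        .map(|(e, _)| *e)
        .collect();

    // Pass 2: apply the structural change. Each entity is looked up again,
    // even though pass 1 already knew exactly where it was stored.
    for e in &matched {
        if let Some(idx) = world.alive.iter().position(|(id, _)| id == e) {
            let row = world.alive.swap_remove(idx);
            world.dead.push(row);
        }
    }
    matched.len()
}

fn main() {
    let mut world = ToyWorld::default();
    world.alive = vec![(1, 10), (2, 0), (3, -5)];
    let moved = tag_dead_two_pass(&mut world);
    println!("moved {moved} entities; {} still alive", world.alive.len());
}
```

The second lookup in pass 2 is the redundancy in question: a query that could record its structural changes while iterating would already hold each match's location.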
Fixes #334.
TODO:
- Work out why this machinery is 10x slower in the `iterate_mut_100k` benchmark than `query_mut` in master. Nothing's jumping out at me in the disassembly; input would be welcome.
- [...] `query_with_effect`?) to fix unsoundness