Stumbled across some major speedups while chipping away at #334. Iteration of large archetypes is now roughly twice as fast (about 2.5x for iterate_100k and 2x for iterate_mut_100k in the numbers below), with no significant change elsewhere.
Before:
test iterate_100k ... bench: 130,153 ns/iter (+/- 12,301)
test iterate_cached_100_by_50 ... bench: 1,010 ns/iter (+/- 75)
test iterate_mut_100k ... bench: 127,847 ns/iter (+/- 6,311)
test iterate_mut_cached_100_by_50 ... bench: 251 ns/iter (+/- 58)
test iterate_mut_uncached_100_by_50 ... bench: 602 ns/iter (+/- 138)
test iterate_uncached_100_by_50 ... bench: 2,146 ns/iter (+/- 44)
After:
test iterate_100k ... bench: 51,880 ns/iter (+/- 2,813)
test iterate_cached_100_by_50 ... bench: 957 ns/iter (+/- 117)
test iterate_mut_100k ... bench: 65,218 ns/iter (+/- 5,915)
test iterate_mut_cached_100_by_50 ... bench: 220 ns/iter (+/- 7)
test iterate_mut_uncached_100_by_50 ... bench: 701 ns/iter (+/- 548)
test iterate_uncached_100_by_50 ... bench: 2,340 ns/iter (+/- 205)
Oddly, this was much less impactful on my Ryzen 7 7800X3D desktop than on my old i7-8550U laptop, though outlining next_archetype did fix iterate_mut being significantly slower than iterate there.
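For anyone unfamiliar with the outlining trick, here is a minimal sketch of the general idea, not the actual code from this change. `ChunkIter` and `next_chunk` are hypothetical names standing in for the query iterator and next_archetype, and the use of `#[cold]` plus `#[inline(never)]` is my assumption about how one would typically keep the rarely-taken "advance to the next container" step out of the hot per-element loop.

```rust
/// Hypothetical chunked iterator: yields every element of every chunk in order.
/// Each chunk stands in for one archetype's column of components.
struct ChunkIter<'a> {
    chunks: std::slice::Iter<'a, Vec<u32>>,
    current: std::slice::Iter<'a, u32>,
}

impl<'a> ChunkIter<'a> {
    fn new(chunks: &'a [Vec<u32>]) -> Self {
        // Start with an empty "current" chunk so the first call to `next`
        // falls through to `next_chunk`.
        let empty: &'a [u32] = &[];
        ChunkIter {
            chunks: chunks.iter(),
            current: empty.iter(),
        }
    }

    /// Cold path: advance to the next chunk, or return None when none remain.
    /// Kept out of line so the per-element loop in `next` stays small.
    #[cold]
    #[inline(never)]
    fn next_chunk(&mut self) -> Option<()> {
        self.current = self.chunks.next()?.iter();
        Some(())
    }
}

impl<'a> Iterator for ChunkIter<'a> {
    type Item = u32;

    #[inline]
    fn next(&mut self) -> Option<u32> {
        loop {
            // Hot path: the current chunk still has elements.
            if let Some(&x) = self.current.next() {
                return Some(x);
            }
            // Cold path: current chunk exhausted (also skips empty chunks).
            self.next_chunk()?;
        }
    }
}
```

With one big chunk the cold function is called only a couple of times per iteration pass, so keeping it out of line costs next to nothing while leaving the inner loop compact. Whether that helps on a given CPU is something only the benchmarks can say.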