Use `SmallVec` for better performance

jakoschiko commented 1 month ago

Currently IntMap stores it values in this data structure:

cache: Vec<Vec<(u64, V)>>

If I understand correctly the inner Vec is used for entries with a hash collision. I would assume that hash collisions are rare, so allocating a Vec for storing a single entry might be too expensive.

A simple solution to this problem could be SmallVec:

cache: Vec<SmallVec<[(u64, V); 1]>>

I refactored IntMap and ran the benchmarks included in this repo:

Vec<Vec<(u64, V)>>

test tests::u64_get_built_in                     ... bench:      75,667.25 ns/iter (+/- 990.66)
test tests::u64_get_indexmap                     ... bench:     100,443.82 ns/iter (+/- 558.04)
test tests::u64_get_intmap                       ... bench:      12,725.52 ns/iter (+/- 15.93)
test tests::u64_insert_built_in                  ... bench:     104,197.52 ns/iter (+/- 293.96)
test tests::u64_insert_built_in_without_capacity ... bench:     263,766.42 ns/iter (+/- 612.20)
test tests::u64_insert_indexmap                  ... bench:     119,844.19 ns/iter (+/- 478.59)
test tests::u64_insert_intmap                    ... bench:      48,210.07 ns/iter (+/- 1,035.84)
test tests::u64_insert_intmap_checked            ... bench:      43,977.34 ns/iter (+/- 344.81)
test tests::u64_insert_intmap_entry              ... bench:      56,815.19 ns/iter (+/- 151.72)
test tests::u64_insert_intmap_without_capacity   ... bench:     534,254.25 ns/iter (+/- 4,754.61)
test tests::u64_resize_intmap                    ... bench:      21,808.31 ns/iter (+/- 157.90)

Vec<SmallVec<[(u64, V); 1]>>

test tests::u64_get_built_in                     ... bench:      78,840.49 ns/iter (+/- 1,060.83)
test tests::u64_get_indexmap                     ... bench:     104,876.86 ns/iter (+/- 601.68)
test tests::u64_get_intmap                       ... bench:      13,770.94 ns/iter (+/- 31.64)
test tests::u64_insert_built_in                  ... bench:     109,268.14 ns/iter (+/- 405.67)
test tests::u64_insert_built_in_without_capacity ... bench:     262,595.65 ns/iter (+/- 2,398.98)
test tests::u64_insert_indexmap                  ... bench:     121,934.15 ns/iter (+/- 362.42)
test tests::u64_insert_intmap                    ... bench:      44,227.69 ns/iter (+/- 104.81)
test tests::u64_insert_intmap_checked            ... bench:      44,575.48 ns/iter (+/- 76.96)
test tests::u64_insert_intmap_entry              ... bench:      57,808.73 ns/iter (+/- 137.58)
test tests::u64_insert_intmap_without_capacity   ... bench:     535,978.00 ns/iter (+/- 3,435.93)
test tests::u64_resize_intmap                    ... bench:      21,471.73 ns/iter (+/- 152.39)

It doesn't seem to make a difference. But then I ran the benchmarks mentioned #46:

Vec<Vec<(u64, V)>>

test bench_hash_map ... bench:   1,134,233.35 ns/iter (+/- 57,827.94)
test bench_int_map  ... bench:   1,927,631.20 ns/iter (+/- 22,938.10)

Vec<SmallVec<[(u64, V); 1]>>

test bench_hash_map ... bench:   1,145,045.10 ns/iter (+/- 157,035.95)
test bench_int_map  ... bench:   1,019,778.05 ns/iter (+/- 11,999.06)

This looks good, isn't it? My changes are on this branch.

What do you think? Is it worth to investigate this further?

JesperAxelsson commented 1 month ago

Could definitely be worth looking into. There is an collisions function that can be used to track of collisions, load rate is factor here as well. There will almost always be some collisions though. One thing to keep in mind is that I found the benchmarks to sometimes change between runs so it is wise to run it a couple of times to get a baseline. Changing the element count makes the bench mark more stable but take more time. Preferably there is benchmarks for different element counts, but I'm too lazy ^^

jakoschiko commented 1 month ago

Could definitely be worth looking into.

I will do it. Maybe it's worth to add more benchmarks before that. How do you feel about introducing the dependency smallvec?

One thing to keep in mind is that I found the benchmarks to sometimes change between runs so it is wise to run it a couple of times to get a baseline. Changing the element count makes the bench mark more stable but take more time.

I ran it multiple times and it was consistent.

Preferably there is benchmarks for different element counts, but I'm too lazy ^^

Would you mind switching to another bench lib like criterion or divan (my preferred choice)? They support this out of the box. And the current benchmark requires a nightly compiler which is a little bit annoying.

JesperAxelsson commented 4 weeks ago

Looked a bit at both of the benchmark libs. Both don't seems very active so I suppose divan should be fine. I would be fine with adding a dependency in this case.

JesperAxelsson / rust-intmap

Use `SmallVec` for better performance #57