bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Do you really need to use this TypeIdMap hack? #1097

Closed AngelicosPhosphoros closed 3 years ago

AngelicosPhosphoros commented 3 years ago

I am talking about your custom no-op hasher here.

It relies on TypeId internally being some specific hash, i.e. on its internal representation, but its docs don't make any guarantee about that:

Each TypeId is an opaque object which does not allow inspection of what's inside but does allow basic operations such as cloning, comparison, printing, and showing.
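To make the fragility concrete, here is a small diagnostic sketch (`RecordingHasher` is a name I made up for illustration, not anything from Bevy or std): it records which `Hasher` methods `TypeId`'s `Hash` impl actually calls. The no-op hasher only works if that impl routes everything through `write_u64`/`write_u128`, which the standard library does not promise.

```rust
use std::any::TypeId;
use std::hash::{Hash, Hasher};

// Records which Hasher methods are invoked when a TypeId is hashed.
#[derive(Default)]
pub struct RecordingHasher {
    pub calls: Vec<&'static str>,
}

impl Hasher for RecordingHasher {
    fn finish(&self) -> u64 {
        0
    }
    fn write(&mut self, _bytes: &[u8]) {
        self.calls.push("write");
    }
    fn write_u64(&mut self, _n: u64) {
        self.calls.push("write_u64");
    }
    fn write_u128(&mut self, _n: u128) {
        self.calls.push("write_u128");
    }
}

fn main() {
    let mut h = RecordingHasher::default();
    TypeId::of::<String>().hash(&mut h);
    // Which method shows up here is an implementation detail of the
    // current compiler version, not a documented guarantee.
    println!("TypeId hashed via: {:?}", h.calls);
}
```

If a future compiler changes how `TypeId` hashes itself, the `unreachable!()` in the no-op hasher's `write` becomes a runtime panic.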

I wrote a microbenchmark comparing your TypeIdHasher implementation, the fxhash implementation, and the std default hasher:

use std::any::TypeId;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

use criterion::{criterion_group, criterion_main, Criterion};
use fxhash::FxHashMap;

/// A hasher optimized for hashing a single TypeId.
///
/// TypeId is already thoroughly hashed, so there's no reason to hash it again.
/// Just leave the bits unchanged.
#[derive(Default)]
pub(crate) struct TypeIdHasher {
    hash: u64,
}

impl Hasher for TypeIdHasher {
    fn write_u64(&mut self, n: u64) {
        // Only a single value can be hashed, so the old hash should be zero.
        debug_assert_eq!(self.hash, 0);
        self.hash = n;
    }

    // Tolerate TypeId being either u64 or u128.
    fn write_u128(&mut self, n: u128) {
        debug_assert_eq!(self.hash, 0);
        self.hash = n as u64;
    }

    fn write(&mut self, _bytes: &[u8]) {
        unreachable!()
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// A HashMap with TypeId keys
///
/// Because TypeId is already a fully-hashed u64 (including data in the high seven bits,
/// which hashbrown needs), there is no need to hash it again. Instead, this uses the much
/// faster no-op hash.
pub(crate) type TypeIdMap<V> = HashMap<TypeId, V, BuildHasherDefault<TypeIdHasher>>;

pub fn criterion_benchmark(c: &mut Criterion) {
    let type_ids = [
        TypeId::of::<i32>(),
        TypeId::of::<u32>(),
        TypeId::of::<String>(),
        TypeId::of::<usize>(),
    ];

    let type_id_map: TypeIdMap<TypeId> = type_ids[..2].iter().map(|&x| (x, x)).collect();
    let fx_map: FxHashMap<TypeId, TypeId> = type_ids[..2].iter().map(|&x| (x, x)).collect();
    let std_map: HashMap<TypeId, TypeId> = type_ids[..2].iter().map(|&x| (x, x)).collect();

    c.bench_function("TypeIdMap", |b| {
        b.iter(|| -> [Option<TypeId>; 4] {
            [
                type_id_map.get(&type_ids[0]).cloned(),
                type_id_map.get(&type_ids[1]).cloned(),
                type_id_map.get(&type_ids[2]).cloned(),
                type_id_map.get(&type_ids[3]).cloned(),
            ]
        })
    });

    c.bench_function("FxHashMap", |b| {
        b.iter(|| -> [Option<TypeId>; 4] {
            [
                fx_map.get(&type_ids[0]).cloned(),
                fx_map.get(&type_ids[1]).cloned(),
                fx_map.get(&type_ids[2]).cloned(),
                fx_map.get(&type_ids[3]).cloned(),
            ]
        })
    });

    c.bench_function("StdMap", |b| {
        b.iter(|| -> [Option<TypeId>; 4] {
            [
                std_map.get(&type_ids[0]).cloned(),
                std_map.get(&type_ids[1]).cloned(),
                std_map.get(&type_ids[2]).cloned(),
                std_map.get(&type_ids[3]).cloned(),
            ]
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

I got such results:

     Running target\release\deps\bench_type_id-d50f4b959d4e78a1.exe
Gnuplot not found, using plotters backend
TypeIdMap               time:   [6.3074 ns 6.3188 ns 6.3315 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

FxHashMap               time:   [6.5530 ns 6.5636 ns 6.5726 ns]

StdMap                  time:   [60.491 ns 60.561 ns 60.643 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

I don't really think that 0.2 ns is worth having this hack in your code. Maybe it is better to just use FxHash? It is well-tested: it is used by rustc internally. This change would also reduce the amount of code to maintain.
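For reference, the Fx hash step itself is tiny. The sketch below reimplements it stdlib-only (the rotate/xor/multiply structure and the constant are quoted from memory of the fxhash/rustc-hash crates; treat them as an approximation, not the crate's exact code):

```rust
use std::hash::Hasher;

// Large odd multiplier, as used by the Fx-style hash (from memory).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

#[derive(Default)]
pub struct FxLikeHasher(u64);

impl FxLikeHasher {
    // Fold one 64-bit word into the state: rotate, xor, multiply.
    fn add(&mut self, word: u64) {
        self.0 = (self.0.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxLikeHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Process input in little-endian 64-bit words, zero-padding the tail.
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.add(u64::from_le_bytes(buf));
        }
    }
    fn write_u64(&mut self, n: u64) {
        self.add(n);
    }
}

fn main() {
    let mut h = FxLikeHasher::default();
    h.write_u64(42);
    // Deterministic and only a handful of cheap ALU ops per word.
    println!("fx-like hash of 42: {:#x}", h.finish());
}
```

So the per-lookup cost over the no-op hasher is a few arithmetic instructions, which matches the sub-nanosecond gap in the benchmark.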

DJMcNab commented 3 years ago

I would say that your benchmark probably has extraordinarily good cache locality - are the results any different with, say, 100 type ids?

Also you probably need a few black_boxes to make sure the hashing isn't being optimised out.
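The black_box pattern being suggested can be sketched like this (a generic, self-contained example using `std::hint::black_box`; `criterion::black_box` plays the same role):

```rust
use std::collections::HashMap;
use std::hint::black_box;

// Look up each key, wrapped in black_box so the compiler cannot
// constant-fold hashes of keys it can see at compile time.
pub fn lookup(map: &HashMap<u32, u32>, keys: &[u32]) -> Vec<Option<u32>> {
    keys.iter().map(|k| map.get(black_box(k)).copied()).collect()
}

fn main() {
    let map: HashMap<u32, u32> = (0..100).map(|x| (x, x * 2)).collect();
    // black_box the inputs and the result too, so neither side of the
    // benchmark body can be optimised away entirely.
    let keys = black_box([3u32, 50, 97, 200]);
    let results = black_box(lookup(&map, &keys));
    assert_eq!(results, vec![Some(6), Some(100), Some(194), None]);
}
```

Without the black_box calls, an optimiser that can see both the keys and the map contents is free to precompute hashes, or drop unused lookups altogether.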

I have also written this exact same code before, but it is probably better just to use a normal fast hash.

AngelicosPhosphoros commented 3 years ago

@DJMcNab Ok, I threw in more types (and switched to sets, because they are easier); the conclusion is the same.

Code:

use std::any::TypeId;
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use fxhash::FxHashSet;

// 100 distinct unit structs (S0..S99), each with its own TypeId,
// generated by a small macro instead of 100 hand-written declarations.
macro_rules! make_structs {
    ($($name:ident),+ $(,)?) => {
        $(struct $name;)+
    };
}

make_structs!(
    S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S19,
    S20, S21, S22, S23, S24, S25, S26, S27, S28, S29, S30, S31, S32, S33, S34, S35, S36, S37,
    S38, S39, S40, S41, S42, S43, S44, S45, S46, S47, S48, S49, S50, S51, S52, S53, S54, S55,
    S56, S57, S58, S59, S60, S61, S62, S63, S64, S65, S66, S67, S68, S69, S70, S71, S72, S73,
    S74, S75, S76, S77, S78, S79, S80, S81, S82, S83, S84, S85, S86, S87, S88, S89, S90, S91,
    S92, S93, S94, S95, S96, S97, S98, S99
);

/// A hasher optimized for hashing a single TypeId.
///
/// TypeId is already thoroughly hashed, so there's no reason to hash it again.
/// Just leave the bits unchanged.
#[derive(Default)]
pub(crate) struct TypeIdHasher {
    hash: u64,
}

impl Hasher for TypeIdHasher {
    fn write_u64(&mut self, n: u64) {
        // Only a single value can be hashed, so the old hash should be zero.
        debug_assert_eq!(self.hash, 0);
        self.hash = n;
    }

    // Tolerate TypeId being either u64 or u128.
    fn write_u128(&mut self, n: u128) {
        debug_assert_eq!(self.hash, 0);
        self.hash = n as u64;
    }

    fn write(&mut self, _bytes: &[u8]) {
        unreachable!()
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// A HashMap with TypeId keys
///
/// Because TypeId is already a fully-hashed u64 (including data in the high seven bits,
/// which hashbrown needs), there is no need to hash it again. Instead, this uses the much
/// faster no-op hash.
pub(crate) type TypeIdSet = HashSet<TypeId, BuildHasherDefault<TypeIdHasher>>;

macro_rules! make_array_of_type_ids {
    ($($t:ty),+) => {
        [$(::std::any::TypeId::of::<$t>(),)+]
    };
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let type_ids = make_array_of_type_ids!(
        S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18, S19,
        S20, S21, S22, S23, S24, S25, S26, S27, S28, S29, S30, S31, S32, S33, S34, S35, S36, S37,
        S38, S39, S40, S41, S42, S43, S44, S45, S46, S47, S48, S49, S50, S51, S52, S53, S54, S55,
        S56, S57, S58, S59, S60, S61, S62, S63, S64, S65, S66, S67, S68, S69, S70, S71, S72, S73,
        S74, S75, S76, S77, S78, S79, S80, S81, S82, S83, S84, S85, S86, S87, S88, S89, S90, S91,
        S92, S93, S94, S95, S96, S97, S98, S99
    );
    let type_ids = black_box(type_ids);

    let type_id_set: TypeIdSet = type_ids.iter().copied().step_by(2).collect();
    let fx_set: FxHashSet<TypeId> = type_ids.iter().copied().step_by(2).collect();
    let std_set: HashSet<TypeId> = type_ids.iter().copied().step_by(2).collect();

    let type_id_set = black_box(type_id_set);
    let fx_set = black_box(fx_set);
    let std_set = black_box(std_set);
    let type_ids = black_box(type_ids);

    let generate = || {
        (
            // Evenly distribute the exists and not-exists cases
            black_box(type_ids[..type_ids.len()/2].iter().copied().collect::<Vec<_>>()),
            Vec::<bool>::with_capacity(type_ids.len()),
        )
    };
    c.bench_function("TypeIdSet", |b| {
        b.iter_batched(
            generate,
            |(indices, mut output): (Vec<TypeId>, Vec<bool>)| -> (Vec<TypeId>, Vec<bool>) {
                let set = &type_id_set;
                output.extend(indices.iter().map(|x| set.contains(x)));
                black_box((indices, output))
            },
            criterion::BatchSize::LargeInput,
        )
    });

    c.bench_function("FxHashSet", |b| {
        b.iter_batched(
            generate,
            |(indices, mut output): (Vec<TypeId>, Vec<bool>)| -> (Vec<TypeId>, Vec<bool>) {
                let set = &fx_set;
                output.extend(indices.iter().map(|x| set.contains(x)));
                black_box((indices, output))
            },
            criterion::BatchSize::LargeInput,
        )
    });

    c.bench_function("StdSet", |b| {
        b.iter_batched(
            generate,
            |(indices, mut output): (Vec<TypeId>, Vec<bool>)| -> (Vec<TypeId>, Vec<bool>) {
                let set = &std_set;
                output.extend(indices.iter().map(|x| set.contains(x)));
                black_box((indices, output))
            },
            criterion::BatchSize::LargeInput,
        )
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Benchmark output:

TypeIdSet               time:   [237.24 ns 238.47 ns 239.82 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

FxHashSet               time:   [225.61 ns 226.22 ns 226.86 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

StdSet                  time:   [867.64 ns 868.28 ns 869.02 ns]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Surprisingly, in this case fxhash performed even better.

> Also you probably need a few black_boxes to make sure the hashing isn't being optimised out.

I think one of the pros of both the current custom hasher and fxhash is that they can be constant-folded, in contrast to the RandomState hasher. Anyway, I added them.
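The constant-folding point can be illustrated with a stdlib-only sketch (`DeterministicHasher` is a stand-in I made up, not Bevy's or fxhash's actual code; `BuildHasher::hash_one` needs a reasonably recent Rust):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

// A trivially deterministic hasher: same input, same output, no seed.
#[derive(Default)]
pub struct DeterministicHasher(u64);

impl Hasher for DeterministicHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_mul(31).wrapping_add(b as u64);
        }
    }
}

fn main() {
    // BuildHasherDefault carries no runtime state, so hashing a key the
    // compiler can see is a pure function and may be constant-folded.
    let d: BuildHasherDefault<DeterministicHasher> = Default::default();
    assert_eq!(d.hash_one(42u64), d.hash_one(42u64));

    // RandomState seeds every map from process randomness, so the hash
    // of the same key generally differs between two states and must be
    // computed at runtime.
    let (a, b) = (RandomState::new(), RandomState::new());
    println!(
        "seeded hashes equal? {}",
        a.hash_one("TypeId") == b.hash_one("TypeId")
    );
}
```

This is also why std's default map resists HashDoS while the unseeded hashers do not: determinism is exactly what an attacker needs to manufacture collisions.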

DJMcNab commented 3 years ago

It is surprising that fxhasher is faster

But yeah, thanks for checking that. I would say at this point make a PR to use fxhash so we can see what our BDFL thinks

You can also sanity check it using the bevy_bench example thing.

AngelicosPhosphoros commented 3 years ago

Made PR