Closed phip1611 closed 1 day ago
Hey,
Perfect hashes don't really make sense to be used as hashers for a standard hashmap. Unlike normal hashes, constructing a perfect hash function requires knowing up front which keys you'll insert. And given a perfect hash function, there is typically no need for the hashmap to store keys. But that makes it impossible to detect duplicate or unexpected keys. So the typical case is to simply take the integer returned by the minimal perfect hash function and then to store an array that is directly indexed by this integer.
Hope this helps! I'll leave this open to update the readme later.
Got it, thanks!
And given a perfect hash function, there is typically no need for the hashmap to store keys.
I think I figured out how to use it. Is this correct?
use ptr_hash::{PtrHash, PtrHashParams};
#[derive(Debug, Default, Clone)]
struct AggregatedData {
name: String,
sum_values: f32,
}
impl AggregatedData {
fn init(&mut self, name: String) {
self.name = name;
}
}
fn main() {
let cities = ["Berlin", "Dresden", "Hamburg"];
// Build the data structure.
let mphf = <PtrHash<_>>::new(cities.as_slice(), PtrHashParams::default());
// Map-like vector. Indices map to elements.
let mut map = vec![AggregatedData::default(); cities.len()];
// Init all entries with data that is already known.
for city in cities {
let index = mphf.index_minimal(&city);
map[index].init(city.to_string());
}
let index = mphf.index_minimal(&"Berlin");
map[index].sum_values += 5.0;
let index = mphf.index_minimal(&"Dresden");
map[index].sum_values += 1.0;
let index = mphf.index_minimal(&"Hamburg");
map[index].sum_values += -5.0;
let index = mphf.index_minimal(&"Hamburg");
map[index].sum_values += -2.0;
dbg!(map);
}
Yes, that looks like how I'm using it. Just note that you could get away with not storing the name
of the city in the AggregateData
struct.
Hi there! I've come here because I found its usage in [0]. I'm not an expert in hashing and hashmaps. I'm wondering if this perfect hashing algorithm can be combined with
std::collections::HashMap
orstd::collections::BTreeMap
?My "request"/suggestion is: Can you please add a few sentences in the README and documentation that classify this library against the standard library and the hashing implementations used there, including an example? This will help to understand people outside the loop what this actually is. :)
In the past days, I've jumped into the 1BRC challenge w/ Rust myself [1], that's how I ended up here (jup, I'm late to the party).
[0] https://github.com/RagnarGrootKoerkamp/1brc/blob/master/src/main.rs#L390 [1] https://github.com/phip1611/1brc-rust