plajjan opened this issue 1 year ago
There is one obvious candidate: siphash. It is widely used and was developed by people with a very good reputation. For proper DoS protection, it should be combined with a truly random key, generated from a random source at program startup. This also means we must rethink serialization of dictionaries and sets: for a dictionary we will need to store a list of key/value pairs, for a set a list of elements, and then rebuild the table at deserialization. We cannot just memcpy the contents of the hash table, since the hash function will use a different key the next time the program runs, so every stored bucket position would be wrong.
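The keyed-hash and serialization points above can be sketched in Python. This is a minimal, unoptimized sketch, assuming a 16-byte key taken from `os.urandom` once per process. `siphash24` follows the published SipHash-2-4 algorithm (2 compression rounds per word, 4 finalization rounds), while `HASH_KEY`, `KeyedSet`, `serialize` and `deserialize` are hypothetical names for illustration, not Acton's runtime API.

```python
import os
import struct

MASK = 0xFFFFFFFFFFFFFFFF  # all arithmetic is modulo 2**64


def _rotl(x: int, b: int) -> int:
    """Rotate a 64-bit word left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v0: int, v1: int, v2: int, v3: int) -> tuple:
    """One SipRound of the add-rotate-xor permutation."""
    v0 = (v0 + v1) & MASK
    v1 = _rotl(v1, 13) ^ v0
    v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK
    v3 = _rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK
    v3 = _rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK
    v1 = _rotl(v1, 17) ^ v2
    v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of `data` under a 16-byte `key`, as a 64-bit integer."""
    if len(key) != 16:
        raise ValueError("siphash24 needs a 16-byte key")
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    tail_start = len(data) - len(data) % 8
    # Compression: 2 rounds per full 8-byte little-endian word.
    for i in range(0, tail_start, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        for _ in range(2):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m

    # Last word: the remaining 0..7 bytes plus the message length in the top byte.
    b = ((len(data) & 0xFF) << 56) | int.from_bytes(data[tail_start:], "little")
    v3 ^= b
    for _ in range(2):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    v0 ^= b

    # Finalization: 4 rounds.
    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


# Hypothetical per-process key, drawn once at startup from the OS CSPRNG.
HASH_KEY = os.urandom(16)


class KeyedSet:
    """Toy chained hash set of bytes (fixed bucket count, no resizing)."""

    def __init__(self, key: bytes = HASH_KEY, nbuckets: int = 8):
        self.key = key
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, item: bytes) -> list:
        return self.buckets[siphash24(self.key, item) % len(self.buckets)]

    def add(self, item: bytes) -> None:
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item: bytes) -> bool:
        return item in self._bucket(item)

    def serialize(self) -> list:
        # Store only the elements: the bucket layout is meaningless under another key.
        return [item for bucket in self.buckets for item in bucket]

    @classmethod
    def deserialize(cls, items: list, key: bytes = HASH_KEY) -> "KeyedSet":
        # Rebuild by re-inserting, so every element is rehashed with `key`.
        restored = cls(key)
        for item in items:
            restored.add(item)
        return restored
```

Because `deserialize` re-inserts every element, the rebuilt set is laid out correctly under whatever key the new process drew; copying `buckets` verbatim would leave elements in the wrong buckets.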
I missed your link to the comment on siphash and Rust. Yes, my understanding is that siphash is somewhat slow, particularly on long strings, which I doubt is our typical use case. But my understanding is not very well founded...
I think .NET uses xxHash, which people describe as a very fast hash algorithm: https://github.com/Cyan4973/xxHash
Is this idea of using one hash and switching to a more expensive one feasible? Could we switch, and how would we do it? I imagine we would want to decide this per table/dict, right? So in most places where we need a hash we would use the fast one, but if we notice lots of collisions in some dict we could switch that dict to another?
I'm really not used to thinking about this problem so maybe I'm going about it all wrong.
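One way to do the per-table switch asked about above is sketched below in Python: each table starts with a cheap unkeyed hash (FNV-1a here) and, if any bucket chain grows past a threshold, draws a random key and rehashes all entries with a keyed hash. This is a hypothetical sketch, not how .NET or Acton actually implement it: `AdaptiveDict`, `MAX_CHAIN` and the threshold value are made up, and keyed BLAKE2b from `hashlib` stands in for siphash because Python's standard library does not expose siphash directly.

```python
import hashlib
import os

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK = 0xFFFFFFFFFFFFFFFF


def fnv1a64(data: bytes) -> int:
    """Fast, unkeyed 64-bit FNV-1a: cheap, but collisions are easy to construct."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK
    return h


def keyed_hash(key: bytes, data: bytes) -> int:
    """Keyed 64-bit hash; keyed BLAKE2b is a stand-in for siphash here."""
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


class AdaptiveDict:
    MAX_CHAIN = 8  # hypothetical threshold; a real runtime would tune this

    def __init__(self, nbuckets: int = 64):
        self.buckets = [[] for _ in range(nbuckets)]
        self.key = None  # None means "still using the fast hash"

    def _hash(self, k: bytes) -> int:
        return fnv1a64(k) if self.key is None else keyed_hash(self.key, k)

    def _chain(self, k: bytes) -> list:
        return self.buckets[self._hash(k) % len(self.buckets)]

    def __setitem__(self, k: bytes, v) -> None:
        chain = self._chain(k)
        for i, (existing, _) in enumerate(chain):
            if existing == k:
                chain[i] = (k, v)
                return
        chain.append((k, v))
        if self.key is None and len(chain) > self.MAX_CHAIN:
            self._switch_to_keyed()

    def __getitem__(self, k: bytes):
        for existing, value in self._chain(k):
            if existing == k:
                return value
        raise KeyError(k)

    def _switch_to_keyed(self) -> None:
        # Suspiciously long chain: draw a random key and rehash every entry.
        items = [kv for chain in self.buckets for kv in chain]
        self.key = os.urandom(16)
        self.buckets = [[] for _ in self.buckets]
        for k, v in items:
            self._chain(k).append((k, v))
```

The switch is one-way and per table, so only tables that actually see suspicious collisions pay for the slower hash. One caveat: this toy never resizes, so long chains can also come from plain load; a real implementation would probably grow the table first and treat long chains at a healthy load factor as the attack signal.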
If we have to pick one, I agree that the safer siphash is a better choice.
We should have a proper algorithm for hashing ints and other types. We should probably aim for something DoS-resistant too.
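For ints and other types, one option is to feed every value into a single keyed hasher as an unambiguous byte stream: a type tag, a fixed-width encoding for ints, and a length prefix for strings, bytes and tuples, so that `1` and `"1"`, or `("ab", "c")` and `("a", "bc")`, never produce the same byte stream. A minimal Python sketch follows, assuming 64-bit signed ints; `hash_value`, `PROCESS_KEY` and the tag bytes are hypothetical, and keyed BLAKE2b stands in for siphash since `hashlib` has no siphash constructor.

```python
import hashlib
import os
import struct

# Hypothetical per-process key, drawn once at startup.
PROCESS_KEY = os.urandom(16)


def _feed(h, value) -> None:
    """Feed `value` into hasher `h` as a type-tagged, length-prefixed byte stream."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        h.update(b"B" + (b"\x01" if value else b"\x00"))
    elif isinstance(value, int):
        # Assumption: ints fit in a signed 64-bit word (OverflowError otherwise).
        h.update(b"I" + value.to_bytes(8, "little", signed=True))
    elif isinstance(value, str):
        raw = value.encode("utf-8")
        h.update(b"S" + struct.pack("<Q", len(raw)) + raw)
    elif isinstance(value, bytes):
        h.update(b"Y" + struct.pack("<Q", len(value)) + value)
    elif isinstance(value, tuple):
        h.update(b"T" + struct.pack("<Q", len(value)))
        for item in value:
            _feed(h, item)
    else:
        raise TypeError(f"unhashable in this sketch: {type(value).__name__}")


def hash_value(value, key: bytes = PROCESS_KEY) -> int:
    """Keyed 64-bit hash of an int, bool, str, bytes, or nested tuple of those."""
    h = hashlib.blake2b(digest_size=8, key=key)
    _feed(h, value)
    return int.from_bytes(h.digest(), "little")
```

The point of the tags and length prefixes is that composite values get their collision resistance from the keyed hash alone, rather than from an ad hoc combination of per-element hashes.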
Conversation started in https://github.com/actonlang/acton/pull/992#issuecomment-1292166708:
@plajjan found this the other day: https://twitter.com/pcwalton/status/1583931446305038336
So .NET appears to switch from a simple, fast hashing algorithm to a DoS-resistant one. Can we do the same?