Replace the hash function used in YJIT with a faster implementation

whtsht commented 1 week ago

Issue description:

Replace the hash function used in YJIT from the default SipHash in the standard library to a faster alternative.

Motivation:

YJIT currently uses HashMap and HashSet from the standard library, which utilize SipHash by default. While SipHash is collision-resistant, it is relatively slow. Replacing it with a faster hash function, even at the cost of reduced collision resistance, may improve performance.

Task:

Introduce a faster hash library, such as AHash or FxHash
- https://github.com/rust-lang/hashbrown
- https://github.com/rust-lang/rustc-hash
Set the seed value randomly at initialization to prevent predictable hash collisions and mitigate hash flooding attacks

Concerns:

Switching to a faster, non-collision-resistant hash function may increase the risk of hash flooding attacks. To mitigate this, careful implementation and random seeding will be required.

For example, Node.js faced a hash flooding vulnerability in the past: https://v8.dev/blog/hash-flooding.

Conclusion:

I welcome feedback, especially regarding potential security concerns or implementation suggestions. Thank you.

eregon commented 1 week ago

For reference, the 2 CRuby CVE about HashDoS:

And the move to SipHash-1-3: https://bugs.ruby-lang.org/issues/13017

The second CVE means just MurmurHash + a random seed wasn't enough, at least for hashing keys of user-facing Hash. The link there seems no longer accessible but there is https://web.archive.org/web/20240401163550/2012.appsec-forum.ch/conferences/#c17

At 28c3, Klink and Waelde demonstrated that a number of technologies (PHP, .NET, Ruby, Java, etc.) remained vulnerable to the decade-old hash-flooding DoS attacks. These attacks work by enforcing worst-case insert time in hash tables by sending many inputs hashing to the same value (a “multicollision”). Many vendors fixed the issue by replacing the weak deterministic hash function with stronger and randomized hash functions. In this presentation, we will show examples of such stronger randomized hash functions that fail to protect against hash-flooding, by presenting “universal multicollision” attacks based on differential cryptanalysis techniques. We will present demos showing how to exploit these attacks to DoS a Ruby on Rails application, as well as the latest Java OpenJDK; two technologies that chose to “fix” hash-flooding by using the MurmurHash hash functions. Finally, we will describe a reliable fix to hash-flooding with the SipHash family of pseudorandom functions: SipHash provides the adequate cryptographic strength to mitigate hash-flooding, yet is competitive in performance with the non-cryptographic hashes.

And I think that's why we still have quite expensive hash functions in CRuby these days and not just a random seed.

FWIW Java or at least HotSpot chose to fix differently: java.lang.String#hashCode is still fully deterministic to this day but the hashtables implementations like java.util.HashMap switch internally to a tree-like comparison Map when they detect too many hash collisions.

whtsht commented 1 week ago

Using SipHash seems reasonable.

Many languages use SipHash as a countermeasure against universal multicollisions attacks (Node.js also uses it)
It works fast for small inputs

My research was insufficient. Thank you @eregon for your opinion.

I will close this issue.

Shopify / ruby