Shopify / ruby

The Ruby Programming Language [mirror]
https://www.ruby-lang.org/
Other
41 stars 13 forks source link

Replace the hash function used in YJIT with a faster implementation #561

Closed whtsht closed 1 week ago

whtsht commented 1 week ago

Issue description:

Replace the hash function used in YJIT from the default SipHash in the standard library to a faster alternative.

Motivation:

YJIT currently uses HashMap and HashSet from the standard library, which utilize SipHash by default. While SipHash is collision-resistant, it is relatively slow. Replacing it with a faster hash function, even at the cost of reduced collision resistance, may improve performance.

Task:

Concerns:

Switching to a faster, non-collision-resistant hash function may increase the risk of hash flooding attacks. To mitigate this, careful implementation and random seeding will be required.

For example, Node.js faced a hash flooding vulnerability in the past: https://v8.dev/blog/hash-flooding.

Conclusion:

I welcome feedback, especially regarding potential security concerns or implementation suggestions. Thank you.

eregon commented 1 week ago

For reference, the 2 CRuby CVE about HashDoS:

And the move to SipHash-1-3: https://bugs.ruby-lang.org/issues/13017

The second CVE means just MurmurHash + a random seed wasn't enough, at least for hashing keys of user-facing Hash. The link there seems no longer accessible but there is https://web.archive.org/web/20240401163550/2012.appsec-forum.ch/conferences/#c17

At 28c3, Klink and Waelde demonstrated that a number of technologies (PHP, .NET, Ruby, Java, etc.) remained vulnerable to the decade-old hash-flooding DoS attacks. These attacks work by enforcing worst-case insert time in hash tables by sending many inputs hashing to the same value (a “multicollision”). Many vendors fixed the issue by replacing the weak deterministic hash function with stronger and randomized hash functions. In this presentation, we will show examples of such stronger randomized hash functions that fail to protect against hash-flooding, by presenting “universal multicollision” attacks based on differential cryptanalysis techniques. We will present demos showing how to exploit these attacks to DoS a Ruby on Rails application, as well as the latest Java OpenJDK; two technologies that chose to “fix” hash-flooding by using the MurmurHash hash functions. Finally, we will describe a reliable fix to hash-flooding with the SipHash family of pseudorandom functions: SipHash provides the adequate cryptographic strength to mitigate hash-flooding, yet is competitive in performance with the non-cryptographic hashes.

And I think that's why we still have quite expensive hash functions in CRuby these days and not just a random seed.

FWIW Java or at least HotSpot chose to fix differently: java.lang.String#hashCode is still fully deterministic to this day but the hashtables implementations like java.util.HashMap switch internally to a tree-like comparison Map when they detect too many hash collisions.

whtsht commented 1 week ago

Using SipHash seems reasonable.

My research was insufficient. Thank you @eregon for your opinion.

I will close this issue.