gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0
12.32k stars 906 forks source link

Uses of `PreHashedMap` are probably not sound #6373

Open jimblandy opened 7 hours ago

jimblandy commented 7 hours ago

The PreHashedMap type in wgpu_core is probably not used in a sound way, in particular, its use in Device::derive_pipeline_layout is almost certainly incorrect.

The idea is that a PreHashedMap key is just a copy of the true key's u64 hash value. But this doesn't provide any way to distinguish keys that have the same hash value: PartialEq on a PreHashedKey just compares the cached hash values. This will lead to incorrect lookups and insertions.

(I don't think PreHashedMap makes much sense: it's only sound if you can guarantee a distinct hash value for distinct keys, which means these "hash values" are almost certainly not produced by hashing: you're using some identifier or pointer that was assigned in a way that guarantees uniqueness. In that case, you should simply use that identifier as the key directly.)

jimblandy commented 7 hours ago

Aside: Managing pools of de-duplicated complex values is really tricky, even if you're not playing games with hash values: my friend recently found a year-old race condition in similar code he'd written for wasmtime that led to a CVE.

jimblandy commented 7 hours ago

From chat, @cwfitzgerald says:

We can probably mitigate this by storing the key and the hash, then dispatching eq to the real key, and hash to the precomputed hash

I infer from this that the motivation was to avoid the overhead of re-computing hash values. If we're doing a lot of queries with the same key, and we can keep the PreHashedKey around to reuse for all the queries, then this makes sense.

But it's my understanding that Rust hash tables themselves cache the hash values of the keys already stored in the table. A lookup computes the hash of the key you're querying with, but then it just checks that against the hashes stored in the table, and performs a real PartialEq comparison only when the hashes match. So storing PreHashedKey in the table doesn't make sense; the table already has the hash value cached.

jimblandy commented 7 hours ago

Hmm, I'm reluctant to dive into the hashbrown code, but from the description it seems like they only store one byte of the hash value for each key. But I believe it is the case that they are not re-hashing stored keys at lookup time, which means that storing PreHashedKey in the table isn't avoiding hashes.

cwfitzgerald commented 3 hours ago

Yeah I dove into it a bit too, and yeah I think we should just pull the whole abstraction - I was trying to avoid needing to do large structure comparisons inside a lock, but I don't think that can be reasonably avoided.