Open steffahn opened 1 year ago
Ugh, that's unfortunate. I guess FrozenMap would have to use an internal hashmap implementation that stores the hashes or something. (Basically, have CachedHash<T: Hash> {x: T, y: u64}
that hashes to the cached u64)
Would it be possible to get hashbrown to assign a garbage hash to such entries instead? This would still be correct according to the garbage-in-garbage-out pattern of HashMap behavior when the user writes silly implementations of Hash and Eq.
It would be nice if we could rely on the invariant that insert-only maps never drop.
Another way to fix this would be to catch_unwind
I guess; or potentially seed insert
with a destructor bomb. Unsure what the perf implications of that are, and potentially people may not want the abort from the destructor bomb. Thoughts?
Would it be possible to get hashbrown to assign a garbage hash to such entries instead? This would still be correct according to the garbage-in-garbage-out pattern of HashMap behavior when the user writes silly implementations of Hash and Eq.
I agree that this is somewhat surprising behavior from HashMap
, hence it’s be nice if it could be improved on that end.
Also – though I haven’t quite thought through / experimented with what happens if the same thing is forced (through a silly BuildHasher
) on an IndexMap
(which uses HashMap<usize>
internally, storing its keys and values in a separate Vec
) – I believe that IndexMap
most likely does not have the same problem, as there should probably be no way a call to insert
can end up in an element of that Vec
getting dropped.
If only one could somehow safely convert &mut HashMap<K, V>
into &mut HashMap<Newtype<K>, Newtype<V>>
, then it’d pe possible to make sure that during FrozenMap::insert
, the HashMap
-access happens through the &mut HashMap<Newtype<K>, Newtype<V>>
-view where Newtype
implenents Drop
in an aborting manner.
That may be sound for repr(transparent)
I think
Yeah, one of the problems is that we expose methods that give you access to the underlying hashmap. Unfortunate. Not strongly opposed to changing
That may be sound for
repr(transparent)
I think
AFAIK that isn’t the case:
For example, as far as I’m aware…
pub struct HashMap<K, V, S = DefaultHashBuilder, A: Allocator + Clone = Global> {
pub(crate) hash_builder: S,
pub(crate) table: RawTable<(K, V), A>,
}
being repr(Rust)
means that e.g. it’s absolutely legal for the compiler to switch the order of the fields hash_builder
and table
between the type HashMap<K, V>
and HashMap<Newtype<K>, Newtype<V>>
, even if Newtype
is repr(transparent)
.
Actually, one way to fix this would be a destructor guard type in insert
that calls mem::forget(mem::take(self))
or something.
Unless there is a chance that other user code can run between the panicky hash and HashMap::insert()
ending? Looks like that code basically gives up the moment this happens, so it ought to be fine, but I haven't reviewed it.
re: transparent: Yeah, so .... given that transparent()
was introduced for interop with other languages there's a very very good chance that any field reordering will have to avoid it as well (because people are going to be interacting with repr(Rust)
in interop). But yes, that's not settled. Might be a quick RFC to say "repr(transparent) means it behaves identically in all repr contexts, including when used as a field". But you're right that this isn't settled right now.
Actually, one way to fix this would be a destructor guard type in
insert
that callsmem::forget(mem::take(self))
or something.
Usually panics don’t get converted in memory leaks, right? But at least I agree this does sound like it would improve soundness.
Edit: Actually… no. The values in the map get dropped while unwinding, before HashMap::insert
finishes.
Oh, wait, yeah, that wouldn't do anything. Never mind.
By the way, your new title was inaccurate, as the problem only possibly arises on the non-reallocating rehashing ;-)
This also makes this issue harder to run into. E.g. the use-case of starting up with an empty FrozenMap
and only ever adding entries to it can never run into the issue; re-hashing without re-allocating only possibly happens if there are DELETED
entries in the map.
Another way to fix this would be to
catch_unwind
I guess; or potentially seedinsert
with a destructor bomb. Unsure what the perf implications of that are, and potentially people may not want the abort from the destructor bomb. Thoughts?
It should probably be possible to create self-referencing uses of FrozenMap
where a subsequent destructor call within the insert
could already unsoundly access data that was dropped in a previous one… or in itself, for that matter.
That means that at the time of catching or drop-bombing the panic leaving the HashMap::insert
, it’s too late. _(It might still be an improvement to avoid more of the possible unintentionally unsound use-cases.)_
Edit: Here’s a demo:
/*
[dependencies]
elsa = "1.8.1"
thread_local = "1.1.7"
*/
use std::{
collections::HashMap,
sync::{Mutex, OnceLock},
};
use elsa::FrozenMap;
use thread_local::ThreadLocal;
#[derive(PartialEq, Eq, Debug)]
struct H(u32);
impl std::hash::Hash for H {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
if PANIC_ON.lock().unwrap().as_ref() == Some(&self.0) {
panic!();
}
0_u32.hash(state);
}
}
struct V(Mutex<Option<&'static str>>);
static PANIC_ON: Mutex<Option<u32>> = Mutex::new(None);
fn main() {
let mut map = HashMap::new();
for i in 1..=28 {
map.insert(H(i), Box::new((String::new(), None)));
}
for i in 1..=27 {
map.remove(&H(i));
}
*map.get_mut(&H(28)).unwrap() = Box::new((
String::from("Hello World!"),
Some(V(Mutex::new(None::<&str>))),
));
let mut map = FrozenMap::from(map);
static MAP: OnceLock<ThreadLocal<FrozenMap<H, Box<(String, Option<V>)>>>> = OnceLock::new();
let map: &'static FrozenMap<_, _> = MAP
.get_or_init(ThreadLocal::new)
.get_or(|| std::mem::take(&mut map));
let last = map.get(&H(28)).unwrap();
let hello_world = &last.0[..];
*last.1.as_ref().unwrap().0.lock().unwrap() = Some(hello_world);
impl Drop for V {
fn drop(&mut self) {
let hello_world = self.0.lock().unwrap().unwrap();
println!("gone: {hello_world}");
}
}
println!("exists: {hello_world}");
*PANIC_ON.lock().unwrap() = Some(28);
map.insert(H(1), Box::new((String::new(), None)));
}
By the way, your new title was inaccurate, as the problem only possibly arises on the non-reallocating rehashing ;-)
Ah, so it's basically only when you construct a FrozenMap out of a HashMap that has experienced deletion. Hm.
Right, or if you modify it using the AsMut
to do the deletions.
heh, so my preferred fix (cache the hash) has the side effect of removing all of the APIs that cause this problem in the first place, nice.
(But i'd really prefer to avoid removing those APIs)
If we hide away the internal implementation, we can harden it with more UnsafeCells if we want so that https://github.com/Manishearth/elsa/issues/50 goes from being "probably not a problem" to "definitely not a problem"
Which data structures are affected by this?
Anything with a hashmap
It’s a bit tricky to reproduce, but here’s a way that seems to reliably do it, as of now:
_([run this code online](https://www.rustexplorer.com/b#%2F*%0A%5Bdependencies%5D%0Aelsa%20%3D%20%221%22%0A*%2F%0A%0Ause%20std%3A%3A%7B%0A%20%20%20%20collections%3A%3AHashMap%2C%0A%20%20%20%20panic%3A%3A%7Bcatch_unwind%2C%20AssertUnwindSafe%7D%2C%0A%20%20%20%20sync%3A%3AMutex%2C%0A%7D%3B%0A%0A%23%5Bderive(PartialEq%2C%20Eq%2C%20Debug)%5D%0Astruct%20H(u32)%3B%0A%0Aimpl%20std%3A%3Ahash%3A%3AHash%20for%20H%20%7B%0A%20%20%20%20fn%20hash%3CH%3A%20std%3A%3Ahash%3A%3AHasher%3E(%26self%2C%20state%3A%20%26mut%20H)%20%7B%0A%20%20%20%20%20%20%20%20if%20PANIC_ON.lock().unwrap().as_ref()%20%3D%3D%20Some(%26self.0)%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20panic!()%3B%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%200_u32.hash(state)%3B%0A%20%20%20%20%7D%0A%7D%0A%0Astatic%20PANIC_ON%3A%20Mutex%3COption%3Cu32%3E%3E%20%3D%20Mutex%3A%3Anew(None)%3B%0A%0Afn%20main()%20%7B%0A%20%20%20%20let%20mut%20map%20%3D%20HashMap%3A%3Anew()%3B%0A%20%20%20%20for%20i%20in%201..%3D28%20%7B%0A%20%20%20%20%20%20%20%20map.insert(H(i)%2C%20String%3A%3Anew())%3B%0A%20%20%20%20%7D%0A%20%20%20%20for%20i%20in%201..%3D27%20%7B%0A%20%20%20%20%20%20%20%20map.remove(%26H(i))%3B%0A%20%20%20%20%7D%0A%20%20%20%20*map.get_mut(%26H(28)).unwrap()%20%3D%20String%3A%3Afrom(%22Hello%20World!%22)%3B%0A%0A%20%20%20%20let%20map%20%3D%20elsa%3A%3AFrozenMap%3A%3Afrom(map)%3B%0A%0A%20%20%20%20let%20hello_world%20%3D%20map.get(%26H(28)).unwrap()%3B%0A%0A%20%20%20%20println!(%22exists%3A%20%7Bhello_world%7D%22)%3B%0A%0A%20%20%20%20let%20_%20%3D%20catch_unwind(AssertUnwindSafe(%7C%7C%20%7B%0A%20%20%20%20%20%20%20%20*PANIC_ON.lock().unwrap()%20%3D%20Some(28)%3B%0A%20%20%20%20%20%20%20%20map.insert(H(1)%2C%20String%3A%3Anew())%3B%0A%20%20%20%20%7D))%3B%0A%0A%20%20%20%20println!(%22gone%3A%20%7Bhello_world%7D%22)%3B%0A%7D%0A) on rustexplorer.com)_
See here and here in the
hashbrown
source-code for more information.