astral-sh / uv

An extremely fast Python package installer and resolver, written in Rust.
https://astral.sh/
Apache License 2.0
11.72k stars 321 forks source link

once-map: avoid hard-coding `Arc` #3242

Closed BurntSushi closed 2 weeks ago

BurntSushi commented 2 weeks ago

The only thing a OnceMap really needs to be able to do with the value is to clone it. All extant uses benefited from having this done for them by automatically wrapping values in an Arc. But this isn't necessarily true for all things. For example, a value might have an Arc internally to making cloning cheap in other contexts, and it doesn't make sense to re-wrap it in an Arc just to use it with a OnceMap. Or alternatively, cloning might just be cheap enough on its own that an Arc isn't worth it.

zanieb commented 2 weeks ago

Thank you!

ibraheemdev commented 2 weeks ago

We could also avoid the Arcs entirely by keeping a DashMap<K, Value<usize>> that indexes into a separate persistent vector once the value has been initialized.

BurntSushi commented 2 weeks ago

@ibraheemdev Do you mean something like this? I don't think I see a straight-forward way to make that work. You still need to synchronize updates to the persistent vector somehow. And that in turn usually makes returning a &T difficult. You could put a Box<Vector> into an AtomicPtr, but when you go to CAS it out with an updated Vector after a done call, I believe that could invalidate previous borrows (thus causing UB).

ibraheemdev commented 2 weeks ago

@BurntSushi I was thinking of using boxcar, which allows concurrent inserts and stable addresses. I don't know enough about how OnceMap is used to know if it's worth the dependency though.

BurntSushi commented 2 weeks ago

Oh a concurrent vector, not a persistent vector. (A persistent vector has different properties. Like cheap clones, structural sharing and the ability to continue using older "versions.") But I see, yes, I think something like that could work. It's a nice data structure. But yeah, I don't think we're feeling the downsides of Arc here yet. I think it's typically the case that the relative work of cloning an Arc is dwarfed by other things (like I/O).