dtolnay / linkme

Safe cross-platform linker shenanigans
Apache License 2.0

Proposal: add a `#[disjointed_static]` attribute #81

Open CAD97 opened 7 months ago

CAD97 commented 7 months ago

Declaring a static to be defined downstream seems right up the "linker shenanigans" alley of the linkme crate, and can be made (mostly) typesafe using essentially the same linker techniques as with #[distributed_slice].

I'm most interested in the static RESOURCE: dyn Trait case (cf. #[distributed_slice] using static SLICE: [T]), but any concrete type that can be put in a static could of course be used.

If you agree that this would be a reasonable fit for the linkme crate to provide, I'm willing to work on getting a working implementation.

Example

// upstream
pub trait Log {
    fn record(&self, entry: &Record<'_>);
    fn flush(&self);
}

#[global_resource]
pub static GLOBAL_LOGGER: dyn Log;

// downstream
#[global_resource(GLOBAL_LOGGER)]
static LOGGER: MyLogger = MyLogger::new();

Implementation strategy

There are two main potential implementation strategies. The first is to use a DistributedSlice<&dyn Log> with a runtime assert_eq!(len(), 1) (like the distributed slice dupcheck). Reusing the #[distributed_slice] machinery is a reason for putting this in linkme.
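The runtime check in the first strategy can be sketched without linkme at all. In the sketch below a plain slice stands in for the DistributedSlice<&dyn Log> that the linker would assemble, and the trait is cut down to one method. The names resolve_single, ResolveError, StdoutLogger and NullLogger are hypothetical and exist only for illustration; none of them are part of linkme.

```rust
/// Cut-down stand-in for the upstream trait from the example.
pub trait Log: Sync {
    fn name(&self) -> &'static str;
}

/// Why resolution failed (hypothetical, not a linkme type).
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// No downstream crate registered an implementation.
    Missing,
    /// More than one implementation was registered; carries the count.
    Duplicated(usize),
}

/// Returns the only registered implementation. This mirrors the proposed
/// `assert_eq!(len(), 1)` check, but reports the failure as a value
/// instead of panicking.
pub fn resolve_single<'a>(registered: &[&'a dyn Log]) -> Result<&'a dyn Log, ResolveError> {
    match registered {
        [] => Err(ResolveError::Missing),
        [only] => Ok(*only),
        many => Err(ResolveError::Duplicated(many.len())),
    }
}

// Two toy implementations a downstream crate might register.
pub struct StdoutLogger;
impl Log for StdoutLogger {
    fn name(&self) -> &'static str {
        "stdout"
    }
}

pub struct NullLogger;
impl Log for NullLogger {
    fn name(&self) -> &'static str {
        "null"
    }
}
```

Either way the check runs at runtime, so a missing or duplicated registration shows up on first access rather than at link time.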

The second one is to use a single symbol instead, e.g.

// upstream
static GLOBAL_LOGGER: GlobalResource<dyn Log> = {
    extern {
        #[link_name = "__linkme.resource.GLOBAL_LOGGER"]
        static LINKME: &dyn Log;
    }
    GlobalResource::private_new(LINKME, todo!())
};

// downstream
static LOGGER: MyLogger = MyLogger::new();
const _: () = {
    #[link_name = "__linkme.resource.GLOBAL_LOGGER"]
    static LINKME: &dyn Log = &LOGGER;
};

(The attributes aren't exactly correct.) An advantage of this approach is that it (potentially) works under Miri, but a big disadvantage is that it relies on linker errors to report a missing or duplicated definition of the linked symbol. Using weak symbols might be able to avoid this.

As a variation, instead of linking a static of type &dyn Log, link a fn() -> &dyn Log. That could be better supported and lets the downstream do lazy initialization more directly, which could be an upside or a drawback, depending on what you want.
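A minimal sketch of the function variant, with both halves in one crate so that it links on its own. The symbol name __linkme_sketch_global_logger, the functions linked_logger, global_logger and provide_logger, and the simplified Log trait are all assumptions made for illustration, not what linkme generates. The `unsafe extern` and `#[unsafe(...)]` spellings assume Rust 1.82 or newer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Simplified stand-in for the upstream trait.
pub trait Log: Sync {
    fn record(&self, entry: &str);
    fn recorded(&self) -> usize;
}

// upstream: declare a function that exactly one downstream item must define.
unsafe extern "Rust" {
    #[link_name = "__linkme_sketch_global_logger"]
    fn linked_logger() -> &'static dyn Log;
}

/// Safe accessor wrapping the linked symbol.
pub fn global_logger() -> &'static dyn Log {
    // SAFETY: relies on the symbol being defined exactly once with this
    // exact signature. The linker checks the name, not the type.
    unsafe { linked_logger() }
}

// downstream: the implementation, built lazily on first use.
struct MyLogger {
    records: AtomicUsize,
}

impl Log for MyLogger {
    fn record(&self, _entry: &str) {
        self.records.fetch_add(1, Ordering::Relaxed);
    }
    fn recorded(&self) -> usize {
        self.records.load(Ordering::Relaxed)
    }
}

#[unsafe(export_name = "__linkme_sketch_global_logger")]
fn provide_logger() -> &'static dyn Log {
    // The logger is constructed the first time anyone asks for it.
    static LOGGER: OnceLock<MyLogger> = OnceLock::new();
    LOGGER.get_or_init(|| MyLogger {
        records: AtomicUsize::new(0),
    })
}
```

Because the downstream controls the function body, it can construct the logger at first call, which a plain linked static cannot do without a const constructor.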

In either case, the ability to provide a default implementation is useful (e.g. how #[global_allocator] defaults to as-if Global is used) and would theoretically be possible: a strongly ordered slice member for the former implementation strategy, and weak symbol definition for the latter.

dtolnay commented 7 months ago

I am on board with the single symbol implementation strategy. I would accept a PR for this. Thanks! Nice idea.

For the distributed_slice implementation strategy, I think just recommending that someone use distributed_slice (or their own abstraction around it) if they want that behavior is fine.

For a default value: there is no way to do weak symbols in stable Rust, right? I found #[linkage = "extern_weak"] but that is unstable (https://github.com/rust-lang/rust/issues/29603). I think it's fine to not support default value and instead recommend distributed_slice for that use case.

Naming:

CAD97 commented 7 months ago

I think I'd lean towards something along the lines of #[disjointed_static], personally, rather than singleton. "Disjointed" (static) also feels like a playful relative of "distributed" (slice).

Other potentially useful terms for naming:

Re weak symbols: there's always global_asm!. That part was more about potential than immediate practicality anyway.

The main implementation question, then, is whether the linked symbol should be T or &T (or fn() -> &T). The declaration static needs to indirectly use the implementation static in order to be safe to consume (and to implement the duplicate declaration check). Barring further benefits for further indirection, though, I think it's better to directly link the implementation T to the declaration, even though that makes the dyn Trait use case a bit less convenient.
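To make the "link T directly" option concrete, here is a rough sketch with a concrete type, again with both halves in one crate so that it links by itself. The Limits type, the symbol name __linkme_sketch_limits, and the names LINKED_LIMITS, limits and MY_LIMITS are hypothetical, and the attribute spellings assume Rust 1.82 or newer.

```rust
/// A concrete type that upstream and downstream both know.
#[derive(Debug, PartialEq)]
pub struct Limits {
    pub max_connections: u32,
    pub verbose: bool,
}

// upstream: the declaration names the linked symbol with type `Limits`
// itself, with no reference or function in between.
unsafe extern "Rust" {
    #[link_name = "__linkme_sketch_limits"]
    static LINKED_LIMITS: Limits;
}

/// Safe accessor for the linked static.
pub fn limits() -> &'static Limits {
    // SAFETY: relies on the symbol being defined exactly once as an
    // immutable `Limits`. The linker matches names only, so the type
    // agreement has to be enforced by whatever generates both halves.
    unsafe { &LINKED_LIMITS }
}

// downstream: the implementation static is exported under the same name.
#[unsafe(export_name = "__linkme_sketch_limits")]
static MY_LIMITS: Limits = Limits {
    max_connections: 64,
    verbose: false,
};
```

This works because both sides can name Limits. For dyn Log, the upstream declaration cannot name the downstream type, so presumably the linked symbol would have to be a &'static dyn Log (or a function returning one) rather than the implementation value itself.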

As an aside, note that a single symbol based solution does implicitly rely on attempting to reference an undefined symbol being an error (and not just e.g. dangling into who-knows-where) for soundness. This is always the case for static linking of the symbol as far as I'm aware, but I don't know about the corner cases of dynamic linking.

I should have some time to get my sketch functional sometime next week, depending on how other projects go. (I've false-started basically this concept multiple times before it occurred to me that it could live in linkme.)