nical commented 5 months ago

I'll use this issue to explore various ways we could make wgpu-core less generic.

The motivations include reducing the clutter of overly generic code and more importantly to reduce binary size, compile times.

Factor out non-generic code into separate functions

There is some amount of code that could be easily factored out, for example some of the validation. Isolating the validation also has the advantage that it can be laid out to match the specification so that we can more easily check that the validation is correct. There is only a limited amount of bang for buck we can get

Making the resources not generic

We'll get the biggest wins by far if we can erase the type of the resources. most of wgpu-core is generic only because something needs to hold on to a generic resource.

Downcasting

Assuming we can accept the cost of downcasting every resource at the top of every hal functions, we could use trait objects and downcast via Any.

// in wgpu-hal:

/// Implemented by most hal types
pub trait Resource: 'static {
    fn as_any(&self) -> &dyn std::any::Any;
}

// Helper for downcasting. Failure to downcast is always an internal error
pub fn downcast<T:'static>(resource: &dyn Resource) -> &T {
    resource.as_any().downcast_ref::<T>().unwrap()
}

The gpu-hal APIs would take &dyn Resource parameters instead of some associated types, and the first thing they would do is downcast these dyn traits into the concrete types they are expecting.

Note that we call also break it down into plenty of traits like TextureResource, BufferResource, etc. That's a rather small detail once we have overcome the other obstacles.

Boxed resources

The simple approach (which I find very unsatisfying) is to box all hal resources:

// in wgpu-core:

pub struct BindGroupLayout {
    // ...

    raw: Box<dyn hal::Resource>,
}

The wgpu-core and wgpu-hal structures are no longer next to each other in memory, that most likely has a cost but it is hard to estimate it without doing all of the work.

Dynamically sized types

The last member of a struct can be a DST, which turns the structure into a dst as well:

pub struct BindGroupLayoutResource<Hal: ?Sized> {
    // ...
    pub raw: Hal
}

// The alias that is used in most places
pub type BindGroupLayout = BindGroupLayoutResource<dyn hal::Resource>;

pub fn create_bind_group_layout(device: &Device) -> Result<Arc<BindGroupLayout>, Error> {
    let typed_bgl = Arc::new(BindGroupLayoutResource {
        label: String::new(),
        // TODO: Here we need the concrete hal structure before we can erase its type. Ideally
        // we would like device_create_* functions to not be generic (to allow out of tree backends).
        raw: hal::vulkan::BindGroupLayout {

        },
    });

    // BindGroupLayoutResource<hal::BindGroupLayout> can be converted into a BindGroupLayoutReosurce<dyn hal::Resource>
    Ok(typed_bgl)
}

It's quite nice, however there are a few constraints:

There are some restrictions to how DSTs are stored, for example you can't do Option<dyn hal::Resource>
to place the resource into the structure that contains it, you need to know the concrete type. In other words, the device_create_* functions would probably need to still be generic.
- If we manage to make the generics completely disappear, then out-of-tree backends become very easy to add.
- Note that the boxed resources solution does not suffer from this limitation.

Code of the DST experiment: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3 it also shows how snatchable resources would work.

Can we do it without Any/downcast?

Maybe, with a copious amount of unsafe code which might be fine. Suggestions are welcome!

udoprog commented 5 months ago

Note that we call also break it down into plenty of traits like TextureResource, BufferResource, etc. That's a rather small detail once we have overcome the other obstacles.

I think it might be a bit understated, but breaking things up (not necessarily into traits) by making some source type visible could make downcasts less error (or panic) prone.

By for example, limiting legal downcasting through a trait:

enum Vulkan {}
Impl ApiMarker for Vulkan {}

trait ApiMarker;

trait DowncastRef<A: ApiMarker, T> {
    fn downcast_ref(&T) -> &Self;
}

impl DowncastRef<Vulkan, TextureView> for VulkanTextureView {
    fn downcast_ref(resource: &TextureView) -> &Self {
        // get the resource
    }
}

DowncastRef could even be an unsafe trait and do the same kind of pointer cast we do in wgpu today between contexts. The implementer of the trait would have to ensure it's correct, not every use of the downcast.

udoprog commented 5 months ago

Note that this could all be bundled into another Resource-like trait that we could insist all backend-specific types implement. I think there are more things we might need anyway. Like the ability to extract a global identifier unless we want to deprecate that API.

Exactly how it would have to fit together is tricky.

nical commented 5 months ago

I think it might be a bit understated, but breaking things up (not necessarily into traits) by making some source type visible could make downcasts less error (or panic) prone.

I agree, I meant that it's an understood part of the design space, while right now I'm not sure how feasible it is to erase all of the hal types without doubling the amount of allocations and indirection.

udoprog commented 5 months ago

FYI, I'm investigating whether this technique can be used (or be made) sound in Rust, this would mean that smart pointers could be stored internally as their hal types in wgpu-core, but when shared with types in wgpu (or elsewhere) they could be accessed through trait objects.

This could potentially reduce the number of necessary downcasts, and the representation of a concrete type is more efficient and can access private methods not exported through a public trait. In effect, something like Arc<dyn Surface> and Arc<A::Surface> has the same representation, so coercion between one and the other might be possible.

nical commented 5 months ago

One of the questions to consider is at which layer of wgpu's architecture should we have the dyn traits.

A) wgpu-hal's `HalApi.
B) Something in wgpu-core implements dynamic dispatch on top of hal's static-only dispatch.
C) wgpu-core's API itself remains generic but internally some types are erased to remove generics from parts of the code.

I have a preference towards A because it removes the need for an extra interface layer. In a potential future where webrender uses wgp-hal directly, we don't need another solution to dynamic dispatch. It could in the long run lead to a mostly generic-free wgpu-core. If we push this far enough, we can also implement out-of-tree backends via that interface without requiring them to go through yet-another abstraction. The tricky part about this one is how do we create the hal resources (without necessarily boxing them).

B is the "let's not touch wgpu-hal" approach but I don't really see an advantage to it unless we don't find a solution for how to create the resource types in A.

I see C mostly as a step towards A or B.

udoprog commented 5 months ago

Making wgpu-hal dynamic is a lot less obvious to me than doing so for wgpu-core. I believe the latter is the better gradual approach and once identifiers are no more we can re-evaluate. Furthermore wgpu-core will currently need to export an API which is fairly specialized to it, so regardless I don't think there's any getting away from it exporting its own traits for behavior without a significant rework of its API.

I think what makes this less obvious stems from the fact that wgpu-hal was never really designed to be dispatched over dynamically. wgpu-core has the gfx_select! mechanism which restricts how much generics could be used in practice, So almost everything you find in wgpu-hal in contrast is associated types and generic parameters that would have to be dissolved.

nical commented 5 months ago

You are arguing at a fairly abstract and conceptual level. I had an equally abstract counter-argument but all that it shows is different view points and philosophies. Instead, let's come up with concrete (skeletons of) solutions, compare them and once we have defined our goal, we can break it down into its incremental steps.

udoprog commented 5 months ago

Feel free to shoot the abstract counter argument, but since you felt mine were too abstract, I'll at least try to rephrase them here more concretely:

wgpu-core and wgpu-hal are very different in how hard they are to make object safe

This boils down to is that a method in wgpu-core tends to look like this:

pub fn texture_create_view<A: HalApi>(
    &self,
    texture_id: id::TextureId,
    desc: &resource::TextureViewDescriptor,
    id_in: Option<id::TextureViewId>,
) -> (id::TextureViewId, Option<resource::CreateTextureViewError>);

And the corresponding wgpu-hal methods tend to look like this:

unsafe fn create_texture_view(
    &self,
    texture: &A::Texture,
    desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>;

The associated types here are A::Texture and A::TextureView, and can't be present during dynamic dispatch, so for your alternative A (as I've understood it) they'd have to be replaced with dynamic objects like Arc<dyn Texture> and Arc<dyn TextureView> - this is the type change that I think would require essentially a complete rewrite of wgpu-hal, which affects every downstream use of it.

In comparison, wgpu-core methods are already pretty much object-safe if we monomorphise them over A: HalApi (like I've proposed here).

wgpu-core has its own high level behavior and will most likely require its own traits anyway

The second aspect is that a wgpu-core Texture and a wgpu-hal Texture does very different things, you can see this reflected in the wrapper type which is defined in wgpu-core:

pub struct Texture<A: HalApi> {
    pub(crate) inner: Snatchable<TextureInner<A>>,
    pub(crate) device: Arc<Device<A>>,
    pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
    pub(crate) hal_usage: hal::TextureUses,
    pub(crate) format_features: wgt::TextureFormatFeatures,
    pub(crate) initialization_status: RwLock<TextureInitTracker>,
    pub(crate) full_range: TextureSelector,
    pub(crate) info: ResourceInfo<Texture<A>>,
    pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
    pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

So at least initially, I'd expect that we have to wrap wgpu-hal types to provide the same level of functionality, which means that wgpu-core would have to define a dynamic dyn Texture type anyway. To boil it down, wgpu-hal tends to be responsible for low level behavior like holding onto raw handles and the resources necessary to perform API calls and wgpu-core the high level stuff like if the texture has been consumed or which bind groups its used in.

nical commented 5 months ago

Thanks.

The associated types here are A::Texture and A::TextureView, and can't be present during dynamic dispatch, so for your alternative A (as I've understood it) they'd have to be replaced with dynamic objects like Arc and Arc

They would be replaced with references like &dyn Texture instead of arcs but yes that's the general idea. For everything except the create_* functions, this change pretty simple and mechanical. I put together a quick and dirty experiment in a few minutes.

So it goes back to the problem of how to create the resources. When hal creates a buffer, the conventional ways to return it are either as a concrete type (which requires the function to be generic), or via a box or some other type of pointer to a trait object (in which case wgpu-core loses the ability to easily place the hal resource in the same allocation as the wgpu-core resource).

Taking the example of wgpu-core's Texture with the direction that I am exploring at the moment would look something like this:

// most of wgpu-core only deals with `Arc<Texture>` for which the backend is erased.
pub type Texture = TextureResource<dyn SnatchableResource>

pub struct TextureResource<A: SnatchableResource> {
    pub(crate) raw: A,
    // Members below, are unchanged.
    pub(crate) device: Arc<Device<A>>,
    pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
    pub(crate) hal_usage: hal::TextureUses,
    pub(crate) format_features: wgt::TextureFormatFeatures,
    pub(crate) initialization_status: RwLock<TextureInitTracker>,
    pub(crate) full_range: TextureSelector,
    pub(crate) info: ResourceInfo<Texture<A>>,
    pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
    pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

The Snippet from my earlier comment shows how Snatchable Resource would work.

this is the type change that I think would require essentially a complete rewrite of wgpu-hal

Lets elaborate on this, since I certainly don't want to see a rewrite of wgpu-hal. The changes I have in mind are pretty superficial and mechanical (See the quick and dirty experiment I just linked).

The key missing piece is the creation of the resources.

if wgpu-hal returns a Box<dyn TextureResource>.

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource
    desc: &TextureViewDescriptor,
) -> Result<Box<dyn TextureViewResource>, DeviceError>;

Problem solved. It's a rather simple refactoring to do, we can easily do it incrementally. In my TextureView definition above you can remove the generic parameter and replace raw with: raw: Box<dyn hal::TextureResource>,.

That's at the cost of an extra allocation and pointer chase for each resource. I'm not very satisfied but it's not completely unreasonable.

If wgpu-hal has generic create_* functions

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource
    desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>;

Then the generics propagate into wgpu-core. How can we contain it? Is it a separate dispatch mechanism specifically for the create functions which encapsulates a bit of wgpu-core side generic code to create the concrete TextureResource<A::Texture> and coerce it into a TextureResource<dyn hal::TextureResource>?

Some quirky placement API that can be made not too footgun-prone?

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource
    desc: &TextureViewDescriptor,
    output: SomethingRepresentingTheMemoryToPlaceTheTextureInto
) -> Result<(), DeviceError>;

udoprog commented 5 months ago

Lets elaborate on this, since I certainly don't want to see a rewrite of wgpu-hal.

FWIW, changing the signature of every function and every use of A::* associated types both in wgpu-core and wgpu-hal and everything that depends on wgpu-hal is what I was referring to as a rewrite. There are no particular negative connotations associated with it, here it just means that "almost everything regarding the API will change".

nical commented 5 months ago

Ok, then I think that the churn is acceptable. It mostly touches function the signatures at the API boundary, it's easy to do and can be done piecemeal. Removing generics from large parts of wgpu-core is going to be a lot more invasive. Note that these wgpu-hal changes (putting aside the create_* functions), don't even require changes to wgpu-core.

udoprog commented 5 months ago

I noticed that you're not removing the Buffer associated type nor the related use of A: HalApi in wgpu-core, do you intend to keep these?

cwfitzgerald commented 5 months ago

I just want to generally express my interest in the wgpu-hal way, assuming performance is acceptable. I didn't realize that was even possible, let alone low lift enough to not be a major problem.

nical commented 5 months ago

I noticed that you're not removing the Buffer associated type nor the related use of A: HalApi in wgpu-core, do you intend to keep these?

Ideally these would go away, it depends on how the create_ functions are handled in the final solution. That's the big question mark.

I'm going to keep throwing ideas without trying to filter out the bad ones. Here's my latest thought:

If we care about the wgpu-core and wgpu-hal types sitting in the next allocation, we could pass a custom allocator to the wgpu-hal create functions. wgpu-core could allocate a generous amount of space for the hal type in the same allocation as the core type, and sub-allocate the hal type from there. wgpu-hal Functions like Device::create_texture could return a Result<Box<dyn TextureResource, InPlaceAllocator>, Error>. Custom Allocators are not stable in rust but we could duplicate the implementation of Box or add a dependency on the allocator-api2 crate.

What's nice about this is that if it turns out that the extra allocation does not matter, rolling back to just returning a normal Box using the global allocator is trivial. We haven't cornered ourselves with a solution that is difficult to remove.

cwfitzgerald commented 5 months ago

What's nice about this is that if it turns out that the extra allocation does not matter, rolling back to just returning a normal Box using the global allocator is trivial. We haven't cornered ourselves with a solution that is difficult to remove.

I think this is better thought of the other way around. If the simple solution (just box the thing) turns out to be bad for performance, we can figure out a better way without breaking too much.

udoprog commented 5 months ago

Ideally these would go away

Isn't it necessary for them to go away to achieve the stated goals? Otherwise I don't see how we're reducing code generation which is what would contribute to reduced binary sizes and reduced compile times. Most of the duplicate monomorphization would be in the wgpu-core to wgpu-hal glue where each type and method that has a A: HalApi parameter is instantiated for every backend.

udoprog commented 5 months ago

To clarify, here's a summary of what produces the most LLVM right now on my system configuration when I build wgpu in release mode.

We can see that the more heavily dominating sources are wgpu-core, where there's 3 copies of most methods taking <A> because I have three backends enabled (vulkan, dx12, and gles). At ~40 we see the first contribution from wgpu-hal, but that isn't a method that would be impacted by the proposed change, it's transition_textures which gets a lot of copies because it takes a generic iterator (we could and should definitely fix that).

nical commented 5 months ago

Isn't it necessary for them to go away to achieve the stated goals?

Kind of yes, but the object-safe and non-object-safe parts of hal's API could be split into two traits. Here is an example which is a direct continuation of the gist from my first comment:

In wgpu-hal the device API is split into two traits (I gave them terrible names so don't think about them too much):

    /// Object safe trait.
    pub trait DeviceApi {
        fn buffer_map(&self, buffer: &dyn Resource) -> Result<(), Error>;
        // etc.
    }

    /// Parts of the API that we couldn't make object-safe (mostly the create_* functions) are split into this "Factory" trait.
    pub trait HalDeviceFactory {
        type Texture: Resource;

        fn create_hal_texture(&self, desc: &TextureDescriptor) -> Result<Self::Texture, Error>;
        // etc.
    }

In wgpu-core, an object-safe trait creates the type-erased objects we want. The implementation knows about the concrete hal types so it can place them in the core resource struct, but it doesn't have to do anything else so the generics are contained into a small part of the code:

    /// The object-safe equivalent to `HalDeviceFactory`
    pub trait CoreDeviceFactory {
        fn create_texture(&self, desc: &hal::TextureDescriptor) -> Result<Arc<Texture>, Error>;
    }

    // A blanket impl is enough to do what we need:
    impl<T> CoreDeviceFactory for T where T: hal::HalDeviceFactory {
        fn create_texture(&self, desc: &hal::TextureDescriptor, label: String, etc: Stuff) -> Result<Arc<Texture>, Error> {
            let typed_texture = Arc::new(TextureResource {
                label,
                raw: Snatchable::new(self.create_hal_texture(desc)?),
            });

            Ok(typed_texture)    
        }
    }

Updated gist: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3/999fb470cc2037ad6340f4bbc1a2fe666ab1e1e0

udoprog commented 5 months ago

In wgpu-core, an object-safe trait creates the type-erased objects we want. The implementation knows about the concrete hal types so it can place them in the core resource struct, but it doesn't have to do anything else so the generics are contained into a small part of the code.

Ok, this is pretty much what I want to do. To me that is essentially what doing "wgpu-core first" means, because it doesn't actually require changing anything in wgpu-hal. Once this is done, we'd have capitalized on most of the available gains, and can revisit what actually needs to be fixed in wgpu-hal to further reduce duplicate code generation.

nical commented 5 months ago

To clarify, here's a summary of what produces the most LLVM right now on my system configuration when I build wgpu in release mode.

Yes, that's a big part of my motivation to put the dynamic dispatch closer to hal. In an ideal world wgpu-core has (almost) no code generic on the backend and close to none of its generated code is duplicated due the the existence of multiple backends.

udoprog commented 5 months ago

I'll leave you to fiddle a bit more, once you have a complete example I can look at it further if you want me to. The tool I've used to test changes so far is cargo llvm-lines.

nical commented 5 months ago

Here is a rough plan:

[ ] Introduce an object-safe version of the Device trait in wgpu-hal. Let's call it DynDevice.
- Its methods are automatically implemented by implementors of the non-object-safe Device trait via a blanket impl.
- It does not need to implement the create_ and destroy functions for now or it could provide ones that Box the resources for completeness but aren't going to be used by wgpu-core.
[ ] In wgpu-core, introduce an object-safe trait responsible for creating a destroying the resources, let's call it DeviceFactory. It is also implemented automatically via a blanket impl.
- The wgpu-core resources are implemented following the pattern described in the gist where a resource is first instanciated as Arc<FooResource<HalDevice::Foo>> with the associated type then cast into a DSTwith the backend type erased in the form Arc<FooResource<dyn hal::Foo>>.
- Having the trait implemented in core lets us place the hal resource into the wgpu-core one without boxing.
[ ] Gradually move wgpu-core's usage of resources to the type erased ones.
[ ] Once the transition is done, most of wgpu-core's code shouldn't have a generic backend parameter.
[ ] Once the transition is done, most or all usage of gfx_select should has become obsolete.
[ ] Measure the performance impact in the render pass code. If it's significant, try a few things:
- @udoprog's downcasting inquiry if it isn't already how downcast is implemented.
- make the run_render_pass_impl generic again and wrap it into an object-safe interface in the DynDevice trait.

nical commented 5 months ago

The plan did not survive first contact with the text editor. The problem is that because the existing (non-object-safe) traits in wgpu-hal are generic (for example: Device<A: Api>), a blanket impl cannot be made from them.

// Error: Unconstrained parameter A.
impl <A, T: Device<A>> DeviceFactory for T {
    // ...
}

That said, if we wrap the Device<A> trait in a struct, we should be able to implement the object-safe trait for the wrapper so not all is lost.

nical commented 5 months ago

Actually, I'm not sure a wrapper would help at all. Another approach is to take the generic parameter out of the hal traits and turn them into associated types:

pub trait Device: WasmNotSendSync {
    type: A: Api;

    // If only the following syntax worked, unfortunately we get an ambiguous associated type error.
    // unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<Self::A::Buffer, Error>;
    // This works but is an ergonomics nightmare.
    unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<<Self::A as Api>::Buffer, Error>;
}

impl<T: Device> DeviceFactory for T {
    unsafe fn dyn_create_buffer(&self, desc: &BufferDescriptor) -> Result<Box<dyn BufferResource>, Error> {
    }
}

That's a huge bummer, because even if the end state is that the <Self::A as Api>::Buffer ends up contained in just a few places (the hal trait definitions and their wrappers), getting there incrementally means core has to be filled with this generic gymnastics before it can be cleaned up.

lylythechosenone commented 3 months ago

I think that if this is the wall that's holding us back, we should just go ahead. A bit of temporary ugly code is a fair trade for our end goal.

@nical do you have a branch where you were working on this? I understand you've been a bit busy, so I thought I could take a crack at it.

nical commented 3 months ago

@lylythechosenone the little I have on branches is probably easier to re-write than rebase. I had started experimenting on this one and this one but these are more exploratory than actual implementations.

I'm glad that you are motivated to pick this up, since I can't dedicate time to this in the foreseeable future. I thought I had some notes about this but unfortunately I'm either misremembering or I lost them so I'll try to page the plan back in my head:

It is very important to do this incrementally. It's going to be a difficult change to land, it will touch most of wgpu-core and will conflict with most PRs happening there so landing changes often and in small chunks is key. I encourage you to experiment on a throwaway branch and re-implement cleaner/isolated commits once you have identified what sticks.

Probably a good first thing to try is what I suggested in my previous comment about moving the generic parameters to associated types. I was worried that it would make things messy but Connor later told me it'd probably be fine. If that doesn't work out we can implement the object-safe version for each backend instead of deriving it from the non-object safe ones.

There will probably be parts of the API that will need to be split off into traits implemented in wgpu-core, for example creating the resources needs to know about the concrete type (for example wpgu_core::BindGroupResource<hal::BindGroup>>).

Don't hesitate to reach out with questions or to bounce ideas, and thanks again for looking into this.

lylythechosenone commented 3 months ago

I am happy to report that converting all Foo<A> traits to Foo<A = ...> traits was successful. Now I just need to refactor all of the backends, and I can start on an actual dynamicification.

lylythechosenone commented 3 months ago

Alright, I have done a bit of work on the dynamic traits. They are mostly implemented, however a few functions are unimplemented due to lack of dynamic forms for:

BuildAccelerationStructureDescriptor
ComputePassDescriptor
BufferBinding
RenderPassDescriptor
TextureBarrier
BufferBarrier
GetAccelerationStructureBuildSizesDescriptor

Currently, these structs all follow a very similar pattern:

struct Foo<'a, A: Api> {
    thing: &'a A::Thing,
    // some metadata
    bar: Bar,
    baz: Baz,
}

We could make dynamic forms of them in one of two ways. Firstly, we could duplicate them and create specific dynamic forms, e.g.:

struct Foo<'a> {
    thing: &'a dyn Thing,
    // some metadata
    bar: Bar,
    baz: Baz,
}

Alternatively, we could adapt the existing ones like so:

struct Foo<'a, T: Thing + ?Sized> {
    thing: &'a T,
    // some metadata
    bar: Bar,
    baz: Baz,
}

The first allows us to not make any more breaking changes. However, the second could be more intuitive. Please tell me your thoughts, and let me know if you think of a third option.

Wumpf commented 3 months ago

It's really tempting to go with thing: &'a dyn Thing right away, but I'm wary of too much duplication going in even if it's supposed to be temporary - there's always the risk that through some unforeseen circumstances things stay in their temporary state and then we end up with an overall confusing construct. So I would prefer the thing: &'a T, interim variant even though it may mean more code churn overall simply because if the effort gets stuck this is imho a way less bad state to get stuck on (please please don't get stuck either way 😄 )

lylythechosenone commented 3 months ago

@Wumpf Actually, I had intended for this to be one of the permanent changes. This is still on the wgpu-hal side of things. If you have another solution, please do tell.

Wumpf commented 3 months ago

we talked it out a bit on Matrix chat here. I misunderstood proposal 2 (no dyn) to be transitional since I assumed we want to end up at proposal 1 (&dyn) eventually.. which isn't that clear! We eventually landed that proposal 2 (no dyns) is where we want to be since it's easier to use & less invasive (less casting), also likely not much extra codegen since it's only used by wgpu-core.

lylythechosenone commented 3 months ago

Alright, it is now mostly done, but there is one pretty big problem. Unlike my first PR, this one will break users and core. I'll have to do the fixes in core before I submit the PR.

lylythechosenone commented 3 months ago

Here's a big question we need to answer: There are a few types that are not needed by the dynamic traits, but follow this same pattern. Should I transition these as well, or keep them taking in Api?

EDIT: they'll be needed by dynamic traits in wgpu-core, so I'll go ahead and transition them.

Wumpf commented 3 months ago

Per more Matrix chat: Turns out the conversion from some untyped to typed structures can be fairly heavy: E.g. BuildAccelerationStructureDescriptor needs to recreate all its vecs, casting every single objects. Note that we can not simply do a saftey-check + transmute since &T and &dyn SomeTrait are completely different pointers. Overall this doesn't seem to be a huge blocker but it might be that we need to think about ways to optimize this at some later point.

lylythechosenone commented 3 months ago

This is also true of the &[&CommandBuffer] and related things on Queue

lylythechosenone commented 3 months ago

A SmallVec could probably work on &[&CommandBuffer] and &[&SurfaceTexture], as they're unlikely to have more than a few.

lylythechosenone commented 3 months ago

Acceleration structure-related downcasts cannot be done currently. They take a reference to their entries. I may have to do some unsafe here and store the typeid outside of the types. For now, I am leaving them todo!, because they don't exist in core anyway.

lylythechosenone commented 3 months ago

Alright, that's everything that needs to be done in wgpu-hal. I'm going to need a lot of help with core, because I've never touched the codebase before. I'm also not entirely sure what the structure should be. I know we need some "factory" traits for creation of objects, and these should be dynamic. They should create core resources. However, that's about where my knowledge ends. @nical it'd be helpful if you could fill me in on whatever more you know, if you do.

lylythechosenone commented 3 months ago

I did some looking around, and I think I get the basic idea now. I'd still love a more thorough explanation.

lylythechosenone commented 3 months ago

What I'm currently thinking (in order of implementation steps):

Core resource types will remain generic, but will allow unsized values
- I'm not 100% sure about how unsizing will work. Currently they take in an Api, and I think only the generic itself can unsize, but I'll have to check on that I guess.
Hubs will transition to storing a HashMap with the TypeId of the Api implementor (give other ideas if you have any), and each Hub will store dynamic-dispatch resources.
- TypeId may not be the best option. Maybe we want some other backend ID method? Id does already exist, and we might be able to use that, but I'm not sure.
Instance will have a similar transition
All of the current hal backends can stay in core as default backends, and the Backend enum can remain, with an Other option. Maybe this could replace Empty. When an instance is created, it will use these default backends, and they will then be unsized into the Hubs. Once an Instance exists, the concrete type is no longer needed. Dynamic dispatch will handle the rest.

This will also require wrapper traits for all of the hal traits, although I'm not sure where that should fit in. These wrappers will build off of the dynamic versions, but re-implement create_ and destroy_ methods in terms of resources.

We will also need a ?Sized Snatchable, which I have already implemented using a tad bit of unsafe, similar to the following:

struct Snatchable<T: ?Sized> {
    snatched: UnsafeCell<bool>,
    value: UnsafeCell<ManuallyDrop<T>>,
}

where the T is only dropped if snatched is false. get and get_mut exist for unsized types, as well as new downcast_ref and downcast_mut methods to turn &(mut) Snatchable<dyn Foo> to &(mut) Snatchable<T>. snatch and take still require a sized type. This will be instrumental in allowing unsized resources.

Note: I had to implement it this way because enums cannot currently contain unsized types. They do not unsize the same way that structs do.

I'm gonna start with step 1 (as most do), and see how far I get. I'll try to make this as small of a change as possible in terms of breaking things, especially outside of core.

Wumpf commented 3 months ago

sounds pretty good to me!

Core resource types will remain generic, but will allow unsized values

So that would be along the lines of @nical 's third suggestion in the original post? (Dynamically sized types)

TypeId may not be the best option. Maybe we want some other backend ID method? Id does already exist, and we might be able to use that, but I'm not sure.

given that each implementation is implemented with a value of the backend enum we could consider using that? 🤔 The nice sideeffect about this is that instead of doing a hash-lookup we can do just an array lookup ✨. To support extra backends we can make the enum non-exhaustive and numeric (: Agreed that TypeId isn't great and Id is used for resource ids only, so that would be a bit confusing, although an option.

lylythechosenone commented 3 months ago

non-exhaustive and numeric

The only issue with this is: who decides which number means what?

array lookup

What about unfilled slots? A Backend variant exists for all backends, regardless of what core is compiled for. Growing an array for, say, OpenGL, when all other backends are unused, would create 4 unnecessary instance and hub slots.

I think a hash-lookup there is actually fine, and potentially better. If we do choose to use TypeId and find that performance is a problem, we could skip hashing, as typeids are already (at least partially) hashed values. I don't think it will be too much of a problem, however.

One thing to consider is what type of hashmap we use. Because this map will be shared between threads, we need to synchronize it. We could use something like DashMap to do this automatically, or do it manually with a RwLock. Because instances will already need to be placed in Arcs, we can rely on similar arcanization optimizations as core already does. However, this may not be the case for hubs (to be decided). If we do choose to use Arcs and still find performance to be a problem, we could consider an atomic hashmap implementation, as we'll be using pointers anyway.

Wumpf commented 3 months ago

The only issue with this is: who decides which number means what?

well, we do. Right now there's an enum for all backends already. Minimal version just uses that

Growing an array for, say, OpenGL, when all other backends are unused, would create 4 unnecessary instance and hub slots.

we're literally talking about having a fixed sized array of less than 10 pointers that we need once in the hub, no? Sounds like I got this wrong

lylythechosenone commented 3 months ago

we do

I meant for out-of-tree backends.

we're literally talking about having a fixed sized array of less than 10 pointers

Yeah, you're actually right. Sometimes I go into over-optimization mode, that would probably be fine. The only way it isn't fine is if we store hubs inline rather than in an arc, but I think we want the arc for contention minimization anyway.

jimblandy commented 3 months ago

@nical I thought one of the goals here was to reduce compilation time during development. Couldn't that be accomplished just by enabling only one backend?

jimblandy commented 3 months ago

The "clutter of generic code" factor I don't find so compelling. It's not something that keeps me from getting work done.

I'm not sure this work should be a priority for us at the moment.

lylythechosenone commented 3 months ago

After some talk with @nical on matrix, I have a rough roadmap for what we need to do:

Separate all Type<A> types into Type<A> and TypeInfo<A>, where TypeInfo<A> stores everything except the raw field, and then Type<A> has TypeInfo<A> followed by the raw field. For example, Buffer<A>:

// Currently:
#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) raw: Snatchable<A::Buffer>,
    pub(crate) device: Arc<Device<A>>,
    pub(crate) usage: wgt::BufferUsages,
    pub(crate) size: wgt::BufferAddress,
    pub(crate) initialization_status: RwLock<BufferInitTracker>,
    pub(crate) sync_mapped_writes: Mutex<Option<hal::MemoryRange>>,
    pub(crate) info: ResourceInfo<Buffer<A>>,
    pub(crate) map_state: Mutex<BufferMapState<A>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

// The goal:
#[derive(Debug)]
pub struct BufferInfo<A: HalApi> {
    pub(crate) device: Arc<Device<A>>,
    pub(crate) usage: wgt::BufferUsages,
    pub(crate) size: wgt::BufferAddress,
    pub(crate) initialization_status: RwLock<BufferInitTracker>,
    pub(crate) sync_mapped_writes: Mutex<Option<hal::MemoryRange>>,
    pub(crate) info: ResourceInfo<Buffer<A>>,
    pub(crate) map_state: Mutex<BufferMapState<A>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) info: BufferInfo<A>,
    pub(crate) raw: Snatchable<A::Buffer>,
}

Add an Info type to the Resource trait, and set it to this TypeInfo<A> type.

Create a Factory<T> trait like so:

pub trait Factory<T: Resource> {
    type Desc;
    fn create(&self, info: T::Info, desc: Self::Desc) -> Arc<T>;
    fn destroy(&self, resource: Arc<T>) -> T::Info;
}

Blanket implement Factory<T> for anything that can create a T, e.g. Factory<Buffer<A>> for T: hal::Device<A>, which will just call self.create_buffer, and insert it into a Buffer with the info.
Start using this trait instead of the hal traits when creating resources, e.g. replace self.raw.create_buffer with self.raw.create::<Buffer<_>>
Remove the : Sized bounds on HalApi's types, and replace them with Factory bounds

Then after all of that we'll regroup and discuss.

lylythechosenone commented 3 months ago

One important update: __Info will have to be __Metadata, because Info is already taken up by ResourceInfo.

Thinks to decide:

Where will the Factory trait go?
Should resources Deref into their Metadata types, or should I just refactor the code to add the extra field access?

nical commented 3 months ago

Adding a bit of context to @lylythechosenone's previous comments (context that can already be gathered from other places but is easy to miss):

The plan involves an additional hal backend, the "Dyn" backend which internally does the dynamic dispatch to the concrete (vulkan, d3d12, etc.) backend at runtime.
This backend's implementation of the HalApi trait uses DSTs for its associated types. For example A::Buffer where A is the dynamic backend would be dyn hal::Buffer.
The first steps of the work are to make the dyn hal backend possible at all, then add the dyn backend besides the existing ones and the last step is to make wgpu-core use the dyn backend exclusively (remove most of the generic parameters and refer to the dyn backend's types directly in wgpu-core).

lylythechosenone commented 3 months ago

Exactly. Although do note that resources will still have generics, they will just have a default type, and fns will only be implemented for this default type.

gfx-rs / wgpu

Make wgpu-core less generic (brainstorm) #5124

Factor out non-generic code into separate functions

Making the resources not generic

Downcasting

Boxed resources

Dynamically sized types

Can we do it without Any/downcast?

wgpu-core and wgpu-hal are very different in how hard they are to make object safe

wgpu-core has its own high level behavior and will most likely require its own traits anyway