Note that we can also break it down into plenty of traits like TextureResource, BufferResource, etc. That's a rather small detail once we have overcome the other obstacles.
I think it might be a bit understated, but breaking things up (not necessarily into traits) by making some source type visible could make downcasts less error (or panic) prone.
For example, by limiting legal downcasting through a trait:

enum Vulkan {}

trait ApiMarker {}
impl ApiMarker for Vulkan {}

trait DowncastRef<A: ApiMarker, T> {
    fn downcast_ref(resource: &T) -> &Self;
}

impl DowncastRef<Vulkan, TextureView> for VulkanTextureView {
    fn downcast_ref(resource: &TextureView) -> &Self {
        // get the resource
    }
}
DowncastRef could even be an unsafe trait and do the same kind of pointer cast we do in wgpu today between contexts. The implementer of the trait would have to ensure it's correct, not every use of the downcast.
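As a hedged sketch of that idea (all type names here are hypothetical stand-ins, not wgpu's actual API): the trait is `unsafe` so that each implementer vouches for the pointer cast once, and `#[repr(transparent)]` is what makes the cast sound in this toy version.

```rust
// Hypothetical sketch of an unsafe downcast trait: the implementer, not each
// call site, is responsible for the cast being valid. Names are illustrative.
enum Vulkan {}

trait ApiMarker {}
impl ApiMarker for Vulkan {}

// Backend-erased handle; `backend_data` stands in for the real payload.
struct TextureView {
    backend_data: u64,
}

// `repr(transparent)` guarantees the wrapper has the same layout as the
// erased handle, which is what makes the pointer cast below sound.
#[repr(transparent)]
struct VulkanTextureView {
    inner: TextureView,
}

/// Safety: implementers must guarantee that a `&T` created by their backend
/// really refers to a value that can be viewed as `Self`.
unsafe trait DowncastRef<A: ApiMarker, T> {
    fn downcast_ref(resource: &T) -> &Self;
}

unsafe impl DowncastRef<Vulkan, TextureView> for VulkanTextureView {
    fn downcast_ref(resource: &TextureView) -> &Self {
        // Sound because of the `repr(transparent)` layout guarantee above.
        unsafe { &*(resource as *const TextureView as *const VulkanTextureView) }
    }
}

fn main() {
    let vk = VulkanTextureView { inner: TextureView { backend_data: 7 } };
    let erased: &TextureView = &vk.inner;
    let concrete = <VulkanTextureView as DowncastRef<Vulkan, TextureView>>::downcast_ref(erased);
    assert_eq!(concrete.inner.backend_data, 7);
}
```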
Note that this could all be bundled into another Resource-like trait that we could insist all backend-specific types implement. I think there are more things we might need anyway, like the ability to extract a global identifier, unless we want to deprecate that API.
Exactly how it would have to fit together is tricky.
I think it might be a bit understated, but breaking things up (not necessarily into traits) by making some source type visible could make downcasts less error (or panic) prone.
I agree, I meant that it's an understood part of the design space, while right now I'm not sure how feasible it is to erase all of the hal types without doubling the amount of allocations and indirection.
FYI, I'm investigating whether this technique can be used (or be made) sound in Rust, this would mean that smart pointers could be stored internally as their hal types in wgpu-core, but when shared with types in wgpu (or elsewhere) they could be accessed through trait objects.
This could potentially reduce the number of necessary downcasts, and the representation of a concrete type is more efficient and can access private methods not exported through a public trait. In effect, something like Arc<dyn Surface> and Arc<A::Surface> have the same representation, so coercion between one and the other might be possible.
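For illustration, here is a minimal sketch of the safe half of that idea: the unsized coercion from a concrete Arc to a trait-object Arc. (The reverse direction, and the claimed representation equivalence, are the parts under investigation; `Surface` and `VulkanSurface` are made-up names, not wgpu's API.)

```rust
use std::sync::Arc;

// Illustrative trait and backend type; not wgpu's real API.
trait Surface {
    fn backend_name(&self) -> &'static str;
}

struct VulkanSurface;
impl Surface for VulkanSurface {
    fn backend_name(&self) -> &'static str {
        "vulkan"
    }
}

fn main() {
    // wgpu-core would store the concrete type internally...
    let concrete: Arc<VulkanSurface> = Arc::new(VulkanSurface);
    // ...and hand out a type-erased Arc via unsized coercion. Both Arcs
    // share the same heap allocation and reference count.
    let erased: Arc<dyn Surface> = concrete.clone();
    assert_eq!(erased.backend_name(), "vulkan");
    assert_eq!(Arc::strong_count(&concrete), 2);
}
```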
One of the questions to consider is at which layer of wgpu's architecture we should have the dyn traits:

A. In wgpu-hal itself, replacing `HalApi`'s static dispatch with object-safe traits.
B. In wgpu-core, which implements dynamic dispatch on top of hal's static-only dispatch.

I have a preference towards A because it removes the need for an extra interface layer. In a potential future where webrender uses wgpu-hal directly, we don't need another solution to dynamic dispatch. It could in the long run lead to a mostly generic-free wgpu-core. If we push this far enough, we can also implement out-of-tree backends via that interface without requiring them to go through yet another abstraction. The tricky part about this one is how we create the hal resources (without necessarily boxing them).
B is the "let's not touch wgpu-hal" approach but I don't really see an advantage to it unless we don't find a solution for how to create the resource types in A.
I see C mostly as a step towards A or B.
Making wgpu-hal dynamic is a lot less obvious to me than doing so for wgpu-core. I believe the latter is the better gradual approach, and once identifiers are no more we can re-evaluate. Furthermore, wgpu-core will currently need to export an API which is fairly specialized to it, so regardless I don't think there's any getting away from it exporting its own traits for behavior without a significant rework of its API.
I think what makes this less obvious stems from the fact that wgpu-hal was never really designed to be dispatched over dynamically. wgpu-core has the gfx_select! mechanism, which restricts how much generics could be used in practice, so almost everything you find in wgpu-hal, in contrast, is associated types and generic parameters that would have to be dissolved.
You are arguing at a fairly abstract and conceptual level. I had an equally abstract counter-argument, but all that it shows is different viewpoints and philosophies. Instead, let's come up with concrete (skeletons of) solutions, compare them, and once we have defined our goal, we can break it down into its incremental steps.
Feel free to shoot the abstract counter argument, but since you felt mine were too abstract, I'll at least try to rephrase them here more concretely:
What this boils down to is that a method in wgpu-core tends to look like this:
pub fn texture_create_view<A: HalApi>(
    &self,
    texture_id: id::TextureId,
    desc: &resource::TextureViewDescriptor,
    id_in: Option<id::TextureViewId>,
) -> (id::TextureViewId, Option<resource::CreateTextureViewError>);
And the corresponding wgpu-hal methods tend to look like this:
unsafe fn create_texture_view(
    &self,
    texture: &A::Texture,
    desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>;
The associated types here are A::Texture and A::TextureView, and can't be present during dynamic dispatch, so for your alternative A (as I've understood it) they'd have to be replaced with dynamic objects like Arc<dyn Texture> and Arc<dyn TextureView> - this is the type change that I think would require essentially a complete rewrite of wgpu-hal, which affects every downstream use of it.

In comparison, wgpu-core methods are already pretty much object-safe if we monomorphise them over A: HalApi (like I've proposed here).
The second aspect is that a wgpu-core Texture and a wgpu-hal Texture do very different things; you can see this reflected in the wrapper type which is defined in wgpu-core:
pub struct Texture<A: HalApi> {
    pub(crate) inner: Snatchable<TextureInner<A>>,
    pub(crate) device: Arc<Device<A>>,
    pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
    pub(crate) hal_usage: hal::TextureUses,
    pub(crate) format_features: wgt::TextureFormatFeatures,
    pub(crate) initialization_status: RwLock<TextureInitTracker>,
    pub(crate) full_range: TextureSelector,
    pub(crate) info: ResourceInfo<Texture<A>>,
    pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
    pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}
So at least initially, I'd expect that we have to wrap wgpu-hal types to provide the same level of functionality, which means that wgpu-core would have to define a dynamic dyn Texture type anyway. To boil it down, wgpu-hal tends to be responsible for low-level behavior like holding onto raw handles and the resources necessary to perform API calls, and wgpu-core for the high-level stuff like whether the texture has been consumed or which bind groups it's used in.
Thanks.
The associated types here are A::Texture and A::TextureView, and can't be present during dynamic dispatch, so for your alternative A (as I've understood it) they'd have to be replaced with dynamic objects like Arc<dyn Texture> and Arc<dyn TextureView>
They would be replaced with references like &dyn Texture instead of Arcs, but yes, that's the general idea. For everything except the create_* functions, this change is pretty simple and mechanical. I put together a quick and dirty experiment in a few minutes.
So it goes back to the problem of how to create the resources. When hal creates a buffer, the conventional ways to return it are either as a concrete type (which requires the function to be generic), or via a box or some other type of pointer to a trait object (in which case wgpu-core loses the ability to easily place the hal resource in the same allocation as the wgpu-core resource).
Taking the example of wgpu-core's Texture, the direction that I am exploring at the moment would look something like this:
// Most of wgpu-core only deals with `Arc<Texture>`, for which the backend is erased.
pub type Texture = TextureResource<dyn SnatchableResource>;

pub struct TextureResource<A: SnatchableResource + ?Sized> {
    // Members below are unchanged.
    pub(crate) device: Arc<Device<A>>,
    pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
    pub(crate) hal_usage: hal::TextureUses,
    pub(crate) format_features: wgt::TextureFormatFeatures,
    pub(crate) initialization_status: RwLock<TextureInitTracker>,
    pub(crate) full_range: TextureSelector,
    pub(crate) info: ResourceInfo<TextureResource<A>>,
    pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
    pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
    // The possibly-unsized member has to be the last field.
    pub(crate) raw: A,
}
The snippet from my earlier comment shows how SnatchableResource would work.
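A compilable miniature of that layout, with toy types in place of the real fields (`SnatchableResource` here is just a plain trait, and `VulkanTexture` is a made-up backend type): making the possibly-unsized member the last field is what lets the concrete Arc coerce to the erased alias.

```rust
use std::sync::Arc;

// Toy stand-in for the hal resource trait.
trait SnatchableResource {
    fn byte_size(&self) -> u64;
}

struct VulkanTexture {
    bytes: u64,
}
impl SnatchableResource for VulkanTexture {
    fn byte_size(&self) -> u64 {
        self.bytes
    }
}

// The backend-erased alias most of wgpu-core would deal with.
pub type Texture = TextureResource<dyn SnatchableResource>;

pub struct TextureResource<A: SnatchableResource + ?Sized> {
    label: String,
    // The possibly-unsized member must be the last field.
    raw: A,
}

fn main() {
    let concrete = Arc::new(TextureResource {
        label: "color target".to_string(),
        raw: VulkanTexture { bytes: 1024 },
    });
    // Unsized coercion erases the backend without a second allocation:
    // the hal type still lives inline in the same heap block.
    let erased: Arc<Texture> = concrete;
    assert_eq!(erased.raw.byte_size(), 1024);
    assert_eq!(erased.label, "color target");
}
```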
this is the type change that I think would require essentially a complete rewrite of wgpu-hal
Let's elaborate on this, since I certainly don't want to see a rewrite of wgpu-hal. The changes I have in mind are pretty superficial and mechanical (see the quick and dirty experiment I just linked).
The key missing piece is the creation of the resources.

Option 1: return the resources boxed, as Box<dyn TextureResource>:

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource,
    desc: &TextureViewDescriptor,
) -> Result<Box<dyn TextureViewResource>, DeviceError>;

Problem solved. It's a rather simple refactoring to do, and we can easily do it incrementally. In my TextureResource definition above you can remove the generic parameter and replace raw with: raw: Box<dyn hal::TextureResource>,.

That's at the cost of an extra allocation and pointer chase for each resource. I'm not very satisfied, but it's not completely unreasonable.
Option 2: keep the create_* functions generic:

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource,
    desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>;

Then the generics propagate into wgpu-core. How can we contain them? Is it a separate dispatch mechanism specifically for the create functions, which encapsulates a bit of wgpu-core-side generic code to create the concrete TextureResource<A::Texture> and coerce it into a TextureResource<dyn hal::TextureResource>?
Option 3: have the caller provide the memory to place the resource into:

unsafe fn create_texture_view(
    &self,
    texture: &dyn TextureResource,
    desc: &TextureViewDescriptor,
    output: SomethingRepresentingTheMemoryToPlaceTheTextureInto,
) -> Result<(), DeviceError>;
Let's elaborate on this, since I certainly don't want to see a rewrite of wgpu-hal.
FWIW, changing the signature of every function and every use of A::* associated types, both in wgpu-core and wgpu-hal and in everything that depends on wgpu-hal, is what I was referring to as a rewrite. There are no particular negative connotations associated with it; here it just means that "almost everything regarding the API will change".
Ok, then I think that the churn is acceptable. It mostly touches the function signatures at the API boundary, it's easy to do, and it can be done piecemeal. Removing generics from large parts of wgpu-core is going to be a lot more invasive. Note that these wgpu-hal changes (putting aside the create_* functions) don't even require changes to wgpu-core.
I noticed that you're not removing the Buffer associated type nor the related use of A: HalApi in wgpu-core, do you intend to keep these?
I just want to generally express my interest in the wgpu-hal way, assuming performance is acceptable. I didn't realize that was even possible, let alone low lift enough to not be a major problem.
I noticed that you're not removing the Buffer associated type nor the related use of A: HalApi in wgpu-core, do you intend to keep these?
Ideally these would go away; it depends on how the create_ functions are handled in the final solution. That's the big question mark.
I'm going to keep throwing ideas without trying to filter out the bad ones. Here's my latest thought:
If we care about the wgpu-core and wgpu-hal types sitting in the same allocation, we could pass a custom allocator to the wgpu-hal create functions. wgpu-core could allocate a generous amount of space for the hal type in the same allocation as the core type, and sub-allocate the hal type from there. wgpu-hal functions like Device::create_texture could return a Result<Box<dyn TextureResource, InPlaceAllocator>, Error>.
Custom allocators are not stable in Rust, but we could duplicate the implementation of Box or add a dependency on the allocator-api2 crate.
What's nice about this is that if it turns out that the extra allocation does not matter, rolling back to just returning a normal Box using the global allocator is trivial. We haven't cornered ourselves with a solution that is difficult to remove.
What's nice about this is that if it turns out that the extra allocation does not matter, rolling back to just returning a normal Box using the global allocator is trivial. We haven't cornered ourselves with a solution that is difficult to remove.
I think this is better thought of the other way around. If the simple solution (just box the thing) turns out to be bad for performance, we can figure out a better way without breaking too much.
Ideally these would go away
Isn't it necessary for them to go away to achieve the stated goals? Otherwise I don't see how we're reducing code generation, which is what would contribute to reduced binary sizes and reduced compile times. Most of the duplicate monomorphization would be in the wgpu-core to wgpu-hal glue, where each type and method that has an A: HalApi parameter is instantiated for every backend.
To clarify, here's a summary of what produces the most LLVM IR right now on my system configuration when I build wgpu in release mode.
We can see that the heavily dominating sources are in wgpu-core, where there are 3 copies of most methods taking <A> because I have three backends enabled (vulkan, dx12, and gles). At ~40 we see the first contribution from wgpu-hal, but that isn't a method that would be impacted by the proposed change; it's transition_textures, which gets a lot of copies because it takes a generic iterator (we could and should definitely fix that).
Isn't it necessary for them to go away to achieve the stated goals?
Kind of, yes, but the object-safe and non-object-safe parts of hal's API could be split into two traits. Here is an example, which is a direct continuation of the gist from my first comment:
In wgpu-hal the device API is split into two traits (I gave them terrible names so don't think about them too much):
/// Object-safe trait.
pub trait DeviceApi {
    fn buffer_map(&self, buffer: &dyn Resource) -> Result<(), Error>;
    // etc.
}

/// Parts of the API that we couldn't make object-safe (mostly the create_*
/// functions) are split into this "Factory" trait.
pub trait HalDeviceFactory {
    type Texture: Resource;
    fn create_hal_texture(&self, desc: &TextureDescriptor) -> Result<Self::Texture, Error>;
    // etc.
}
In wgpu-core, an object-safe trait creates the type-erased objects we want. The implementation knows about the concrete hal types so it can place them in the core resource struct, but it doesn't have to do anything else so the generics are contained into a small part of the code:
/// The object-safe equivalent to `HalDeviceFactory`.
pub trait CoreDeviceFactory {
    fn create_texture(&self, desc: &hal::TextureDescriptor, label: String, etc: Stuff) -> Result<Arc<Texture>, Error>;
}

// A blanket impl is enough to do what we need:
impl<T> CoreDeviceFactory for T where T: hal::HalDeviceFactory {
    fn create_texture(&self, desc: &hal::TextureDescriptor, label: String, etc: Stuff) -> Result<Arc<Texture>, Error> {
        let typed_texture = Arc::new(TextureResource {
            label,
            raw: Snatchable::new(self.create_hal_texture(desc)?),
        });
        Ok(typed_texture)
    }
}
Updated gist: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3/999fb470cc2037ad6340f4bbc1a2fe666ab1e1e0
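A self-contained toy version of this split (everything below is a simplified stand-in: the real traits carry many more methods and descriptors, and a plain Box stands in for Snatchable), showing that the blanket impl is the only place that sees the concrete hal type while callers only see the object-safe trait:

```rust
use std::sync::Arc;

// Simplified stand-ins for hal types.
#[derive(Debug)]
struct Error;
struct TextureDescriptor {
    size: u64,
}

trait Resource {}

// Non-object-safe "factory" part of the hal device API.
trait HalDeviceFactory {
    type Texture: Resource + 'static;
    fn create_hal_texture(&self, desc: &TextureDescriptor) -> Result<Self::Texture, Error>;
}

// Core-side resource holding a type-erased hal texture.
struct Texture {
    raw: Box<dyn Resource>,
}

// The object-safe equivalent, implemented once via a blanket impl.
trait CoreDeviceFactory {
    fn create_texture(&self, desc: &TextureDescriptor) -> Result<Arc<Texture>, Error>;
}

impl<T: HalDeviceFactory> CoreDeviceFactory for T {
    fn create_texture(&self, desc: &TextureDescriptor) -> Result<Arc<Texture>, Error> {
        // The only place that sees the concrete hal texture type.
        Ok(Arc::new(Texture {
            raw: Box::new(self.create_hal_texture(desc)?),
        }))
    }
}

// A toy backend exercising the blanket impl.
struct NoopTexture;
impl Resource for NoopTexture {}

struct NoopDevice;
impl HalDeviceFactory for NoopDevice {
    type Texture = NoopTexture;
    fn create_hal_texture(&self, _desc: &TextureDescriptor) -> Result<NoopTexture, Error> {
        Ok(NoopTexture)
    }
}

fn main() {
    // Callers only need the object-safe trait; the generics are contained.
    let factory: &dyn CoreDeviceFactory = &NoopDevice;
    let texture = factory.create_texture(&TextureDescriptor { size: 16 });
    assert!(texture.is_ok());
}
```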
In wgpu-core, an object-safe trait creates the type-erased objects we want. The implementation knows about the concrete hal types so it can place them in the core resource struct, but it doesn't have to do anything else so the generics are contained into a small part of the code.
Ok, this is pretty much what I want to do. To me that is essentially what doing "wgpu-core first" means, because it doesn't actually require changing anything in wgpu-hal. Once this is done, we'd have capitalized on most of the available gains, and can revisit what actually needs to be fixed in wgpu-hal to further reduce duplicate code generation.
To clarify, here's a summary of what produces the most LLVM right now on my system configuration when I build wgpu in release mode.
Yes, that's a big part of my motivation to put the dynamic dispatch closer to hal. In an ideal world, wgpu-core has (almost) no code generic on the backend, and close to none of its generated code is duplicated due to the existence of multiple backends.
I'll leave you to fiddle a bit more, once you have a complete example I can look at it further if you want me to. The tool I've used to test changes so far is cargo llvm-lines.
Here is a rough plan:

1. Introduce an object-safe version of the Device trait in wgpu-hal. Let's call it DynDevice.
2. DynDevice is implemented automatically for implementations of the Device trait via a blanket impl. It can skip the create_ and destroy functions for now, or it could provide ones that Box the resources for completeness but aren't going to be used by wgpu-core.
3. In wgpu-core, introduce an object-safe trait responsible for creating and destroying the resources, let's call it DeviceFactory. It is also implemented automatically via a blanket impl.
4. wgpu-core resources are implemented following the pattern described in the gist, where a resource is first instantiated as Arc<FooResource<HalDevice::Foo>> with the associated type, then cast into a DST with the backend type erased, in the form Arc<FooResource<dyn hal::Foo>>.
5. Port wgpu-core's usage of resources to the type-erased ones.
6. At this point most of wgpu-core's code shouldn't have a generic backend parameter, and gfx_select should have become obsolete.
7. Make run_render_pass_impl generic again and wrap it into an object-safe interface in the DynDevice trait.

The plan did not survive first contact with the text editor. The problem is that because the existing (non-object-safe) traits in wgpu-hal are generic (for example: Device<A: Api>), a blanket impl cannot be made from them.
// Error: unconstrained type parameter `A`.
impl<A, T: Device<A>> DeviceFactory for T {
    // ...
}
That said, if we wrap the Device<A> trait in a struct, we should be able to implement the object-safe trait for the wrapper, so not all is lost.
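A minimal sketch of that wrapper idea, with all names being illustrative stand-ins for the real hal traits. It only shows that the object-safe impl is possible for a wrapper that carries `A`; whether this helps the broader refactor is a separate question.

```rust
use std::any::Any;
use std::marker::PhantomData;

// Simplified stand-ins for the hal traits (generic over the API, like
// wgpu-hal's `Device<A: Api>`).
trait Api {
    type Buffer;
}

trait Device<A: Api> {
    fn create_buffer(&self, size: u64) -> A::Buffer;
}

// The object-safe trait we want every backend device to provide.
trait DeviceFactory {
    fn dyn_create_buffer(&self, size: u64) -> Box<dyn Any>;
}

// A blanket `impl<A, T: Device<A>> DeviceFactory for T` is rejected with
// "unconstrained type parameter", but a wrapper that carries `A` works:
struct DeviceWrapper<A: Api, T: Device<A>> {
    device: T,
    _marker: PhantomData<A>,
}

impl<A: Api + 'static, T: Device<A>> DeviceFactory for DeviceWrapper<A, T>
where
    A::Buffer: 'static,
{
    fn dyn_create_buffer(&self, size: u64) -> Box<dyn Any> {
        Box::new(self.device.create_buffer(size))
    }
}

// Toy backend where a "buffer" is just its size.
enum Noop {}
impl Api for Noop {
    type Buffer = u64;
}
struct NoopDevice;
impl Device<Noop> for NoopDevice {
    fn create_buffer(&self, size: u64) -> u64 {
        size
    }
}

fn main() {
    let factory: Box<dyn DeviceFactory> = Box::new(DeviceWrapper::<Noop, _> {
        device: NoopDevice,
        _marker: PhantomData,
    });
    let buffer = factory.dyn_create_buffer(64);
    assert_eq!(*buffer.downcast_ref::<u64>().unwrap(), 64);
}
```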
Actually, I'm not sure a wrapper would help at all. Another approach is to take the generic parameter out of the hal traits and turn them into associated types:
pub trait Device: WasmNotSendSync {
    type A: Api;
    // If only the following syntax worked; unfortunately we get an ambiguous associated type error.
    // unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<Self::A::Buffer, Error>;
    // This works but is an ergonomics nightmare:
    unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<<Self::A as Api>::Buffer, Error>;
}
impl<T: Device> DeviceFactory for T {
    unsafe fn dyn_create_buffer(&self, desc: &BufferDescriptor) -> Result<Box<dyn BufferResource>, Error> {
        // Box the concrete buffer and erase its type.
        Ok(Box::new(self.create_buffer(desc)?))
    }
}
That's a huge bummer, because even if the end state is that <Self::A as Api>::Buffer ends up contained in just a few places (the hal trait definitions and their wrappers), getting there incrementally means core has to be filled with this generic gymnastics before it can be cleaned up.

I think that if this is the wall that's holding us back, we should just go ahead. A bit of temporary ugly code is a fair trade for our end goal.
@nical do you have a branch where you were working on this? I understand you've been a bit busy, so I thought I could take a crack at it.
@lylythechosenone the little I have on branches is probably easier to re-write than rebase. I had started experimenting on this one and this one but these are more exploratory than actual implementations.
I'm glad that you are motivated to pick this up, since I can't dedicate time to this in the foreseeable future. I thought I had some notes about this but unfortunately I'm either misremembering or I lost them so I'll try to page the plan back in my head:
It is very important to do this incrementally. It's going to be a difficult change to land, it will touch most of wgpu-core and will conflict with most PRs happening there so landing changes often and in small chunks is key. I encourage you to experiment on a throwaway branch and re-implement cleaner/isolated commits once you have identified what sticks.
Probably a good first thing to try is what I suggested in my previous comment about moving the generic parameters to associated types. I was worried that it would make things messy but Connor later told me it'd probably be fine. If that doesn't work out we can implement the object-safe version for each backend instead of deriving it from the non-object safe ones.
There will probably be parts of the API that will need to be split off into traits implemented in wgpu-core; for example, creating the resources needs to know about the concrete type (for example wgpu_core::BindGroupResource<hal::BindGroup>).
Don't hesitate to reach out with questions or to bounce ideas, and thanks again for looking into this.
I am happy to report that converting all Foo<A> traits to Foo<A = ...> traits was successful. Now I just need to refactor all of the backends, and I can start on an actual dynamicification.
Alright, I have done a bit of work on the dynamic traits. They are mostly implemented, however a few functions are unimplemented due to lack of dynamic forms for:
BuildAccelerationStructureDescriptor
ComputePassDescriptor
BufferBinding
RenderPassDescriptor
TextureBarrier
BufferBarrier
GetAccelerationStructureBuildSizesDescriptor
Currently, these structs all follow a very similar pattern:
struct Foo<'a, A: Api> {
thing: &'a A::Thing,
// some metadata
bar: Bar,
baz: Baz,
}
We could make dynamic forms of them in one of two ways. Firstly, we could duplicate them and create specific dynamic forms, e.g.:
struct Foo<'a> {
    thing: &'a dyn Thing,
    // some metadata
    bar: Bar,
    baz: Baz,
}
Alternatively, we could adapt the existing ones like so:
struct Foo<'a, T: Thing + ?Sized> {
    thing: &'a T,
    // some metadata
    bar: Bar,
    baz: Baz,
}
The first allows us to not make any more breaking changes. However, the second could be more intuitive. Please tell me your thoughts, and let me know if you think of a third option.
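A compilable miniature of the second approach (`Thing` and `VulkanThing` are made-up names), showing that a single `?Sized` definition serves both the existing static use and the new dynamic one:

```rust
// Made-up trait and backend type for illustration.
trait Thing {
    fn name(&self) -> &'static str;
}

struct VulkanThing;
impl Thing for VulkanThing {
    fn name(&self) -> &'static str {
        "vulkan"
    }
}

struct Foo<'a, T: Thing + ?Sized> {
    thing: &'a T,
    // some metadata
    bar: u32,
}

fn main() {
    let t = VulkanThing;
    // Static use: existing callers keep passing concrete references.
    let static_foo = Foo { thing: &t, bar: 1 };
    // Dynamic use: type-erased paths use `T = dyn Thing`.
    let dyn_foo: Foo<'_, dyn Thing> = Foo { thing: &t, bar: 2 };
    assert_eq!(static_foo.thing.name(), "vulkan");
    assert_eq!(dyn_foo.thing.name(), "vulkan");
    assert_eq!(static_foo.bar + dyn_foo.bar, 3);
}
```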
It's really tempting to go with thing: &'a dyn Thing right away, but I'm wary of too much duplication going in, even if it's supposed to be temporary; there's always the risk that through some unforeseen circumstances things stay in their temporary state, and then we end up with an overall confusing construct. So I would prefer the thing: &'a T interim variant even though it may mean more code churn overall, simply because if the effort gets stuck this is imho a way less bad state to get stuck in (please please don't get stuck either way 😄)
@Wumpf Actually, I had intended for this to be one of the permanent changes. This is still on the wgpu-hal side of things. If you have another solution, please do tell.
We talked it out a bit on Matrix chat here. I misunderstood proposal 2 (no dyn) to be transitional, since I assumed we want to end up at proposal 1 (&dyn) eventually... which isn't that clear!

We eventually landed on proposal 2 (no dyns) being where we want to be, since it's easier to use and less invasive (less casting), and also likely not much extra codegen since it's only used by wgpu-core.
Alright, it is now mostly done, but there is one pretty big problem. Unlike my first PR, this one will break users and core. I'll have to do the fixes in core before I submit the PR.
Here's a big question we need to answer: there are a few types that are not needed by the dynamic traits, but follow this same pattern. Should I transition these as well, or keep them taking in Api?

EDIT: they'll be needed by dynamic traits in wgpu-core, so I'll go ahead and transition them.
Per more Matrix chat: it turns out the conversion from some untyped to typed structures can be fairly heavy. E.g. BuildAccelerationStructureDescriptor needs to recreate all its Vecs, casting every single object. Note that we cannot simply do a safety check + transmute, since &T and &dyn SomeTrait are completely different pointers.

Overall this doesn't seem to be a huge blocker, but it might be that we need to think about ways to optimize this at some later point.
This is also true of the &[&CommandBuffer] and related things on Queue.

A SmallVec could probably work for &[&CommandBuffer] and &[&SurfaceTexture], as they're unlikely to have more than a few elements.
Acceleration-structure-related downcasts cannot be done currently; they take a reference to their entries. I may have to do some unsafe here and store the TypeId outside of the types. For now, I am leaving them todo!, because they don't exist in core anyway.
Alright, that's everything that needs to be done in wgpu-hal. I'm going to need a lot of help with core, because I've never touched the codebase before. I'm also not entirely sure what the structure should be. I know we need some "factory" traits for creation of objects, and these should be dynamic. They should create core resources. However, that's about where my knowledge ends. @nical it'd be helpful if you could fill me in on whatever more you know, if you do.
I did some looking around, and I think I get the basic idea now. I'd still love a more thorough explanation.
What I'm currently thinking (in order of implementation steps):

1. Core resource types will remain generic, but will allow unsized values. They will still be generic over Api, and I think only the generic itself can unsize, but I'll have to check on that I guess.
2. Hubs will transition to storing a HashMap with the TypeId of the Api implementor (give other ideas if you have any), and each Hub will store dynamic-dispatch resources. TypeId may not be the best option. Maybe we want some other backend ID method? Id does already exist, and we might be able to use that, but I'm not sure.
3. Instance will have a similar transition. The Backend enum can remain, with an Other option. Maybe this could replace Empty. When an instance is created, it will use these default backends, and they will then be unsized into the Hubs. Once an Instance exists, the concrete type is no longer needed. Dynamic dispatch will handle the rest.

This will also require wrapper traits for all of the hal traits, although I'm not sure where that should fit in. These wrappers will build off of the dynamic versions, but re-implement create_ and destroy_ methods in terms of resources.
We will also need a ?Sized Snatchable, which I have already implemented using a tad bit of unsafe, similar to the following:
struct Snatchable<T: ?Sized> {
    snatched: UnsafeCell<bool>,
    value: UnsafeCell<ManuallyDrop<T>>,
}
where the T is only dropped if snatched is false. get and get_mut exist for unsized types, as well as new downcast_ref and downcast_mut methods to turn &(mut) Snatchable<dyn Foo> into &(mut) Snatchable<T>. snatch and take still require a sized type. This will be instrumental in allowing unsized resources.
Note: I had to implement it this way because enums cannot currently contain unsized types. They do not unsize the same way that structs do.
I'm gonna start with step 1 (as most do), and see how far I get. I'll try to make this as small of a change as possible in terms of breaking things, especially outside of core.
sounds pretty good to me!
Core resource types will remain generic, but will allow unsized values

So that would be along the lines of @nical's third suggestion in the original post? (Dynamically sized types)
TypeId may not be the best option. Maybe we want some other backend ID method? Id does already exist, and we might be able to use that, but I'm not sure.
Given that each implementation is associated with a value of the backend enum, we could consider using that? 🤔

The nice side effect of this is that instead of doing a hash lookup we can do just an array lookup ✨. To support extra backends we can make the enum non-exhaustive and numeric (:
Agreed that TypeId isn't great, and Id is used for resource ids only, so that would be a bit confusing, although an option.
non-exhaustive and numeric
The only issue with this is: who decides which number means what?
array lookup
What about unfilled slots? A Backend variant exists for all backends, regardless of what core is compiled for. Growing an array for, say, OpenGL, when all other backends are unused, would create 4 unnecessary instance and hub slots.

I think a hash lookup there is actually fine, and potentially better. If we do choose to use TypeId and find that performance is a problem, we could skip hashing, as TypeIds are already (at least partially) hashed values. I don't think it will be too much of a problem, however.
One thing to consider is what type of hashmap we use. Because this map will be shared between threads, we need to synchronize it. We could use something like DashMap to do this automatically, or do it manually with a RwLock. Because instances will already need to be placed in Arcs, we can rely on similar arcanization optimizations as core already does. However, this may not be the case for hubs (to be decided). If we do choose to use Arcs and still find performance to be a problem, we could consider an atomic hashmap implementation, as we'll be using pointers anyway.
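As a rough, std-only sketch of the RwLock variant (DashMap would replace the outer RwLock<HashMap<...>> for finer-grained locking; `Hubs` and `VulkanHub` are made-up stand-ins for the real types):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// A map from the backend's TypeId to its type-erased, shared hub.
struct Hubs {
    map: RwLock<HashMap<TypeId, Arc<dyn Any + Send + Sync>>>,
}

// Toy stand-in for a per-backend hub.
struct VulkanHub {
    textures: u32,
}

fn main() {
    let hubs = Hubs {
        map: RwLock::new(HashMap::new()),
    };

    // Register a backend hub under its TypeId.
    hubs.map.write().unwrap().insert(
        TypeId::of::<VulkanHub>(),
        Arc::new(VulkanHub { textures: 3 }),
    );

    // Look it up and downcast back to the concrete hub type.
    let guard = hubs.map.read().unwrap();
    let hub = guard
        .get(&TypeId::of::<VulkanHub>())
        .and_then(|h| h.clone().downcast::<VulkanHub>().ok())
        .unwrap();
    assert_eq!(hub.textures, 3);
}
```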
The only issue with this is: who decides which number means what?
Well, we do. Right now there's an enum for all backends already; a minimal version just uses that.
Growing an array for, say, OpenGL, when all other backends are unused, would create 4 unnecessary instance and hub slots.
We're literally talking about having a fixed-size array of less than 10 pointers that we need once in the hub, no? Sounds like I got this wrong.
we do
I meant for out-of-tree backends.
we're literally talking about having a fixed sized array of less than 10 pointers
Yeah, you're actually right. Sometimes I go into over-optimization mode; that would probably be fine. The only way it isn't fine is if we store hubs inline rather than in an Arc, but I think we want the Arc for contention minimization anyway.
@nical I thought one of the goals here was to reduce compilation time during development. Couldn't that be accomplished just by enabling only one backend?
The "clutter of generic code" factor I don't find so compelling. It's not something that keeps me from getting work done.
I'm not sure this work should be a priority for us at the moment.
After some talk with @nical on Matrix, I have a rough roadmap for what we need to do:

1. Separate all Type<A> types into Type<A> and TypeInfo<A>, where TypeInfo<A> stores everything except the raw field, and Type<A> has TypeInfo<A> followed by the raw field. For example, Buffer<A>:
// Currently:
#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) raw: Snatchable<A::Buffer>,
    pub(crate) device: Arc<Device<A>>,
    pub(crate) usage: wgt::BufferUsages,
    pub(crate) size: wgt::BufferAddress,
    pub(crate) initialization_status: RwLock<BufferInitTracker>,
    pub(crate) sync_mapped_writes: Mutex<Option<hal::MemoryRange>>,
    pub(crate) info: ResourceInfo<Buffer<A>>,
    pub(crate) map_state: Mutex<BufferMapState<A>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

// The goal:
#[derive(Debug)]
pub struct BufferInfo<A: HalApi> {
    pub(crate) device: Arc<Device<A>>,
    pub(crate) usage: wgt::BufferUsages,
    pub(crate) size: wgt::BufferAddress,
    pub(crate) initialization_status: RwLock<BufferInitTracker>,
    pub(crate) sync_mapped_writes: Mutex<Option<hal::MemoryRange>>,
    pub(crate) info: ResourceInfo<Buffer<A>>,
    pub(crate) map_state: Mutex<BufferMapState<A>>,
    pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) info: BufferInfo<A>,
    pub(crate) raw: Snatchable<A::Buffer>,
}
2. Add an Info type to the Resource trait, and set it to this TypeInfo<A> type.
3. Add a Factory<T> trait like so:

pub trait Factory<T: Resource> {
    type Desc;
    fn create(&self, info: T::Info, desc: Self::Desc) -> Arc<T>;
    fn destroy(&self, resource: Arc<T>) -> T::Info;
}

4. Implement Factory<T> for anything that can create a T, e.g. Factory<Buffer<A>> for T: hal::Device<A>, which will just call self.create_buffer and insert the result into a Buffer with the info.
5. Replace calls like self.raw.create_buffer with self.raw.create::<Buffer<_>>.
6. Remove the : Sized bounds on HalApi's types, and replace them with Factory bounds.

Then after all of that we'll regroup and discuss.
One important update: __Info will have to be __Metadata, because Info is already taken up by ResourceInfo.
Things to decide:

- Where should the Factory trait go?
- Should resources Deref into their Metadata types, or should I just refactor the code to add the extra field access?

Adding a bit of context to @lylythechosenone's previous comments (context that can already be gathered from other places but is easy to miss):
A::Buffer, where A is the dynamic backend, would be dyn hal::Buffer.

Exactly. Although do note that resources will still have generics; they will just have a default type, and fns will only be implemented for this default type.
I'll use this issue to explore various ways we could make wgpu-core less generic.
The motivations include reducing the clutter of overly generic code and, more importantly, reducing binary size and compile times.
Factor out non-generic code into separate functions
There is some amount of code that could be easily factored out, for example some of the validation. Isolating the validation also has the advantage that it can be laid out to match the specification, so that we can more easily check that the validation is correct. There is only a limited amount of bang for buck we can get this way, though.
Making the resources not generic
We'll get the biggest wins by far if we can erase the type of the resources. Most of wgpu-core is generic only because something needs to hold on to a generic resource.
Downcasting
Assuming we can accept the cost of downcasting every resource at the top of every hal function, we could use trait objects and downcast via Any. The wgpu-hal APIs would take &dyn Resource parameters instead of some associated types, and the first thing they would do is downcast these dyn traits into the concrete types they are expecting.

Note that we can also break it down into plenty of traits like TextureResource, BufferResource, etc. That's a rather small detail once we have overcome the other obstacles.

Boxed resources
The simple approach (which I find very unsatisfying) is to box all hal resources:
The wgpu-core and wgpu-hal structures are no longer next to each other in memory; that most likely has a cost, but it is hard to estimate without doing all of the work.
Dynamically sized types
The last member of a struct can be a DST, which turns the structure into a DST as well:
It's quite nice; however, there are a few constraints:

- We can't have an Option<dyn hal::Resource> (enums can't contain unsized types).
- The device_create_* functions would probably need to still be generic.

Code of the DST experiment: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3 (it also shows how snatchable resources would work).
Can we do it without Any/downcast?
Maybe, with a copious amount of unsafe code, which might be fine. Suggestions are welcome!