Traverse-Research / gpu-allocator

🦀 GPU memory allocator for Vulkan, DirectX 12 and Metal. Written in pure Rust
https://traverse.nl/
Apache License 2.0
373 stars 49 forks source link

Add support for `presser` #138

Closed fu5ha closed 1 year ago

fu5ha commented 1 year ago

For CPU-to-GPU staging & update buffers, one must copy data into them. This currently requires unsafe raw pointer copies or using an unsound byte-slice interface. presser solves this by providing helper functions for safely copying data into raw allocated buffers while validating layout requirements and not creating invalid references to uninitialised memory.

Implementation

This PR adds support for presser in two ways, and for now only for Vulkan.

  1. Allocation implements presser::Slab directly, meaning you can directly pass it into presser's helper functions. However, because not all Allocations are valid Slabs, if you attempt to use a non-mapped or too-large allocation as a Slab, you'll cause a panic.
  2. For the case where you want to handle not being able to convert to a Slab in some way other than panicking, Allocation::try_as_mapped_slab will optionally return a MappedAllocationSlab which pre-checks its conditions and so infallibly implements Slab upon succeeding.

The advantage of implementing Slab on Allocation directly, in addition to the added level of convenience it provides, is that once EmbarkStudios/presser#5 lands and we upgrade to that version here, you'll be able to share an &Allocator and use the read_X helpers from presser from multiple places, rather than requiring &mut Allocator everywhere.

Future path to adoption

  1. Land this PR
  2. Design an equivalent API for the d3d12 Allocation type. I think this likely will look like adding a mapped_ptr: Option<NonNull<c_void>> field much like the vulkan::Allocation has, and then a pub fn map(&mut self, resource: *mut ID3D12Resource) which would fill in that field. But, that's for another PR's discussion...
  3. At this point we could probably do a new gpu-allocator release just with the copy apis.
  4. Land EmbarkStudios/presser#5 and upgrade gpu-allocator to a version 0.4 which includes those changes
  5. Do another gpu-allocator (minor) release which adds #[deprecated] attributes to mapped_slice[_mut], documentation detailing why they've been deprecated, and linking to an upgrade guide in some release notes or something (or just a document in the repo).
  6. Once we've established that presser is actually replacing all use cases, do a gpu-allocator major release which actually removes the #[deprecated] methods and links to the migration guide again in the changelog/release notes.
fu5ha commented 1 year ago

IIRC we have a d3d12::ID3D12Resource::Map() call in our custom implementation. I'm not sure why this isn't provided by gpu-allocator since this is an explicit (vkMapMemory()) operation on Vulkan as well... @manon-traverse any idea, perhaps worth to add in a separate PR?

Yeah so when looking through the d3d12 visualizer example, it seems like the main issue is that in order to get a mapped pointer you have to first create the actual ID3D12Resource ptr and then Map that, rather than mapping just the memory object that the Allocation knows about.. or something. So not sure exactly how an api there should look.

The Allocation could maybe contain an optional mapped ptr much like the vulkan Allocation does and then add a function like unsafe fn map(&mut self, res: *mut ID3D12Resource) which sets the internal mapped ptr and then allows access to the same as_mapped_slab api I introduced here.

MarijnS95 commented 1 year ago

Right, gpu-allocator would have to go through CreatePlacedResource() (and expose the returned ID3D12Resource for this) which seemingly isn't as clear-cut to use here: it is apparently allowed to be called multiple times on the same heap memory/offset to overlap resources.

fu5ha commented 1 year ago

Depending on how this pans out we might want to land https://github.com/Traverse-Research/gpu-allocator/compare/uninit at some point or remove the currently-UB API altogether.

Interesting. I'm sorta unconvinced that abstracting this is much of a benefit because it's so easy to make an initialized buffer uninit again by copying data that has padding bytes into it, as you mention. I'm also not sure exactly how that works in terms of whether data the gpu would write would make padding bytes uninit (I think it won't since it's FFI so compiler has to be conservative) for GPU-to-CPU. I think that case of a readback buffer is probably the most compelling use case for actually tracking the initialization state of the memory as a whole since that would allow actually-safe access as a &[u8] for reads.

And yeah, I think using presser to implement the initialization api would be a great fit, and I'm in favor of eventually removing the mapped_slice_mut api eventually :)

fu5ha commented 1 year ago

Right, gpu-allocator would have to go through CreatePlacedResource() (and expose the returned ID3D12Resource for this) which seemingly isn't as clear-cut to use here: it is apparently allowed to be called multiple times on the same heap memory/offset to overlap resources.

Not necessarily, it could require the user to do CreatePlacedResource() themselves and then pass the created resource back in to the map api on Allocation

MarijnS95 commented 1 year ago

Not necessarily, it could require the user to do CreatePlacedResource() themselves and then pass the created resource back in to the map api on Allocation

Meant in the theoretical case that we extend gpu-allocator to perform CreatePlacedResource() + Map(), instead of accepting it as an external argument (that would then have to be asserted "somehow" to belong to the allocation), sorry for not making that more clear :)

fu5ha commented 1 year ago

Ah right, makes sense :)

MarijnS95 commented 1 year ago

I'm sorta unconvinced that abstracting this is much of a benefit

Agreed. I'm inclined to just make fn mapped_slice unsafe (since it really, really is!), slap a /// # Safety\n/// Only to be called when the memory is known to be _fully_ initialized and leave out this whole MaybeUninit mess simply because It's Complicated™ and so easy to invalidate in the various ways you pointed out.

I think that case of a readback buffer is probably the most compelling use case for actually tracking the initialization state of the memory as a whole since that would allow actually-safe access as a &[u8] for reads.

This is especially tricky since yes, you typically want to consume the data in some &[T] way, and have to assume the GPU has written exactly that T to it and is synchronized with the CPU timeline (and isn't mutably aliased, isn't being written to by "another thing" on the GPU timeline, ...).

fu5ha commented 1 year ago

Agreed. I'm inclined to just make fn mapped_slice unsafe (since it really, really is!), slap a /// # Safety\n/// Only to be called when the memory is known to be _fully_ initialized and leave out this whole MaybeUninit mess simply because It's Complicated™ and so easy to invalidate in the various ways you pointed out.

snip moved to #139 discussion.

fu5ha commented 1 year ago

Sounds good to me @Jasper-Bekkers! I'll clean this up a bit more based on that plan and can start experimenting with a dx12 api soon.

fu5ha commented 1 year ago

Addressed Jasper's feedback and updated the original PR description with the new direction :)

fu5ha commented 1 year ago

... github is being dumb and not letting me re-request from both of you at the same time I guess lol.

MarijnS95 commented 1 year ago

NonNull::slice_from_raw_parts() just stabilized which we might also use to represent a valid range of bytes on a raw pointer, without the UB of mapping possibly uninitialized data as safe slice.