gboisse / gfx

A minimalist and easy to use graphics API.
MIT License

gfxBufferSetData #40

Closed (sayan1an closed 1 year ago)

sayan1an commented 2 years ago

Hi, I would like to update a small buffer from CPU (every frame) and then use it in a compute shader.

I am not sure how to use buffers with kGfxCpuAccess_Write enabled.

gboisse commented 2 years ago

Hey, you can see an example of writing to some buffer with kGfxCpuAccess_Write over here:

https://github.com/gboisse/gfx/blob/68b78ae6def0267a85f794bfb468e388789d8e5f/examples/common/gpu_scene.cpp#L226-L246

However, this writes some transforms inside the mapped buffer but then schedules a copy to a buffer with no CPU access (which is what will be consumed by the GPU).

From your description, it seems you are more interested in a constant buffer mechanism. There are many ways to go about this, but I thought I'd send you an updated version of the 00-hellotriangle.zip example that's using a cbuffer binding as opposed to the root constant that is used in the master branch.

Hope that'll help you figure out how to implement a constant buffer pool efficiently 🙂

sayan1an commented 2 years ago

I think using a cbuffer should work, however, I am actually looking for the mapped-buffer technique. Thanks!

So, modifying the data through mapped ptr glm::mat4 *transforms automatically updates the upload_transform_buffers? Then you copy the upload-buffer to gpu-access-only buffer?

gboisse commented 2 years ago

There is no automatic update.

When you create a buffer with kGfxCpuAccess_Write, the buffer is actually allocated in main memory (CPU RAM if you will). Then the GPU is able to read directly from that memory through the PCIe bus using some driver "magic". So the reads from the GPU are possible (the copy isn't technically needed) but slow (i.e., limited to your PCIe bandwidth) and typically uncached.

It's fine for constants (since it's typically small amounts of memory and only read a few times), but for stuff that you may access many times (like transforms in this example), best to copy to a VRAM buffer (i.e., created with kGfxCpuAccess_None) to benefit from the full bandwidth of your GPU memory as well as its caching hierarchy (i.e., L1 and L2).

Finally, although it is possible for your shaders to read directly from a kGfxCpuAccess_Write buffer, care must be taken that the CPU doesn't write to it while the GPU reads from it (the same issue applies to the copy-to-gpu-memory case btw). This is what the constant buffer pool system ensures in the zipped example I sent you.
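To make the "no automatic update" point concrete, here is a minimal sketch of the upload-then-copy pattern. It does not call gfx: `Buffer`, `CommandList`, `writeFloats` and `readFloat` are hypothetical stand-ins that model both buffers as plain CPU memory, so treat it as an illustration under those assumptions rather than how gfx implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-ins, NOT the gfx API: plain CPU memory models both buffers.
struct Buffer {
    std::vector<unsigned char> bytes;
    explicit Buffer(std::size_t size) : bytes(size, 0) {}
};

// Commands are recorded now and executed later, like a GPU command list.
struct CommandList {
    std::vector<std::function<void()>> commands;

    // Records a copy; nothing is transferred until execute() runs.
    void copyBuffer(Buffer &dst, Buffer const &src) {
        assert(dst.bytes.size() == src.bytes.size());
        commands.push_back([&dst, &src] { dst.bytes = src.bytes; });
    }

    // Plays back every recorded command in order, then empties the list.
    void execute() {
        for (auto &command : commands) command();
        commands.clear();
    }
};

// Writes through the "mapped" pointer of the upload buffer.
inline void writeFloats(Buffer &upload, float const *values, std::size_t count) {
    assert(count * sizeof(float) <= upload.bytes.size());
    std::memcpy(upload.bytes.data(), values, count * sizeof(float));
}

// Reads one float back from a buffer (for inspection only).
inline float readFloat(Buffer const &buffer, std::size_t index) {
    float value;
    std::memcpy(&value, buffer.bytes.data() + index * sizeof(float), sizeof(float));
    return value;
}
```

In this model, writing into the upload buffer leaves the device buffer untouched until the recorded copy is executed. The copy also reads the upload buffer at execution time rather than at record time, which is why the CPU must not overwrite it in between.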

gboisse commented 2 years ago

Here's an excellent blog post on the topic in case you'd want to dive deeper: https://therealmjp.github.io/posts/gpu-memory-pool/

sayan1an commented 2 years ago

Hi thank you for sharing the deep dive.

I am still a bit confused about the part "care must be taken that the CPU doesn't write to it while the GPU reads from it". Can you please clarify how the issue is being taken care of under the following circumstances:

  1. gfxProgramSetParameter(gfx, rtao_program, "ViewProjectionMatrix", view_projection_matrix); - In this case, if the matrix is updated in the render-loop, is it safe to set parameters like this?

Looking forward to your answer.

Regards, Sayantan

gboisse commented 2 years ago

hey there, so you need to recall that when you "draw" things from the CPU, all you're really doing is recording some commands for the GPU to consume at some later point in the future.

As a matter of fact, in the case of gfx, the GPU can be up to kGfxConstant_BackBufferCount frames behind the CPU (by default anyway; you can change this by tweaking max_frames_in_flight when creating a context): https://github.com/gboisse/gfx/blob/d94cb42a5adab4ac2cee5442db0561da3e18c7f1/gfx.h#L389

This helps ensure maximum throughput and high framerates by minimizing the "sync. points" (i.e., where CPU and GPU have to wait for each other), but it comes at the cost of higher latency (what is displayed on screen is possibly up to 3 frames behind the current state of CPU processing).
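As a rough illustration of the throughput/latency trade-off just described, the sketch below models a bounded queue of in-flight frames. `FrameQueue` and its members are hypothetical names, not gfx code, and the limit of 3 used in the usage note only mirrors the "up to 3 frames" figure mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model, not gfx code: a frame queue bounded by a maximum
// number of in-flight frames.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t maxFramesInFlight) : max_(maxFramesInFlight) {
        assert(max_ > 0);
    }

    // CPU side: submit a recorded frame. If the queue is full, the CPU has to
    // wait for the GPU to retire the oldest frame first (a sync point).
    void submit(int frameIndex) {
        if (pending_.size() == max_) {
            retireOldest();
            ++cpuStalls_;
        }
        pending_.push_back(frameIndex);
    }

    // GPU side: finish the oldest submitted frame.
    void retireOldest() {
        assert(!pending_.empty());
        lastCompleted_ = pending_.front();
        pending_.pop_front();
    }

    std::size_t framesInFlight() const { return pending_.size(); }
    int lastCompleted() const { return lastCompleted_; }
    int cpuStalls() const { return cpuStalls_; }

private:
    std::size_t max_;
    std::deque<int> pending_;
    int lastCompleted_ = -1; // -1 means no frame has completed yet
    int cpuStalls_ = 0;
};
```

With `FrameQueue queue(3)`, the first three calls to `submit` return without any stall; only the fourth forces the CPU to wait for frame 0. A smaller limit lowers latency but makes the CPU hit that sync point sooner.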

Since a picture is worth a thousand words, here's a typical frame breakdown, where the horizontal axis would be time:

[image: frame_sync]

You can clearly see here that the GPU really starts processing the "blue frame" while the CPU is busy encoding the next "green frame".

Now imagine that the CPU decided to reuse the same "Constants" memory block when writing the constants for the green frame; you'd then have a race condition: the GPU may be reading from the memory location while the CPU is writing to it. And even if the two accesses never actually overlap, there is the risk that the GPU would "see" the constants of the green frame while it is still processing the blue frame, hence corrupting your render state.

That's why the usual solution is to multi-buffer those constant buffer pools, with one copy for each in-flight frame the system allows.
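A minimal sketch of that multi-buffering idea follows. It is not the constant buffer pool from the zipped example: `MultiBufferedConstants` and `FrameConstants` are hypothetical names, and the "GPU read" is simply a later CPU read of the slot that belongs to an earlier frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the pool from the zipped example: one constants
// slot per in-flight frame, selected by frame index.
template <typename Constants, std::size_t FrameCount>
class MultiBufferedConstants {
public:
    static_assert(FrameCount > 0, "need at least one slot");

    // CPU side: write the constants for the frame being recorded.
    void write(std::size_t frameIndex, Constants const &value) {
        slots_[frameIndex % FrameCount] = value;
    }

    // GPU side: read the constants belonging to the frame being executed.
    Constants const &read(std::size_t frameIndex) const {
        return slots_[frameIndex % FrameCount];
    }

private:
    std::array<Constants, FrameCount> slots_{};
};

// Example payload; a real pool would hold matrices and other per-frame data.
struct FrameConstants {
    float time;
};
```

With three slots, writing frame 1 (the "green" frame) leaves the slot of frame 0 (the "blue" frame) intact. With a single slot, the same write overwrites what frame 0 was meant to read. The scheme is only safe if the CPU never runs more than `FrameCount` frames ahead of the GPU, since a slot is reused every `FrameCount` frames.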

For your question regarding gfxProgramSetParameter(gfx, rtao_program, "ViewProjectionMatrix", view_projection_matrix);, this uses another mechanism called root constants where the constant data is written out directly to the command stream.

This is still multi-buffered of course, but at the command stream level and managed internally by gfx by having as many ID3D12CommandAllocator as the maximum number of allowed in-flight frames: https://github.com/gboisse/gfx/blob/d94cb42a5adab4ac2cee5442db0561da3e18c7f1/gfx.h#L1262-L1270
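The difference between a constant written into the command stream and a constant referenced by address can be sketched as below. This is a toy model with hypothetical names (`DrawCommand`, `CommandStream`, `recordDraw`), not gfx or D3D12 code, and a single float stands in for the view-projection matrix.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not gfx or D3D12 code.
struct DrawCommand {
    float inlineConstant;         // copied into the command at record time
    float const *constantPointer; // refers to memory the CPU may overwrite later
};

struct CommandStream {
    std::vector<DrawCommand> commands;

    // Records a draw, storing the constant both by value and by address so
    // the two behaviours can be compared.
    void recordDraw(float const &constant) {
        commands.push_back(DrawCommand{constant, &constant});
    }
};
```

If the application records a draw with a value of 1.0f and then updates the variable to 2.0f for the next frame, the recorded `inlineConstant` still holds 1.0f, while dereferencing `constantPointer` yields 2.0f. The by-value copy is what makes updating the matrix in the render loop safe in this model.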

gboisse commented 2 years ago

A correction on the previous message; you can only tweak the max_frames_in_flight count if creating an interop. context (i.e., when interacting with an application that already has its own handling of D3D12).

So, most likely irrelevant for you 🙂