eliemichel / LearnWebGPU

Learn to use WebGPU for native graphic applications in C++
https://eliemichel.github.io/LearnWebGPU

Buffer overflow/alignment issues in step 34 #71

Open petar-andrejic opened 1 month ago

petar-andrejic commented 1 month ago

Thanks for the nice tutorial, it's been very easy to follow. Just a small issue I noticed so far regarding buffer alignment:

Step 34 has a potential buffer overflow if the size of the index buffer is not already a multiple of four. Specifically, if the buffer size ends up being padded, writing to the buffer will read adjacent memory past the end of indexData without any warning, since the padded buffer is now strictly larger than the CPU-side data. It would probably be best to handle this properly by padding the CPU side as well, and to warn readers that this is necessary.
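For illustration, padding the CPU side could look roughly like this (just a sketch, assuming the chapter's std::vector<uint16_t> indexData and the bufferDesc / device / queue variables from the tutorial):

// Pad indexData so its byte size is a multiple of 4 before sizing the buffer.
// The extra zero entries are never referenced by the draw call.
while ((indexData.size() * sizeof(uint16_t)) % 4 != 0) {
    indexData.push_back(0);
}

bufferDesc.size = indexData.size() * sizeof(uint16_t); // now a multiple of 4
wgpu::Buffer indexBuffer = device.createBuffer(bufferDesc);

// The copy now reads exactly as many bytes as indexData owns.
queue.writeBuffer(indexBuffer, 0, indexData.data(), bufferDesc.size);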

Minimal working example of the issue, using dawn:

#include <cassert>
#include <cstdint>
#include <iostream>

#include <webgpu/webgpu_cpp.h>

int main() {
    wgpu::InstanceDescriptor instance_desc{
        .features{
            .timedWaitAnyEnable = true,
        },
    };
    wgpu::Instance instance = wgpu::CreateInstance(&instance_desc);

    wgpu::RequestAdapterOptions adapter_opts{};
    wgpu::Adapter adapter;
    wgpu::Future requestAdapterFuture = instance.RequestAdapter(
        &adapter_opts, wgpu::CallbackMode::WaitAnyOnly,
        [&](wgpu::RequestAdapterStatus, wgpu::Adapter _adapter, char const*) {
            adapter = _adapter;
        });
    instance.WaitAny(requestAdapterFuture, UINT64_MAX);

    wgpu::Device device;
    wgpu::DeviceDescriptor device_desc{};
    wgpu::Future requestDeviceFuture = adapter.RequestDevice(
        &device_desc, wgpu::CallbackMode::WaitAnyOnly,
        [&](wgpu::RequestDeviceStatus, wgpu::Device _device, char const*) {
            device = _device;
        });
    instance.WaitAny(requestDeviceFuture, UINT64_MAX);

    wgpu::Queue queue = device.GetQueue();

    constexpr size_t itemSize = 3;

    wgpu::BufferDescriptor buffer_desc{
        .usage = wgpu::BufferUsage::CopyDst | wgpu::BufferUsage::MapRead,
        .size = itemSize * sizeof(uint16_t),
        .mappedAtCreation = false,
    };
    // Round the size up to the next multiple of 4: 3 * sizeof(uint16_t) = 6 bytes becomes 8.
    buffer_desc.size = (buffer_desc.size + 3) & ~3;
    assert(buffer_desc.size == 8);
    wgpu::Buffer buffer = device.CreateBuffer(&buffer_desc);

    struct SomeData {
        uint16_t cpuData[3] = {0, 1, 2};
        uint16_t someOtherData[5] = {3, 4, 5, 6, 7};
    };

    SomeData sd;

    // Copies buffer.GetSize() == 8 bytes starting at cpuData, which is only
    // 6 bytes long, so the last 2 bytes are read from someOtherData.
    queue.WriteBuffer(buffer, 0, sd.cpuData, buffer.GetSize());
    wgpu::Future mapFuture =
        buffer.MapAsync(wgpu::MapMode::Read, 0, buffer.GetSize(),
                        wgpu::CallbackMode::WaitAnyOnly,
                        [](wgpu::MapAsyncStatus status, char const* msg) {
                            if (status != wgpu::MapAsyncStatus::Success) {
                                std::terminate();
                            }
                        });
    instance.WaitAny(mapFuture, UINT64_MAX);
    uint16_t const* result =
        static_cast<uint16_t const*>(buffer.GetConstMappedRange());

    // Read one element past itemSize to show what landed in the padding bytes.
    for (size_t i = 0; i < itemSize + 1; i++) {
        std::cout << "result[" << i << "] = " << size_t(result[i]) << '\n';
    }
}

The output is

result[0] = 0
result[1] = 1
result[2] = 2
result[3] = 3

since the copy overshot the extent of SomeData::cpuData and ended up reading data from SomeData::someOtherData as well.

petar-andrejic commented 1 month ago

I think the simplest solution is to use an aligned allocator for std::vector, for example as per this answer on Stack Overflow, with 4-byte or larger alignment. This ensures that the padded write always reads from safely allocated memory.
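Roughly what I have in mind (a sketch only, assuming C++17; the allocator itself is not from the tutorial or from Dawn):

#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal aligned-allocator sketch. With Alignment >= 4, allocate() rounds the
// backing allocation up to a multiple of the alignment, so a copy whose size is
// rounded up to the next multiple of 4 bytes stays inside memory owned by the
// vector.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;

    template <typename U>
    struct rebind {
        using other = AlignedAllocator<U, Alignment>;
    };

    AlignedAllocator() = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // Round the requested byte count up to a multiple of Alignment.
        std::size_t bytes = (n * sizeof(T) + Alignment - 1) & ~(Alignment - 1);
        return static_cast<T*>(::operator new(bytes, std::align_val_t{Alignment}));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Hypothetical usage for the tutorial's index data:
using IndexVector = std::vector<uint16_t, AlignedAllocator<uint16_t, 4>>;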