Wumpf / wgpu-profiler

Simple profiler scopes for wgpu using timer queries
Apache License 2.0

Timestamps are wrong on Vulkan #84

Open · Dinnerbone opened this issue 1 month ago

Dinnerbone commented 1 month ago

I'm not sure if this is an issue with this library, the rust tracy bindings, or tracy itself - so I figured I'd just shoot here and see if anyone has a suspicion as to what's wrong.

Checking out the current code and running cargo run --example demo --features tracy with the Tracy profiler open, I see no valid GPU spans in Tracy. If I force DX12 or GL it works, so it seems to be specific to Vulkan.

With Vulkan I can see some suspicious duration spans reported in the statistics. Nothing on the timeline (but I haven't tried running it for 94 days to see if it'll show up :D). [screenshot: Tracy statistics]

With DX12 they look good and thus are visible on the timeline: [screenshot: Tracy timeline]

(I've tested on both Tracy v10, which is the version in the lockfile, and v11.1 by doing a cargo update)

The chrome flamegraphs seem fine and valid for both, so it's likely something to do with the actual Tracy integration rather than wgpu reporting invalid data. I also think it's unlikely to be Tracy itself or I figure someone would have noticed by now - so my bet is either something wrong with the bindings in rust_tracy_client or the use of the API from this crate.

I've also confirmed this in a real world application using wgpu - the example code is just a useful minimal repro.

Dinnerbone commented 1 month ago

Debugging this a bit, the initial timestamp we create the context with is 0 for Vulkan, whilst all the events are at timestamps like 50900559341226, putting them way in the future. DX12 generates a valid initial timestamp.
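
(For a sense of the magnitudes involved: raw timestamp query results are in device ticks and get scaled by the queue's timestamp period. A minimal sketch of that conversion, with an illustrative function name of my own:)

    // Sketch: convert a raw timestamp-query value (device ticks) into seconds
    // relative to an initial timestamp, using wgpu's per-queue timestamp period.
    fn ticks_to_seconds(queue: &wgpu::Queue, initial_ticks: u64, event_ticks: u64) -> f64 {
        // Nanoseconds per tick for this queue/device.
        let period_ns = queue.get_timestamp_period() as f64;
        (event_ticks - initial_ticks) as f64 * period_ns / 1e9
    }

Assuming a period close to 1 ns per tick, an initial timestamp of 0 together with events around 50900559341226 ticks puts every span roughly 14 hours after the supposed start of the capture, which matches Tracy placing them far in the future.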

I guess this isn't the correct way to get the initial timestamp? It looks correct to me though... https://github.com/Wumpf/wgpu-profiler/blob/main/src/tracy.rs#L40
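
(For anyone not clicking through: as I read it, that code queries a single timestamp roughly like the snippets below and then hands the readback to tracy-client when creating the GPU context. This is only my paraphrase, not the crate's exact code, and the tracy_client call and enum names are as I recall them:)

    // Paraphrase (not the crate's exact code): the raw timestamp read back from
    // the query set becomes the Tracy GPU context's reference point, together
    // with wgpu's timestamp period.
    fn make_tracy_gpu_context(
        queue: &wgpu::Queue,
        initial_gpu_timestamp: i64, // on Vulkan this ends up being 0, which is the bug
    ) -> tracy_client::GpuContext {
        tracy_client::Client::running()
            .expect("Tracy client should be running")
            .new_gpu_context(
                Some("wgpu"),
                tracy_client::GpuContextType::Invalid, // backend-agnostic type; choice here is illustrative
                initial_gpu_timestamp,
                queue.get_timestamp_period(), // nanoseconds per tick
            )
            .expect("failed to create Tracy GPU context")
    }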

Dinnerbone commented 1 month ago

Splitting the retrieval of the initial timestamp over two encoders seems to fix it - so I'm wondering if this is a wgpu synchronization issue?

    let mut encoder1 = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("wgpu-profiler gpu -> cpu sync cmd_buf 1"),
    });
    // First command buffer: write the timestamp and resolve it into resolve_buffer.
    encoder1.write_timestamp(&query_set, 0);
    encoder1.resolve_query_set(&query_set, 0..1, &resolve_buffer, 0);
    let mut encoder2 = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("wgpu-profiler gpu -> cpu sync cmd_buf 2"),
    });
    // Second command buffer: copy the resolved result into the mappable buffer.
    encoder2.copy_buffer_to_buffer(&resolve_buffer, 0, &map_buffer, 0, wgpu::QUERY_SIZE as _);
    queue.submit([encoder1.finish(), encoder2.finish()]);

This fills map_buffer with a correct value.

Whereas (existing code equivalent):

    let mut encoder1 = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("wgpu-profiler gpu -> cpu sync cmd_buf 1"),
    });
    // Same commands as above, but write, resolve, and copy all in one command buffer.
    encoder1.write_timestamp(&query_set, 0);
    encoder1.resolve_query_set(&query_set, 0..1, &resolve_buffer, 0);
    encoder1.copy_buffer_to_buffer(&resolve_buffer, 0, &map_buffer, 0, wgpu::QUERY_SIZE as _);
    queue.submit([encoder1.finish()]);

Here, map_buffer is zeroed out.
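
(In both cases the readback itself is the same, roughly the following, assuming map_buffer was created with MAP_READ | COPY_DST usage:)

    // Sketch of the CPU-side readback used to check the result. Names match the
    // snippets above; error handling is omitted, and depending on the wgpu
    // version the poll call may be spelled Maintain::Wait or PollType::Wait.
    map_buffer
        .slice(..)
        .map_async(wgpu::MapMode::Read, |result| result.unwrap());
    device.poll(wgpu::Maintain::Wait); // block until the map has completed
    let raw_timestamp = {
        let view = map_buffer.slice(..).get_mapped_range();
        u64::from_le_bytes(view[0..8].try_into().unwrap())
    };
    map_buffer.unmap();
    // Two-encoder variant: a plausible tick count. Single-encoder variant: 0.
    println!("raw timestamp: {raw_timestamp}");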

Dinnerbone commented 4 weeks ago

It looks like this actually isn't just limited to Tracy. Whilst making a workaround for this, I noticed the first few profile events are also bad for me - and dumping all the chrometraces proves it :(
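
(For context, one way to dump a chrometrace per frame with this crate is roughly the following; frame_index and the file naming are mine, and I'm assuming the demo's existing GpuProfiler setup:)

    // Sketch: write every finished frame's profiler results to its own
    // chrometrace file. `profiler` is the wgpu_profiler::GpuProfiler from the
    // demo; `frame_index` and the file naming are illustrative.
    if let Some(results) = profiler.process_finished_frame(queue.get_timestamp_period()) {
        let path = std::path::PathBuf::from(format!("trace_frame_{frame_index}.json"));
        wgpu_profiler::chrometrace::write_chrometrace(&path, &results)
            .expect("failed to write chrometrace");
    }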

Chrometraces:

Frame 1:

{
"traceEvents": [
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 3" },
{ "pid":8104, "tid":1, "ts":760977552703.09, "dur":-760977552703.09, "ph":"X", "name":"rendering" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"render pass top" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 0" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 1" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"render pass bottom" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 2" }
]
}

Frame 2:

{
"traceEvents": [
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 3" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"rendering" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"render pass top" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 0" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 1" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"render pass bottom" },
{ "pid":8104, "tid":1, "ts":0, "dur":0, "ph":"X", "name":"fractal 2" }
]
}

Frame 3 and beyond are normal:

{
"traceEvents": [
{ "pid":8104, "tid":1, "ts":760977895156.5499, "dur":37.439982406795025, "ph":"X", "name":"fractal 3" },
{ "pid":8104, "tid":1, "ts":760977894955.91, "dur":247.88000155240297, "ph":"X", "name":"rendering" },
{ "pid":8104, "tid":1, "ts":760977894973.9901, "dur":99.7600145637989, "ph":"X", "name":"render pass top" },
{ "pid":8104, "tid":1, "ts":760977894990.0299, "dur":43.20009611546993, "ph":"X", "name":"fractal 0" },
{ "pid":8104, "tid":1, "ts":760977895033.55, "dur":39.88004755228758, "ph":"X", "name":"fractal 1" },
{ "pid":8104, "tid":1, "ts":760977895092.07, "dur":102.2400101646781, "ph":"X", "name":"render pass bottom" },
{ "pid":8104, "tid":1, "ts":760977895092.63, "dur":63.5999022051692, "ph":"X", "name":"fractal 2" }
]
}