Extended Presentation API Investigation

cwfitzgerald commented 2 years ago

Context

I'm working on frame pacing and we need some help from the API. The difficulty in designing this API is that each WSI exposes different pieces of information and gives them to us in different ways.

Supersedes #682 Supersedes #2650

Investigation

We have the following major WSIs to think about:

IDXGISwapchain (Windows 7+ - D3D)
IPresentationManager (Windows 11+ - D3D)
CAMetalLayer (Mac - Metal)
VK_GOOGLE_display_timing (Vulkan - Android)
VK_KHR_present_wait (Vulkan - Nvidia)
VK_KHR_incremental_present (Mainly Mesa/Android)
VK_KHR_swapchain (All Vulkan)

And we have the following primitives:

Get Present Start/End Time
Wait for Present Finish
Present with Damage
Schedule Present Time
Primary Monitor Frequency

	Present Time	Wait for Present	Present with Damage	Scheduled Present	Monitor Frequency
IDXGISwapchain	🆗 (1a)	🆗 (2)	✅ (3)	🆗 (4)	❌
IPresentationManager	✅ (1b)	✅	❌	✅	❌
CAMetalLayer	✅ (1c)	✅	❌	✅	✅ (5)
VK_GOOGLE_display_timing	✅	❌	❌	✅	✅
VK_KHR_present_wait	❌	✅	❌	❌	❌
VK_KHR_incremental_present	❌	❌	✅	❌	❌
VK_KHR_swapchain	❌	❌	❌	❌	❌

Notes:

1a. Presentation times need to be queried actively; they are not reported to us.
1b. Presentation times are given through an event queue.
1c. Presentation times are given through callbacks.
2. Can only wait for 1-3 frames ago, not a particular frame.
3. Windows 8+/Windows 7 Platform Update
4. You can schedule presentation for N vblanks from now.
5. Via NSScreen - need to figure out how to get NSScreen from metal layer.

Because of the diversity of the platforms, I think this will inherently be a leaky abstraction - this is okay - we shouldn't try to hide platform differences, just make it as easy to use as possible.

As such I have put together the following api.

Api Suggestion

Feature

First is to add a single Feature.

const EXTENDED_PRESENTATION_FEATURES = ...;

Presentation Features

Add an extended presentation capabilities bitflag that is queryable from the surface. I am separating this from regular features because they are more useful as default-on. Having the single feature means that users have to consciously enable it, but without needing to individually modulate them.

fn Surface::get_extended_presentation_features(&self, &Adapter) -> ExtendedPresentationFeatures;

bitflags! {
    // Names bikeshedable
    struct ExtendedPresentationFeatures {
        const PRESENT_STATISTICS = 1 << 0;
        const MONITOR_STATISTICS = 1 << 1;
        const WAIT_FOR_PRESENTATION = 1 << 2;
        const PRESENT_DAMAGE_REGION = 1 << 3;
        const PRESENT_DAMAGE_SCOLL = 1 << 4;
        const PRESENT_TIME = 1 << 5;
        const PRESENT_VBLANK_COUNT = 1 << 6;
    }
}
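As a rough illustration of how a caller might consume these flags, here is a minimal sketch. It hand-rolls a stand-in for the bitflags type so it compiles without external crates; the bit values are copied from the proposal above, but `PacingStrategy` and `pick_strategy` are hypothetical names that are not part of the proposal, and the preference order is only an assumption.

```rust
/// Hand-rolled stand-in for the proposed bitflags type.
/// Only the flags needed for this sketch are included.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExtendedPresentationFeatures(pub u32);

impl ExtendedPresentationFeatures {
    pub const PRESENT_STATISTICS: Self = Self(1 << 0);
    pub const WAIT_FOR_PRESENTATION: Self = Self(1 << 2);
    pub const PRESENT_TIME: Self = Self(1 << 5);
    pub const PRESENT_VBLANK_COUNT: Self = Self(1 << 6);

    pub const fn empty() -> Self {
        Self(0)
    }

    /// True if every bit of `other` is set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }

    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }
}

/// Hypothetical pacing strategies an application could fall back through.
#[derive(Debug, PartialEq, Eq)]
pub enum PacingStrategy {
    ScheduleTime,
    ScheduleVblank,
    WaitForPresent,
    Unpaced,
}

/// Picks the first strategy (in an assumed preference order) that the surface reports.
pub fn pick_strategy(features: ExtendedPresentationFeatures) -> PacingStrategy {
    if features.contains(ExtendedPresentationFeatures::PRESENT_TIME) {
        PacingStrategy::ScheduleTime
    } else if features.contains(ExtendedPresentationFeatures::PRESENT_VBLANK_COUNT) {
        PacingStrategy::ScheduleVblank
    } else if features.contains(ExtendedPresentationFeatures::WAIT_FOR_PRESENTATION) {
        PacingStrategy::WaitForPresent
    } else {
        PacingStrategy::Unpaced
    }
}
```

The point is that the platform differences stay visible to the caller: the application decides what to do when a capability is missing instead of the API hiding it.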

Presentation Signature

The presentation signature will be changed to the following.

fn Surface::present(desc: PresentationDescriptor<'a>);

#[derive(Default)] // Normal presentations will be PresentationDescriptor::default()
struct PresentationDescriptor<'a> {
    // Must be zero-length if PRESENT_DAMAGE_REGION is not true
    rects: &'a [Rect],
    // Must be None if PRESENT_DAMAGE_SCOLL is not true
    scroll: Option<PresentationScoll>,
    // Must be NoDelay unless PRESENT_TIME or PRESENT_VBLANK_COUNT is set
    presentation_delay: PresentationDelay,
}

struct PresentationScroll {
    source_rect: Rect,
    offset: Vec2,
}

struct Rect {
    offset: Vec2,
    size: Vec2,
}

enum PresentationDelay {
    // Queue the frame immediately.
    NoDelay,
    // Queue the frame for N vblanks from now (must be between 1 and 4). Needs PRESENT_VBLANK_COUNT.
    ScheduleVblank(u8),
    // Queue the frame for presentation at the given time. Needs PRESENT_TIME.
    ScheduleTime(PresentationTime),
}
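The "must be ... unless" comments above amount to a validation step at present time. A minimal sketch of what that check could look like follows. The `Capabilities` struct of plain bools stands in for the feature bitflags, `ScheduleTime` carries a bare `u64` instead of a timestamp type, and `validate` and its error strings are all assumptions made for illustration, not part of the proposal.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Vec2 {
    pub x: u32,
    pub y: u32,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Rect {
    pub offset: Vec2,
    pub size: Vec2,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PresentationScroll {
    pub source_rect: Rect,
    pub offset: Vec2,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum PresentationDelay {
    #[default]
    NoDelay,
    ScheduleVblank(u8),
    ScheduleTime(u64), // nanoseconds; simplified for this sketch
}

#[derive(Default)]
pub struct PresentationDescriptor<'a> {
    pub rects: &'a [Rect],
    pub scroll: Option<PresentationScroll>,
    pub presentation_delay: PresentationDelay,
}

/// Stand-in for the feature flags reported by the surface (assumption).
#[derive(Clone, Copy, Debug, Default)]
pub struct Capabilities {
    pub damage_region: bool,
    pub damage_scroll: bool,
    pub present_time: bool,
    pub present_vblank_count: bool,
}

/// Rejects descriptors that use a field whose feature is not available.
pub fn validate(desc: &PresentationDescriptor<'_>, caps: Capabilities) -> Result<(), &'static str> {
    if !desc.rects.is_empty() && !caps.damage_region {
        return Err("damage rects require PRESENT_DAMAGE_REGION");
    }
    if desc.scroll.is_some() && !caps.damage_scroll {
        return Err("scroll requires the damage scroll feature");
    }
    match desc.presentation_delay {
        PresentationDelay::NoDelay => Ok(()),
        PresentationDelay::ScheduleVblank(_) if !caps.present_vblank_count => {
            Err("ScheduleVblank requires PRESENT_VBLANK_COUNT")
        }
        PresentationDelay::ScheduleVblank(n) if !(1..=4).contains(&n) => {
            Err("vblank count must be between 1 and 4")
        }
        PresentationDelay::ScheduleVblank(_) => Ok(()),
        PresentationDelay::ScheduleTime(_) if !caps.present_time => {
            Err("ScheduleTime requires PRESENT_TIME")
        }
        PresentationDelay::ScheduleTime(_) => Ok(()),
    }
}
```

A default descriptor passes against any capability set, which matches the intent that normal presentations are `PresentationDescriptor::default()`.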

Presentation Timestamp

Because different apis use different timestamps - we need a way of correlating these timestamps with various other clocks. The clocks used are as follows on each WSI:

WSI	Clock
IDXGISwapchain	QueryPerformanceCounter
IPresentationManager	QueryInterruptTimePrecise
CAMetalLayer	mach_absolute_time
VK_GOOGLE_display_timing	clock_gettime(CLOCK_MONOTONIC)

Add the following function to the surface.

fn Surface::correlate_presentation_timestamp<F, T>(&self, &Adapter, F) -> (PresentationTimestamp, T) where F: FnOnce() -> T;

// Unit: nanoseconds
struct PresentationTimestamp(pub u64);

Which will let people write the following code to correlate instants and presentation timestamps. We need this because Instants need to be treated as completely opaque as the clock they use can change at any time. In most cases these are actually the same clock, but this is what we get.

let (present_timestamp, now) = surface.correlate_presentation_timestamp(&adapter, Instant::now);
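Once a correlated pair is in hand, other presentation timestamps can be mapped onto the `Instant` timeline by offsetting from it. A minimal sketch, assuming both clocks advance at the same rate between the correlation and the timestamp being converted; `ClockAnchor` and `to_instant` are hypothetical names used only for illustration.

```rust
use std::time::{Duration, Instant};

// Unit: nanoseconds, as in the proposal.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct PresentationTimestamp(pub u64);

/// A (timestamp, instant) pair captured at the same moment,
/// e.g. the tuple returned by the proposed correlation function.
pub struct ClockAnchor {
    pub timestamp: PresentationTimestamp,
    pub instant: Instant,
}

impl ClockAnchor {
    /// Maps a presentation timestamp onto the Instant timeline by applying
    /// its offset from the anchor. Returns None if the result is not
    /// representable as an Instant.
    pub fn to_instant(&self, ts: PresentationTimestamp) -> Option<Instant> {
        if ts.0 >= self.timestamp.0 {
            let ahead = Duration::from_nanos(ts.0 - self.timestamp.0);
            self.instant.checked_add(ahead)
        } else {
            let behind = Duration::from_nanos(self.timestamp.0 - ts.0);
            self.instant.checked_sub(behind)
        }
    }
}
```

Re-correlating periodically would keep the mapping honest if the two clocks drift apart.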

Presentation Statistics

Because of the difference in how all the apis query stats, we need to abstract this carefully. We use a query-based "presentation statistics queue".

CAMetalLayer: Callbacks will save the time into a queue, which is emptied every time it is queried.
IPresentationManager: Calling the query function drains the statistics queue.
IDXGISwapchain: Query calls GetFrameStatistics and returns a single value.
VK_GOOGLE_display_timing: Calls vkGetPastPresentationTimingGOOGLE which drains the queue.

fn Surface::query_presentation_statistics(&self, &Device) -> Vec<PresentationStatistics>;

struct PresentationStatistics {
    presentation_start: PresentationTimestamp,
    // Only available on IPresentationManager
    presentation_end: Option<PresentationTimestamp>,
    // Only available on VK_GOOGLE_display_timing
    earliest_present_time: Option<PresentationTimestamp>,
    // Only available on VK_GOOGLE_display_timing
    presentation_margin: Option<PresentationTimestamp>,
    composition_type: CompositionType,
}

enum CompositionType {
    // CAMetalLayer is always Composed
    Composed,
    Independent,
    // Vulkan, DXGI is always unknown
    Unknown,
}
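For the callback-driven case (CAMetalLayer), the "callbacks save into a queue, the query drains it" behaviour could be modelled roughly as below. This is only a sketch using std types; `StatisticsQueue` and its methods are hypothetical, and it stores bare timestamps instead of the full `PresentationStatistics` struct to stay short.

```rust
use std::sync::{Arc, Mutex};

// Unit: nanoseconds, as in the proposal.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PresentationTimestamp(pub u64);

/// Shared queue: the presentation callback holds one clone and pushes,
/// the surface holds another and drains on query.
#[derive(Clone, Default)]
pub struct StatisticsQueue {
    inner: Arc<Mutex<Vec<PresentationTimestamp>>>,
}

impl StatisticsQueue {
    /// Called from the presentation callback.
    pub fn push(&self, ts: PresentationTimestamp) {
        self.inner.lock().unwrap().push(ts);
    }

    /// Called from the query function: returns everything recorded since
    /// the previous call and leaves the queue empty.
    pub fn drain(&self) -> Vec<PresentationTimestamp> {
        std::mem::take(&mut *self.inner.lock().unwrap())
    }
}
```

This gives the callback-based backend the same drain-on-query semantics that IPresentationManager and VK_GOOGLE_display_timing already have natively.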

Presentation Wait

First add the following member to SurfaceConfiguration:

// Requires WAIT_FOR_PRESENTATION and must be between 1 and 2.
maximum_latency: Option<u8>

Depending on the backend, this either sets the swapchain frame count to value + 1, calls SetMaximumFrameLatency with the given value, or uses a wait-for-present in the acquire method to limit rendering so that it behaves like a swapchain with value + 1 frames.
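The "value + 1" rule and the two constraints in the comment can be written down as a small helper. A sketch only: `swapchain_frame_count` is a hypothetical name, and what happens when `maximum_latency` is `None` is left to the caller because the proposal does not say.

```rust
/// Translates the proposed `maximum_latency` setting into a swapchain frame count.
/// Returns Ok(None) when no latency limit was requested.
pub fn swapchain_frame_count(
    maximum_latency: Option<u8>,
    wait_for_presentation: bool, // whether WAIT_FOR_PRESENTATION is available
) -> Result<Option<u8>, &'static str> {
    match maximum_latency {
        None => Ok(None),
        Some(_) if !wait_for_presentation => {
            Err("maximum_latency requires WAIT_FOR_PRESENTATION")
        }
        // A latency of N frames behaves like a swapchain with N + 1 frames.
        Some(n) if (1..=2).contains(&n) => Ok(Some(n + 1)),
        Some(_) => Err("maximum_latency must be between 1 and 2"),
    }
}
```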

Monitor Information

Getting exact frequencies of monitors is important for pacing - they can be derived from presentation stats, but an explicit api is more precise if it is available.

fn Surface::query_monitor_statistics(&self, &Device) -> MonitorStatistics;

struct MonitorStatistics {
    // In nanoseconds
    min_refresh_interval: u64,
    max_refresh_interval: u64,
    // Only available on CAMetalLayer
    display_update_granularity: u64,
}
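Where no explicit monitor query is available, the refresh interval can be estimated from the presentation statistics, as mentioned above. One possible approach, sketched here under the assumption that `presentation_start` values arrive in order as plain nanosecond counts, is to take the median of the gaps between consecutive presents so that an occasional missed frame does not skew the result. `estimate_refresh_interval_ns` is a hypothetical helper.

```rust
/// Estimates the refresh interval (in nanoseconds) from a series of
/// presentation start times, using the median gap between consecutive frames.
/// Returns None if there are fewer than two usable samples.
pub fn estimate_refresh_interval_ns(presentation_starts: &[u64]) -> Option<u64> {
    let mut deltas: Vec<u64> = presentation_starts
        .windows(2)
        // Skip out-of-order or duplicate samples instead of panicking.
        .filter_map(|pair| pair[1].checked_sub(pair[0]))
        .filter(|&delta| delta > 0)
        .collect();
    if deltas.is_empty() {
        return None;
    }
    deltas.sort_unstable();
    // Upper median for even-length inputs.
    Some(deltas[deltas.len() / 2])
}
```

This is necessarily less precise than an explicit API, and it says nothing about the min/max range on variable refresh rate displays.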

Conclusion

This is obviously one hell of an API change, and it doesn't have to happen all at once, but this investigation should give us a place to discuss the changes and make sure the API provides the information needed.

i509VCB commented 2 years ago

For EGL, the WSI can do present with damage if the EGL_KHR_swap_buffers_with_damage extension is supported.

superdump commented 2 years ago

Random thoughts incoming:

Does WSI mean windowing system integration? It's unfortunately not the most searchable abbreviation.
For PRESENT_STATISTICS I feel that the PRESENT is needed to differentiate from MONITOR_STATISTICS. However, I feel it should be at least PRESENTATION_STATISTICS.
MONITOR_STATISTICS reads as if 'monitor' means 'keep track of' as in verb noun. I first thought DISPLAY_STATISTICS but that has the same problem. SCREEN_STATISTICS is maybe a bit less problematic even though grammatically it could go either way. I feel like I want DISPLAY_REFRESH_STATISTICS to be the outcome. :)
There are scoll typos in places. Where does the 'scroll' term come from? Scroll-lock comes to mind but that's an old cathode ray tube (CRT) feature. From the structs it looks like it's for requesting which part of the screen would be actually updated with what portion of the provided frame. Is that correct?
It's naïvely surprising to me that Vulkan does not generally support presentation times / scheduling presentation when the others do
It feels like there must be a way on Windows to get information about display refresh rates and timings...
What is the difference between presentation_start and earliest_present_time?
What does presentation_margin mean?

Taking a step back and thinking about how one would want to use this - the presentation and display refresh statistics provide information that can be used to make some kind of estimation/prediction for frame pacing, and the presentation descriptor then allows making an attempt at controlling presentation of a given frame. I'd have to think through it more thoroughly to be able to figure out whether it's sufficient and ergonomic.

cwfitzgerald commented 1 year ago

It feels like there must be a way on Windows to get information about display refresh rates and timings...

@superdump There is, it's a bit complicated, but should be implementable within winit. You can get very specific timing info about all your monitors. Now that winit exposes micro-hertz refresh it should be usable for pacing. We just also need to expose the precision of the hz measurement.

badicsalex commented 1 year ago

I'm not sure what the status here is, but I'd love to implement the VK_GOOGLE_display_timing version, if you can give some guidance.

cwfitzgerald commented 1 year ago

@badicsalex Sorry this totally got lost in the information firehose. None of this (outside of getting cpu-side presentation timestamps) is implemented yet and we'd love help! Come on our matrix and chat, that'd probably be the easiest way to sync up.

badicsalex commented 1 year ago

@cwfitzgerald thanks for the answer. We've investigated the issue in detail since then, and it seems that VK_GOOGLE_display_timing wouldn't give us much over simply measuring the acquire times of a simple FIFO mode, so we didn't pursue that angle any further.

jimblandy commented 3 months ago

I talked a bit with @DJMcNab at RustNL last week, and he said that partial present functionality was important to the Xilem team.

marcpabst commented 3 months ago

I'm still interested in this too, I just haven't gotten around to properly looking at it. I have a fork somewhere that implements a crude way of getting presentation information on Apple hardware, but I think the main problem here is finding out how to integrate data from the different APIs into wgpu's data model. Having precise statistics about when presentation happened is crucial for my use case, and now that the rest of my project is somewhat shaping up, I might have another look.

On a side note, I'd also be interested in running a handler as soon as possible after frame presentation, but I'm not sure if that is even possible outside of Apple/Metal.
