gfx-rs / gfx

[maintenance mode] A low-overhead Vulkan-like GPU API for Rust.
http://gfx-rs.github.io/
Apache License 2.0

[RFC] ll based render API #1281

Open msiglreith opened 7 years ago

msiglreith commented 7 years ago

Motivation

With the current low-level API (ll) rewrite of the core, a lot changes towards a Vulkan-based API. This requires some changes to the layers above the core, in particular app and render.

app is a wrapper that makes writing our examples easier; it takes care of backend-specific device initialization and handling of the main loop.

render provides some additional layers and macros for more safety and convenience. The encoder wraps the command buffer and extends it with useful implementations. The pso macro is used for easier definition and creation of pipeline state objects.

Design

The low-level API lacks several safety nets compared to the old core (e.g. resource tracking, memory management, ..). ll is not supposed to be used directly by most users due to its increased complexity, and using it directly can more easily result in synchronization issues etc. Therefore I would propose to transform render into a d3d11-styled layer on top of the ll core, which should address the following aspects:

Based on the concept above, we need to adjust the device setup a bit, as we only want to expose one queue and probably hide some parts of the memory management (e.g. explicit heaps). Example API:

enum CommandQueue<R> {
    General(GeneralQueue<R>),
    Graphics(GraphicsQueue<R>)
}

struct Device<R> {
    factory: Factory<R>,
    queue: CommandQueue<R>,
}

...

// Internally creates adapters and surface, followed by creation of swapchain and device
let (Device { factory, queue }, swapchain) = window.init_device_and_swapchain::<ColorFormat>();
let queue = match queue {
    ... // depending on the capabilities you need (compute?)
};

...

'main: loop {
    // Move to the next frame
    let color_target = swapchain.get_frame();
    ...
    // Queue fetches cmd buffers from the thread-local pool ringbuffer
    queue.create_graphics_encoder();
    ...
    queue.submit(cmd_buffer);
    ...
    swapchain.present();
}

Other render parts, like the PSO macros, should stay, and it should be possible to use them even when users directly target the ll backend.

app will still remain, but its main tasks will be creating the window for each platform, handling window events, and advancing to the next frame.

Drawbacks

The design above limits the users' possibilities a bit, but otherwise it would be too low-level and require careful handling of synchronization. Some aspects, such as memory management, would probably also be hard to implement if we want to reach the performance of d3d11 drivers, for example.

kvark commented 7 years ago

Excellent summary! I agree with everything except what's listed below:

Async queues: Only provide one queue to the user

I'd like to see multiple queues exposed in render, eventually. Since we aren't doing this now, it's fine to have this limitation upon ll transition, but we should keep the possible exposure of the queues on our radar. Of course, it would sorta make sense to expose them now since you are reworking the render interface anyway, and we don't know when the next good moment for this will come.

A common technique used to avoid stalling is to have a ring-buffer of command pools per thread.

What concerns me here is how sending command buffers between threads will work. Perhaps, we'll send the buffer itself, but the command buffer encoder (which is what needs the command pool) will be non-sendable?

let queue = match queue {
    ... // depending on the capabilities you need (compute?)
};

We may have sugar to avoid this for a general case of graphics-only work. E.g. impl GraphicsQueue for Queue shim, or even impl DerefMut<Target=GraphicsQueue> for Queue.

msiglreith commented 7 years ago

Good to hear! Thanks for the fast feedback!

What concerns me here is how sending command buffers between threads will work. Perhaps, we'll send the buffer itself, but the command buffer encoder (which is what needs the command pool) will be non-sendable?

If I understand it (Vulkan) correctly, command buffers are not really supposed to be sent across threads, as access to them (i.e. vkCmd....) requires external synchronization of the underlying command pool, so probably a lock per call. That's why I'm also exposing command buffers as non-Send in core. Only Submit (generated after finishing the building of a command buffer) should be (is?) marked as Send.

We may have sugar to avoid this for a general case of graphics-only work. E.g. impl GraphicsQueue for Queue shim, or even impl DerefMut for Queue.

Yep, already implemented! Queues can be downcast in the hierarchy if they support the required functionality.

Regarding the command queues, I have to think a bit. My motivation behind this was that people who want to use multiple queues or async compute will probably go straight to the core and take care of all the required synchronization themselves.

Cheers!

kvark commented 7 years ago

Only Submit (generated after finishing the building of a command buffer) should be (is?) marked as Send

Sounds good!

people who want to use multiple queues or async compute will probably go straight to the core and take care of all the required synchronization themselves

There is a lot of convenience in using render, and it would hurt to drop all of it once you want to scale an application to multiple queues. At the very least, render should provide the low-level synchronization even if it's the same as in core.

msiglreith commented 7 years ago

There is a lot of convenience in using render, and it would hurt to drop all of it once you want to scale an application to multiple queues. At the very least, render should provide the low-level synchronization even if it's the same as in core.

Fair enough! We can keep it the way it's done in core and try something else if it's too complicated.

davll commented 7 years ago

Another question: how do we manage resources?

option 1: smart-pointer-like handles (old gfx api)
option 2: a big collection tracks resources and releases them in the correct order; users access resources with keys (ex: u32 IDs)

kvark commented 7 years ago

@davll the pre-ll code used smart-pointer-like handles. The problem with u32 keys is that we'd never know when the user no longer has any :)

davll commented 7 years ago

@kvark

Yes, the smart-pointer approach is much safer and friendlier than pure integer IDs. However, I'm worried about its reduced flexibility and inefficiency for game engine design. The reason is that such smart pointers would have to contain Device references for their destructors and therefore could not be sent across threads (since the Device might not be Send), leading to inflexible usage. IMO users are responsible for managing resources themselves, as in an entity-component system (where a world only contains entities and the user notifies the world to create or destroy them). In a nutshell, I prefer the data-oriented approach :P

To clarify: I'm not questioning how gfx_render should be. I'm trying to find out how to manage resources in my 2D rendering engine. :)

kvark commented 7 years ago

@davll there are multiple ways to get smart resource pointers without holding the device:

  1. what pre-ll has: resources are Arcs, and the device also holds Arcs to them. When the device detects no extra references to the stored pointers, it removes the resources.
  2. what ll has (WIP): resources store a channel that is used by their destructors. The device receives the data and cleans up.

where an world only contains entities and user should notify worlds to create or destroy entities

FWIW, Froggy manages components automatically, so smart resource pointers would make total sense there. It's not an ECS though.