
[RFC] New amethyst render #12

zakarumych opened 6 years ago

zakarumych commented 6 years ago

Fancy and shiny new amethyst render proposal.

This RFC is an attempt to systematize the ideas and thoughts on the new render that I started writing down almost a year ago.

What this RFC is about

I will try to describe what the new amethyst render could look like, both from the user's perspective and in terms of implementation. The aim of this RFC is to gather feedback on the problems mentioned and the solutions proposed.

What's wrong with the current render

Let's step back and look at the current render. Why do we even want to replace it? The first thing that comes to mind is single-threadedness. My early attempt to make things run in parallel only made it worse (proven by @Xaeroxe: when he simplified it to run in one thread, performance increased). Come to think of it, this is obvious: OpenGL has a single-threaded heart. Commands we encode in parallel become serialized at flush time.

The second pain point is single-threadedness. Yes. Again. It hurts this much. We can't even create and upload resources (images and buffers) in parallel, which means we can't do it in our Systems. The current workaround is to defer resource initialization for the render to complete. Loading code is overcomplicated because of this. It also makes it impossible to generate data for the GPU each frame outside the render (think mesh generation from voxels).

The third is significant overhead. The current render works on pre-ll gfx, which supports only OpenGL right now. Each layer adds overhead. The new APIs provide optimization opportunities in many more places and reduce the CPU-bottleneck problem. Yet pre-ll gfx doesn't support the newer APIs, and even if it did, it would give the user the same freedom as OpenGL, where there is little room to optimize based on usage. OpenGL users employ arcane techniques to squeeze out as much performance as possible. If we start doing so, we may end up with an unmaintainable pile of hacks buried in an endless pit of sorrow.

Solution

We need to write new render that will be based on modern graphics APIs like Vulkan, DirectX 12, Metal.

But which one to choose? We can't choose one without sacrificing platform support, and we can't manually support each of them either. Gladly, this is already taken care of.

gfx-hal

gfx-hal is not an evolution of pre-ll gfx; it's a brand new thing. gfx-hal's API is based on the Vulkan API, with added rustiness but minimal overhead.

gfx-hal should open the path to supporting the following platforms:

Sadly, gfx-hal is not even close to becoming stable.

ash

Another alternative is ash. With Vulkan/Metal bridge like MoltenVK or gfx-portability we would support:

ash requires more boilerplate and careful implementation. It is essentially the raw Vulkan API for Rust, which means it is pretty stable.

Supporting multiple backends in our higher-level render.

It can be done. It is even simpler to do in higher-level code. But I don't think it is a feasible option.

High-level render design outline

Modules

While amethyst will use the render as a whole, that doesn't mean the render code must be written as one huge blob. It may be helpful to design the render as a collection of modules, each of which solves one problem at a time.

What problems should the higher-level render solve, you may ask. Let's describe a few:

Memory management.

Modern APIs have a complex memory management story, with lots of properties, rules, and requirements on the application. The higher-level render should give the user a straightforward API to create/destroy resources and transfer data between resources and the host.
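As a strawman, the surface could be as small as this. A minimal sketch, assuming hypothetical names (Factory and friends are illustrative; nothing here exists yet):

```rust
// Hypothetical surface for the memory module; all names are illustrative.
pub struct Buffer;      // wraps a backend buffer + its memory block
pub struct Image;       // wraps a backend image + its memory block
pub struct OutOfMemory; // allocation failure

pub struct Factory;     // owns the device and the allocator state

impl Factory {
    /// Create a buffer and bind memory suitable for `size` bytes.
    pub fn create_buffer(&mut self, size: u64) -> Result<Buffer, OutOfMemory> {
        unimplemented!("pick a memory type, allocate, bind")
    }

    /// Upload data from the host; staging is an internal detail.
    pub fn upload(&mut self, buffer: &mut Buffer, offset: u64, data: &[u8]) {
        unimplemented!("map or stage + copy, depending on memory properties")
    }

    /// Destruction is deferred until the GPU stops using the resource.
    pub fn destroy_buffer(&mut self, buffer: Buffer) {
        unimplemented!("enqueue for deferred destruction")
    }
}
```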

Work scheduling.

Vulkan has 3 types of objects in its API for scheduling work to the device: namely vkQueue, vkCommandPool, and vkCommandBuffer.

A vkQueue inherits capabilities from its family, and the user is responsible for not scheduling unsupported commands. The higher-level render should check this (at compile time where possible).
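One way to get the compile-time part is to encode the family's capability in the queue's type, so that recording an unsupported command simply doesn't compile. A minimal sketch with made-up marker types:

```rust
// Sketch: queue capability as a type parameter (names are made up).
use std::marker::PhantomData;

pub struct Graphics; // family supports graphics (and transfer)
pub struct Transfer; // family supports transfer only

pub trait Supports<C> {}
impl Supports<Graphics> for Graphics {}
impl Supports<Transfer> for Graphics {}
impl Supports<Transfer> for Transfer {}

pub struct Queue<C> {
    pub family_index: u32,
    capability: PhantomData<C>,
}

impl<C> Queue<C> {
    /// Compiles only for queues whose family supports graphics.
    pub fn submit_draw(&mut self)
    where
        C: Supports<Graphics>,
    {
    }

    /// Compiles for graphics and transfer queues alike.
    pub fn submit_copy(&mut self)
    where
        C: Supports<Transfer>,
    {
    }
}
// A `Queue<Transfer>` calling `submit_draw` is rejected at compile time.
```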

vkCommandPool is as simple as an axe. No fancy wrapper is required beyond tracking the queue family it belongs to.

vkCommandBuffer has implicit state that changes the subset of functions that can be used with it. The higher-level render should prevent both wrong usage and unnoticed state changes. To prevent an implicit transition to the Invalid state, there must be a facility to keep resources referenced in recorded commands from being destroyed.
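The implicit state machine maps naturally onto Rust typestates. A rough sketch (all names hypothetical):

```rust
// Sketch: command buffer states as types, so that e.g. submitting a
// buffer that is still recording is a compile error.
use std::marker::PhantomData;

pub struct Initial;
pub struct Recording;
pub struct Executable;

pub struct CommandBuffer<S> {
    state: PhantomData<S>,
}

impl CommandBuffer<Initial> {
    pub fn begin(self) -> CommandBuffer<Recording> {
        CommandBuffer { state: PhantomData }
    }
}

impl CommandBuffer<Recording> {
    pub fn draw(&mut self /* , pipeline, vertices, ... */) {
        // record a draw; resources referenced here must be kept alive
        // until execution completes, or the buffer becomes Invalid
    }
    pub fn finish(self) -> CommandBuffer<Executable> {
        CommandBuffer { state: PhantomData }
    }
}

impl CommandBuffer<Executable> {
    // only an Executable buffer can be submitted
}
```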

Pipelines

Manually describing and switching graphics and compute pipelines is hard and error-prone. The higher-level render should support pipelines described in a declarative manner, and automate their binding.
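For instance, a declarative description could be plain data that the render hashes, instantiates once, caches, and binds on demand. A sketch with invented names:

```rust
// Sketch: pipelines as declarative data (all names illustrative).
pub struct PipelineDesc {
    pub vertex_shader: &'static str,
    pub fragment_shader: &'static str,
    pub depth_test: bool,
    pub blending: bool,
}

pub const FLAT_SHADED: PipelineDesc = PipelineDesc {
    vertex_shader: "flat.vert",
    fragment_shader: "flat.frag",
    depth_test: true,
    blending: false,
};

// The render would hash the desc, create the pipeline object once,
// cache it, and bind it automatically when a pass asks for FLAT_SHADED.
```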

Synchronization

New graphics APIs such as Vulkan require explicit synchronization between commands when they depend on each other or use same resource. This topic is really complex. Rules are sophisticated. Errors could be hidden until release. Framegraph approach allow automatic synchronization between nodes of the graph. gfx-chain library does this kind of automatic scheduling between queues and deriving synchronization required. It should be reworked to remove gfx-hal dependency from which only few structures used anyway. Because of upfront knowledge for the resource usage it is possible to greatly optimize memory usage by aliasing transient resource that is never exists together.
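The aliasing opportunity falls straight out of the graph: if the schedule shows that two transient resources are never alive at the same time, one allocation can back both. A toy illustration, assuming each resource's lifetime is an interval in submission order:

```rust
// Toy sketch: two transient resources can alias one allocation if their
// lifetimes (in submission order) never overlap.
#[derive(Clone, Copy)]
struct Lifetime {
    first_use: u32, // index of the first node using the resource
    last_use: u32,  // index of the last node using the resource
}

fn can_alias(a: Lifetime, b: Lifetime) -> bool {
    a.last_use < b.first_use || b.last_use < a.first_use
}

fn main() {
    let shadow_map = Lifetime { first_use: 0, last_use: 1 };
    let bloom_temp = Lifetime { first_use: 3, last_use: 4 };
    // Never alive together, so one allocation can back both.
    assert!(can_alias(shadow_map, bloom_temp));
}
```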

Descriptors

Handling descriptors is non-trivial work and should be simplified by the higher-level render. But this can be done later, as the only ones who will work with them are render-pass writers. Suboptimal usage of descriptors is very simple to implement and should be OK until it becomes a bottleneck.

Higher-level primitives

While graphics APIs consume resources, pipelines, and lists of encoded commands to do their job, the user shouldn't be faced with such low-level concepts unless they are trying to render something non-trivial. Well-defined common use cases could be bundled and provided out of the box.

would be a good start.

What already exists

At this point I have a memory manager ready to test and a prototype of the command buffer/queue safety wrappers. There is also

TODO: Add shiny diagrams and fancy snippets.

zakarumych commented 6 years ago

About memory manager

The problem with good old gfx-memory is that it doesn't provide a really smart memory allocator (although it exports SmartAllocator). In my vision of the new render, the average user doesn't care about memory types and properties. All the user cares about is usage. Memory usage is a pair of the usage of the resource the memory will be bound to and the following properties: how the host will read/write the memory, and how the device will read/write the memory. Resource usage is described via usage flags that the backend translates into a set of memory types that support that usage. The other usage properties can be summarized into the following 4 categories:

Hence the allocator must have a convenient method that accepts a memory-type mask supporting the resource usage, plus one of the variants described above.

The allocator should be responsible for handling memory mapping in a safe way, as it is impossible to map more than one region of the same memory object at the same time. For example, a memory object created for Upload and Dynamic usage could be mapped at allocation time so that multiple memory regions can be written simultaneously.
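A sketch of that entry point; Upload and Dynamic are named above, while the other variants and all signatures are guesses (the original list of 4 categories didn't survive here):

```rust
// Sketch of the usage-driven allocation entry point. `Upload` and
// `Dynamic` are mentioned above; the other two variants are guesses.
pub enum MemoryUsage {
    Data,     // device-local, rarely touched by the host
    Dynamic,  // host-writable, read by the device every frame
    Upload,   // host-writable staging for one-off transfers
    Download, // device-writable, read back by the host
}

pub struct Block;       // an allocated, possibly mapped memory region
pub struct OutOfMemory;

pub struct SmartAllocator;

impl SmartAllocator {
    /// `type_mask` comes from the backend: the set of memory types that
    /// support the resource's usage flags. The allocator picks a type in
    /// the mask whose properties match `usage`, and maps the memory up
    /// front for Upload/Dynamic so regions can be written concurrently.
    pub fn allocate(
        &mut self,
        type_mask: u32,
        usage: MemoryUsage,
        size: u64,
    ) -> Result<Block, OutOfMemory> {
        unimplemented!("select memory type, sub-allocate, map if host-visible")
    }
}
```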

All this can make the allocator very opinionated. In order to fix this we can provide methods with finer control.

AnneKitsune commented 6 years ago

I don't have the required experience to really give opinions on this, but thanks for the information! :) :+1:

zakarumych commented 6 years ago

About queues and commands

The main problem with command recording is the following.

Tracking all the resources referenced in commands, and tracking command completion, is hard.

Vulkano does this by storing references to all used resources and then releasing them once the host ensures that the commands are complete. This adds complexity and overhead to the implementation. It is preferable for a frame's commands to be recorded without a single memory allocation.

I propose a different solution. Instead of tracking which resources were used in which commands, we would track only resources that are destroyed, assume they are used in all commands recorded so far, and wait for all commands recorded before the resource's destruction to complete before actually destroying it. Pros:

Either way, all queues must signal a fence each frame in order to simplify detection of command completion.

A command buffer should know when all of its commands are complete. Similar to the problem with resources, it needs to query information from queues to check whether its commands are complete. But unlike dropped resource wrappers, the command buffer is still possessed by the user. So in order to transition a command buffer out of the pending state, the user has to provide it with references to all the queues it was submitted to.
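A sketch of the destroy-side bookkeeping, assuming one fence per queue per frame and a monotonically increasing frame counter (all names invented):

```rust
// Sketch: a resource dropped at frame N is assumed used by everything
// submitted up to N, and is freed only once every queue's fence for
// frame N has signaled.
use std::collections::VecDeque;

pub struct RawBuffer; // the actual backend handle

pub struct DeferredDestroyer {
    current_frame: u64,
    /// Lowest frame index completed by *all* queues.
    complete_frame: u64,
    /// Handles queued for destruction, tagged with the frame they died in.
    graveyard: VecDeque<(u64, RawBuffer)>,
}

impl DeferredDestroyer {
    pub fn begin_frame(&mut self) {
        self.current_frame += 1;
    }

    pub fn destroy(&mut self, buffer: RawBuffer) {
        self.graveyard.push_back((self.current_frame, buffer));
    }

    /// Called once per frame after checking every queue's fence.
    pub fn frame_complete(&mut self, frame: u64) {
        self.complete_frame = self.complete_frame.max(frame);
        while let Some(&(died_at, _)) = self.graveyard.front() {
            if died_at <= self.complete_frame {
                let (_, raw) = self.graveyard.pop_front().unwrap();
                drop(raw); // actually destroy the backend handle here
            } else {
                break;
            }
        }
    }
}
```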

Xaeroxe commented 6 years ago

After reading this whole thing I'm actually quite compelled to just move away from gfx altogether. That being said, Vulkan-only support isn't a perfect solution either, as the compatibility layers require mildly complex linking and might not always work. Someday we're going to want to support something that's missing Vulkan support, such as a game console or otherwise. In my opinion we need a solution that can accommodate multiple APIs. It's not important that we implement all the APIs at once, merely that the architecture permits adding new APIs.

I've come to realize after typing this that it might sound like I'm proposing we re-implement gfx. That's not quite where I'm at, though. I guess the best way to describe what I have in mind would be to have multiple renderers, where each renderer is a modular piece. Try to avoid having the APIs overlap so much that they cause incompatibility or performance issues with each other, but where overlap can exist without compromising the prior goals, let's have it exist.

minecrawler commented 6 years ago

Maybe it would make sense to go with ash for now, because it is quite stable and ready, but later on switch over to gfx-hal. gfx-hal is described as offering a Vulkan-y API while supporting all the important backends. That would mean that later on, once gfx-hal is stable, switching out ash would be rather simple ("just" replace ash calls with hal calls), and Amethyst could benefit from all the work that has gone into hal already.

I don't think it would make a lot of sense to try to come up with a solution to a problem which doesn't exist yet (namely, preparing to implement all the backends manually). Amethyst still has a long way to go, and I doubt anyone will build a console game with it in the near future!

Xaeroxe commented 6 years ago

Is gfx-hal still coming? I thought there were proposals on the way to just dump it and work on ash instead.

zakarumych commented 6 years ago

They have a new idea: bal! bal will be a closer-to-backend layer that exposes the capabilities of the backend without pretending it has what it actually doesn't. This middle layer will be used to implement hal, by implementing hal::TraitName for bal::TypeName where possible and emulating where not. And some truly advanced users may use bal instead of hal to avoid hal's emulations.

zakarumych commented 6 years ago

The problem with using hal is that if a backend does something differently than Vulkan, then the Vulkan behaviour is emulated. Hence overhead.

zakarumych commented 6 years ago

I'm thinking that the way to go is not plain ash, hal, or <any-low-level-crate-name>, but to specify our high level in terms of traits and implement them for ash. Later we will see if we need to switch to hal or reimplement our traits for another backend. The rule of thumb will be "don't expose the backend".
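A minimal sketch of that rule, with made-up trait and method names; ash would be one impl, and nothing backend-specific leaks past the trait:

```rust
// Sketch of "don't expose the backend": the high level is generic over a
// Backend trait; ash would be one impl (all names made up).
pub trait Backend {
    type Buffer;
    type Image;
    type CommandBuffer;

    fn create_buffer(&mut self, size: u64) -> Self::Buffer;
    fn submit(&mut self, commands: Self::CommandBuffer);
}

pub struct Renderer<B: Backend> {
    backend: B,
}

impl<B: Backend> Renderer<B> {
    pub fn new(backend: B) -> Self {
        Renderer { backend }
    }
    // High-level code only speaks in terms of B's associated types, so
    // swapping ash for hal later means writing one new impl.
}
```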

fu5ha commented 6 years ago

So, I'm having a lot of the same debates with myself right now as well, and this is a great synopsis. I guess my one big concern is: can we really afford to lose OpenGL compatibility? Vulkan support is progressing and there are lots of systems that now support it, but then again, there are also lots of systems that do not (not just platforms like game consoles, but older computers with an older graphics processor, etc.).

The problem with using hal is that if a backend does something differently than Vulkan, then the Vulkan behaviour is emulated. Hence overhead.

While this is true, if we are using the Vulkan backend then the overhead is very minimal, and in the case of the Metal backend, performance is on par with the only other Metal translation layer out there (MoltenVK), so there's really no advantage to using ash + MVK over hal in this way. And what you get from hal is that when an optimal target (Vulkan or Metal) is not available, you can (in theory) fall back to another one. Of course, right now the other backends are not really up to snuff, but work is being put into getting the gl backend at least functioning, and I believe that having a functioning gl backend is a huge, huge plus for something like Amethyst.

IMO the main reason to choose ash over hal is API stability, which is fairly important, but at the same time, I think the actual hal API surface is relatively stable at this point, and it's mostly the underlying implementation that's in flux with bal etc.

fu5ha commented 6 years ago

Either way, all queues must signal a fence each frame in order to simplify detection of command completion.

I think in general queues should signal fences each frame to prevent having more than <swapchain_image_count> frames in flight at once.

zakarumych commented 6 years ago

Not all queues need to signal a fence in the general case. They could signal a semaphore which is waited on by another queue, which signals another semaphore... long story short, one queue waits on the others and signals a fence. But to automatically guarantee that a command is complete, you either traverse the signaling chain until the fence, or wait on one fence per queue.

OvermindDL1 commented 6 years ago

Vulkan support is progressing and there are lots of systems that now support it, but then again, there are also lots of systems that do not support it. (not just platforms like game consoles, but older computers with an older graphics processor, etc.)

/me coughs while mumbling having the last generation of AMD cards that do not support vulkan...

Although that may change in the next month or two, so eh... But we do still exist.

I'm good with just OGL and Vulkan.

fu5ha commented 6 years ago

long story short, one queue waits on the others and signals a fence. But to automatically guarantee that a command is complete, you either traverse the signaling chain until the fence, or wait on one fence per queue.

Ah right, this is what I thought you meant; this seems like the "right" way to do it, no?

Also, here's some prior art related to modern rendering architectures and frame graphs; possibly helpful:

http://ourmachinery.com/post/a-modern-rendering-architecture/
http://ourmachinery.com/post/high-level-rendering-using-render-graphs/
http://www.gijskaerts.com/wordpress/?p=98

fu5ha commented 6 years ago

Okay, here's a structure I came up with and briefly explored in my head but haven't really thought all the way through: base the renderer design/structure somewhat on specs' design, or possibly even extend specs in some way to allow this:

You have a Renderer, somewhat analogous to a specs World, into which you can insert RenderPasses, analogous to Systems, and ResourceHandles, analogous to Entitys, except that they are also required to have a name. You can attach Resources to a ResourceHandle; these are analogous to Components, and define a type of CPU or GPU resource (Buffer, Image, Shader, ...) and how to store this resource internally. Then RenderPasses, like Systems, define their RenderData type, through which they can request resources using ReadStorage, WriteStorage, or ReadWriteStorage, the Resource, and the name of the specific ResourceHandle they want. They can then define how to set up the actual pass (pipelines etc.) in a setup method, which provides a RenderPassBuilder that they can modify to do whatever they need. They then record commands into a CommandBuffer, which is analogous to a Vulkan command buffer but with a much simplified API, during their run method. This can include updating buffers, binding buffers, issuing draw commands, etc.

The RenderPass trait could look something like

```rust
trait RenderPass {
    type RenderData: Blah;
    fn setup(&self, data: Self::RenderData, builder: &mut RenderPassBuilder);
    fn run(&mut self, data: Self::RenderData, cmd_buf: &mut CommandBuffer);
}
```

Each CommandBuffer can be associated with a sort key unique to that pass, so that the command buffers can be recorded in parallel and then sorted using the sort keys. RenderPasses would be ordered by a weighting function based on the order they are inserted and the resources they need as input and output. Synchronization commands can then be derived from this using something like gfx-chain. These sorted commands would then be submitted to queues, in parallel as much as possible based on their state changes and the synchronization derived earlier.
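To make the shape concrete, here is what a pass might look like under that trait; the surrounding types are stand-ins, since none of this API exists yet:

```rust
// Stand-in types so the example compiles on its own; in the real design
// these would come from the renderer crate (everything here is hypothetical).
use std::marker::PhantomData;

pub trait Blah {}
pub struct RenderPassBuilder;
pub struct CommandBuffer;
pub struct Mesh;
pub struct ReadStorage<T>(PhantomData<T>);
impl<T> Blah for ReadStorage<T> {}

pub trait RenderPass {
    type RenderData: Blah;
    fn setup(&self, data: Self::RenderData, builder: &mut RenderPassBuilder);
    fn run(&mut self, data: Self::RenderData, cmd_buf: &mut CommandBuffer);
}

/// A pass that draws every Mesh; requests read access like a specs System.
pub struct DrawMeshes;

impl RenderPass for DrawMeshes {
    type RenderData = ReadStorage<Mesh>;

    fn setup(&self, _data: Self::RenderData, _builder: &mut RenderPassBuilder) {
        // create pipelines, bind shaders, link attachments by handle name
    }

    fn run(&mut self, _data: Self::RenderData, _cmd_buf: &mut CommandBuffer) {
        // update/bind buffers and record draw calls for this pass
    }
}
```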

minecrawler commented 6 years ago

@termhn I like your idea :) It's quite abstract, and for your average Amethyst user it's not even a new concept.

Where would you see the binding for shader input and uniforms? Would that be the RenderData? What would happen inside RenderPass::run()?

zakarumych commented 6 years ago

@termhn This concept is very similar to xfg, which is basically the same plus the extra info required by Vulkan, and where render passes can be connected arbitrarily at setup time (no names). Things may become much simpler if xfg's Node were the only safe point for the user to insert rendering code, because then the render can control how commands are submitted and executed.

JohnDoneth commented 6 years ago

While it would be great to have a compatibility layer for OpenGL, I believe a Vulkan-oriented approach would be the more beneficial choice in the long run. There is only so much effort you can put into supporting legacy technology before you start making sacrifices. Every year there will be more Vulkan-supporting devices; the oldest Vulkan-supported GPU is the Radeon HD 7000 series, which came out at the beginning of 2012.

It might be more of a hassle trying to fit OpenGL's single-threadedness (square peg) into a multi-threaded engine (round hole), so to speak, and trying to juggle both ways of thinking, rather than just embracing a Vulkan approach as a whole. From my view, OpenGL currently amounts to a bottleneck and a hindrance to the parallel nature of Rust and Amethyst as a whole. Not being able to perform graphics operations in Systems in parallel seems like a big flaw.

AnneKitsune commented 6 years ago

I'm more worried about mobile and web compatibility, honestly. I don't think we would have many issues supporting only Vulkan if we planned on supporting desktops only.

zakarumych commented 6 years ago

Modern Android devices support Vulkan; iOS supports Metal. Running on the web could be a problem. But again, it is much easier to hack OpenGL support into a high-level engine than into a low-level graphics API.

fu5ha commented 6 years ago

@minecrawler

Where would you see the binding for shader input and uniforms? Would that be the RenderData? What would happen inside RenderPass::run()?

You would do this kind of binding in the setup method. Creating a Pipeline would entail binding shaders for each stage, defining bindings between binding points in shaders and ResourceHandles from RenderData, and linking color and depth/stencil attachments from ResourceHandles. You'd probably do something like builder.create_pipeline(layout), where layout is a PipelineDesc or some such to which you add that information.
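A sketch of what that setup call might look like; PipelineDesc and its fields are hypothetical stand-ins:

```rust
// Sketch of a setup body under this proposal; PipelineDesc and all field
// names are invented for illustration.
pub struct PipelineDesc {
    pub vertex_shader: String,
    pub fragment_shader: String,
    pub color_attachment: String,         // name of a ResourceHandle
    pub depth_attachment: Option<String>, // name of a ResourceHandle
    pub bindings: Vec<(u32, String)>,     // (binding point, handle name)
}

pub struct RenderPassBuilder {
    pipelines: Vec<PipelineDesc>,
}

impl RenderPassBuilder {
    pub fn create_pipeline(&mut self, layout: PipelineDesc) {
        self.pipelines.push(layout);
    }
}

fn setup(builder: &mut RenderPassBuilder) {
    builder.create_pipeline(PipelineDesc {
        vertex_shader: "mesh.vert".into(),
        fragment_shader: "mesh.frag".into(),
        color_attachment: "hdr_color".into(),
        depth_attachment: Some("main_depth".into()),
        bindings: vec![(0, "camera_uniforms".into())],
    });
}
```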

zakarumych commented 6 years ago

@termhn If I understand correctly, the one disadvantage of your solution is the linking of actual resources to a RenderPass through names. It's kind of like passing arguments to functions through global variables.

In xfg, a node (which can be a render pass, a compute dispatch, etc.) declares the set of resources it uses (usage, access flags, image layout). During framegraph construction, the user declares which actual resources each node will get. This way you can connect nodes arbitrarily. For example, one UI rendering node can target the surface, or a transient image which will be read by another node. Also, each node takes a set of nodes it depends on, to ensure they are executed first. Typically, some nodes write a resource and dependent nodes read that resource.

After that, gfx-chain schedules nodes to queues and derives the required synchronization. xfg allocates all transient resources and lends them to nodes. Here xfg can optimize memory by aliasing resources that don't need to exist simultaneously.

Each frame, nodes run in parallel and command buffers are submitted in the order from the schedule gfx-chain created. Nodes create their own command pools to be able to prerecord commands once where possible.

That's the idea behind xfg, and this is what I was thinking of doing for amethyst.
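Very roughly, and paraphrasing rather than quoting xfg's real API, the declaration side could look like:

```rust
// Paraphrase of the xfg idea, not its actual API: a node declares how it
// uses resources, and the graph wires actual resources + dependencies.
pub enum Access {
    Read,
    Write,
}

pub struct ResourceUse {
    pub access: Access,
    // in the real thing: usage flags, access flags, image layout, ...
}

pub struct NodeDesc {
    pub uses: Vec<ResourceUse>,
    pub depends_on: Vec<usize>, // indices of nodes that must run first
}

fn main() {
    // A UI pass writes an image; a compose pass reads it and depends on it.
    let ui = NodeDesc {
        uses: vec![ResourceUse { access: Access::Write }],
        depends_on: vec![],
    };
    let compose = NodeDesc {
        uses: vec![ResourceUse { access: Access::Read }],
        depends_on: vec![0], // index of `ui`
    };
    let _graph = vec![ui, compose];
    // gfx-chain would then schedule these to queues and derive syncs.
}
```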

fu5ha commented 6 years ago

@omni-viral I'd be in favor, as long as we give xfg an extra layer of abstraction that is its own, so it's not so tightly coupled to hal. Make it spit out the necessary info, and then have any backend be able to use that information to implement the actual commands. This should also make it easier to use for someone who doesn't know everything about the new explicit APIs, because they're not worried about that part of things; it just gets taken care of for them.

Another possibility would be to use the xfg model but just modify its frontend interface to look more like what I suggested above, as I think one of the benefits would be that it's already familiar to people using amethyst, as @minecrawler suggested above.

zakarumych commented 6 years ago

@termhn Yeah, xfg could be rewritten to hide the backend completely behind its own traits.

modify its frontend interface

Not sure what you mean here 🤷‍♂️

fu5ha commented 6 years ago

@omni-viral I basically just mean making the way to actually set up xfg's state more user-friendly for someone who doesn't already understand everything that's going on under the hood.

zakarumych commented 6 years ago

@termhn I understand. But I need an example, because I'm the one who understands what's going on, so it may be hard for me to guess what is confusing 😄

fu5ha commented 6 years ago

@omni-viral http://www.gijskaerts.com/wordpress/?p=112

I think we could Rust-ify the ideas in that design pretty easily, and they make a lot of intuitive sense for anyone who's done some low-level graphics programming, while allowing us to do internal resource synchronization etc. so that the user doesn't have to worry about it, only about what they need, when, and how to use it.

zakarumych commented 6 years ago

Whoops.

AnneKitsune commented 6 years ago

New constraints that have arisen from the discussion:

karroffel commented 6 years ago

I don't think just forgetting about OpenGL is the right way to go.

If it were up to me, I would go with gfx-hal (even if "unstable", it seems pretty stable on the front-end). Some concerns I heard were that it yields worse performance on OpenGL, since it has to do some emulation work. For the newer graphics APIs it seems to be a decent enough mapping to have low overhead (from what I have seen so far).

If the general idea is to leave OpenGL because of the single-threaded limitations, then I would like to argue that having "bad" OpenGL support and good Vulkan/Metal/DX12 support is a loooot better than having no OpenGL support and good Vulkan/Metal/DX12 support.

I don't see yet how gfx-hal would limit an implementation backed by one of the "next-gen" APIs, so why the incentive to move away from it? If you want to focus on these new APIs, you are not losing anything with hal, right?

I think OpenGL support is not something you want to throw out that easily.

(context: I work part-time for a game engine, backporting their renderer to OpenGL ES 2.0 (Desktop 2.1), because we have many people who use hardware that doesn't support OpenGL 3.3, or phones that don't support anything newer than GLES2. There is still a huge market for GL-only, and that's not going away anytime soon. I wish it were; I don't like OpenGL, but that's pretty much a fact, sadly.)


I really like @termhn 's proposal with the ECS-like structure, seems really nice! :+1:

fu5ha commented 6 years ago

I don't see yet how gfx-hal would limit an implementation backed by one of the "next-gen" APIs, so why the incentive to move away from it? If you want to focus on these new APIs, you are not losing anything with hal, right?

This is about what my opinion is as well.

As far as my earlier ECS-like idea goes... I've somewhat reconsidered it in favor of a more framegraph-friendly/centric solution, just with naming that's a bit more friendly. I've stubbed out some of my ideas here: https://gist.github.com/termhn/3fdf441e74f36e0439e6ae8beeec777f

This is designed for what we need in developing Veloren, but it's really just a slightly limited version of an architecture that could gain more features pretty easily. The idea is that you can abstract the backend away completely from the "renderer", but in a way that preserves efficiency on the new APIs. This leaves open the possibility of implementing API-specific backends in the future, if we want to improve performance on a specific platform or to implement something like an old OGLES2 backend (pray to god that we won't lol).

The main interface is that you use the Renderer as essentially a resource manager. You then create a FrameGraph every frame (or as often as you need to change it; the idea is that it would be cheap enough to change on a frame-by-frame basis), into which you can bind "logical" resources and "physical", handle-based resources. Logical resources are resources that are related only inside a single framegraph, and are used to link resources between render passes. For example, you can have a logical resource that is an output from one pass and an input into another. The backend is responsible for automatically allocating and assigning that resource (and can repurpose previously allocated resources that aren't being used elsewhere). Physical resources are a specific, previously allocated buffer or texture, and are used when you need to be able to access the data from the CPU (for example a texture, vertex buffer, etc.). Commands are recorded into per-pass CommandBuffers, and one logical resource is promoted to be the "backbuffer", meaning it is what will actually be rendered to the swapchain images and presented. Finally, you submit the framegraph to the renderer, and it actually performs the recorded commands and automatically handles resource transitioning.
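A condensed sketch of that per-frame flow; the API here is invented for illustration (the actual stubs live in the gist):

```rust
// Invented-for-illustration API mirroring the flow described above.
pub struct Renderer;
pub struct FrameGraph;
pub struct LogicalResource(u32);
pub struct CommandBuffer;

impl FrameGraph {
    pub fn new() -> Self {
        FrameGraph
    }
    pub fn logical_color_target(&mut self) -> LogicalResource {
        LogicalResource(0)
    }
    pub fn add_pass(&mut self, _output: &LogicalResource, _cmds: CommandBuffer) {}
    pub fn promote_to_backbuffer(&mut self, _res: LogicalResource) {}
}

impl Renderer {
    pub fn submit(&mut self, _graph: FrameGraph) {
        // allocate/alias logical resources, derive barriers, submit queues
    }
}

fn main() {
    let mut renderer = Renderer;
    // Built (cheaply) every frame, or whenever the structure changes:
    let mut graph = FrameGraph::new();
    let color = graph.logical_color_target();
    graph.add_pass(&color, CommandBuffer); // record this pass's commands
    graph.promote_to_backbuffer(color);    // this is what gets presented
    renderer.submit(graph);
}
```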

Rhuagh commented 6 years ago

I believe in an approach where the renderer is basically a resource that you can use to set up the framegraph; pass code can then request a command buffer from it to record into at any time (preferably in parallel with other passes and other tasks, like physics), and a finisher system does the final submission.

zakarumych commented 6 years ago

@termhn xfg does almost exactly what you describe, except that framegraph creation costs too much to do per frame. Caching can remove the cost of pipeline and renderpass instantiation, but synchronization of passes can take a long time. Memory for transient resources should be reused by other resources that don't overlap with them (in execution space), but this can be added later, since the framegraph already owns them. A per-pass command buffer is a good default, but a pass may require more than one. Zero or more logical resources can be assigned to actual resources. The backbuffer is just an array of image resources. Using more than one backbuffer is desirable but can lead to problems.

seivan commented 6 years ago

Is the idea that Amethyst will use gfx internally and offer a higher-level API like SpriteKit or Cocos2d-x for creating sprites and shaped objects that can be tweened and colored?

After reading this thread, it looks like the intention is just to write lower-level bindings?

zakarumych commented 6 years ago

@seivan The main goal for the new render is to expose a safe API that one can use to render anything imaginable. A toolkit to simplify rendering of meshes, sprites, UI, etc. could be provided on top of the render.

seivan commented 6 years ago

Is that something Amethyst is planning on doing, or is the idea to keep the API design pretty bare-bones?

zakarumych commented 6 years ago

The render API will be pretty bare-bones. But Amethyst will have a toolkit for at least:

Well, those I will add if no one does it before me 😄 Other contributors could add other tools to the kit.

LucioFranco commented 6 years ago

@omni-viral down the line, I'd be interested in helping out with some of the terrain implementation. Just let me know.

fintelia commented 6 years ago

I've also done a bunch of work with terrain rendering and would be interested in being involved.

AnneKitsune commented 6 years ago

It looks awesome @fintelia

Moxinilian commented 6 years ago

@fintelia Your help would be extremely appreciated! Feel free to join us on the Amethyst Discord server to discuss the implementation work.

Moxinilian commented 6 years ago

Now that the renderer is making good progress, 0.10 will focus on implementing it and making 3D rendering in Amethyst high quality. The plan for 0.10's features will be discussed soon, but we hope to significantly ramp up the quality of 3D rendering in that version by improving the workflow and making use of modern technologies.

Telzhaak commented 5 years ago

Is there a "feature list" of all the things currently lacking in the Renderer anywhere, to keep track of tasks to do?

Ideally somewhere that users can add feature requests themselves (which can either be acknowledged or discarded)

dotellie commented 5 years ago

@Telzhaak I think here is a pretty good place to put stuff like that. 😉

minecrawler commented 5 years ago

@magnonellie I doubt that here is a good place. This is the RFC for the tech, and I think it should stay that way. I think a feature list should be created in a wiki or readme file (with working status), or even the book, and new features should be requested in separate issues, so they can be discussed, approved/rejected, and, if applicable, added to the central list. It should be a list which exists and is maintained even after the new renderer is merged, because I am pretty sure the list will keep growing afterwards, at least whenever new tech becomes available that affects the renderer directly.

Also, it might be interesting to define a separation between what should go into the renderer and what should go into the engine, or even another crate. @omni-viral already mentioned that the renderer should be bare-bones, but that's a pretty abstract statement. Where should basic shaders be located, and what should the renderer provide out of the box? Where should predefined passes go, and which ones will be available, if any at all? What about different color formats? Support for changing graphical options at runtime? Raytracing and compute integration? VR essentials? And so on. I am not deep enough into the whole renderer topic to make a good distinction, and neither are most people who just want to use features for their games, so giving pointers about how and where to look, request, and ask about stuff would be fairly useful :)

torkleyy commented 5 years ago

Update: we'll be going with gfx-hal now. The renderer is being worked on here for now.

happydpc commented 5 years ago

Is the Vulkan part mature enough to be used?

LucioFranco commented 5 years ago

@happydpc Vulkan is at a 1.0 general release, so yes, it is ready for commercial use.

fhaynes commented 5 years ago

Transferring this to the RFC repo.

korczis commented 3 years ago

Is there any news? What are the plans? What is the current state?