PistonDevelopers / conrod

An easy-to-use, 2D GUI library written entirely in Rust.

Continued discussion about piston-graphics #830

Open bvssvni opened 8 years ago

bvssvni commented 8 years ago

See reddit discussion https://www.reddit.com/r/rust/comments/52jtl1/this_year_in_conrod_an_update_from_the_purerust/d7m9y8v

It is about going further with developing render backends for Conrod, or wrapping some functionality in piston-graphics.

I'll move this discussion here so we get all information in one place.

bvssvni commented 8 years ago

Some points of importance:

bvssvni commented 8 years ago

cc @mitchmindtree @kvark

kvark commented 8 years ago

For the last point, I wonder if https://github.com/csherratt/rusterize could be revived. cc @csherratt

kvark commented 8 years ago

The Piston-Graphics design is immediate mode, but allows a backend to override default methods for common primitives. By default it uses triangulation.

Which default methods do you mean here? Also, if the immediate mode is a problem, we might get a new API for building shapes. Basically, it would record all the buffer uploads triggered by compatible draw calls (as in, using the same shader and primitive type, blending, etc), and then return the buffer for reuse (on the following frames if UI layout doesn't change).
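A minimal sketch of the recording idea described above, assuming a made-up API: `BatchKey`, `Blend`, `Batch`, `Recorder` and `draw_triangles` are all hypothetical names, not part of gfx or piston-graphics. Draw calls are appended to the previous batch when their render state matches, and the finished batches are handed back so they can be kept and resubmitted while the layout stays the same.

```rust
/// Hypothetical render state. Two draw calls may share a batch only if these match.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BatchKey {
    pub shader: u32,          // hypothetical shader handle
    pub texture: Option<u32>, // `None` for untextured, colored geometry
    pub blend: Blend,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Blend {
    Alpha,
    Add,
}

/// A run of vertices that could be submitted with a single draw call.
#[derive(Clone, Debug, PartialEq)]
pub struct Batch {
    pub key: BatchKey,
    pub vertices: Vec<[f32; 2]>,
}

/// Records draw calls instead of submitting them immediately.
#[derive(Default)]
pub struct Recorder {
    batches: Vec<Batch>,
}

impl Recorder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append triangles. Only the most recent batch is a merge candidate,
    /// so the original draw order is preserved.
    pub fn draw_triangles(&mut self, key: BatchKey, vertices: &[[f32; 2]]) {
        match self.batches.last_mut() {
            Some(last) if last.key == key => last.vertices.extend_from_slice(vertices),
            _ => self.batches.push(Batch { key, vertices: vertices.to_vec() }),
        }
    }

    /// Hand the recorded batches back so the caller can reuse them on later frames.
    pub fn finish(self) -> Vec<Batch> {
        self.batches
    }
}
```

With this shape, a textured draw between two colored draws splits the colored geometry into two batches, which is the "vector graphics alternating with text" problem in miniature.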

Glium and Gfx backends store colored triangles in a buffer to reduce draw calls. Problem is they can't take textures, so vector graphics alternating with text is a tricky combination.

How is it currently solved without piston-graphics?

bvssvni commented 8 years ago

@kvark Default trait methods https://docs.rs/piston2d-graphics/0.17.0/graphics/trait.Graphics.html

bvssvni commented 8 years ago

Had no idea somebody was working on a 3D software rasterizer! Can this be made a Gfx backend?

kvark commented 8 years ago

Had no idea somebody was working on a 3D software rasterizer! Can this be made a Gfx backend?

It has been discussed and generally agreed upon, but this would be a non-trivial amount of work.

ghost commented 8 years ago

It could be revived, and it was fun to write. The performance was ok, but I would not call it fast. Bringing it to rust's stable branch might make the problem even worse since rust's stable branch does not support SIMD at the moment.

mitchmindtree commented 7 years ago

@bvssvni thanks for opening this!

@kvark I'll reply to your remaining Qs from the reddit thread here:

is it kind of too late for you to go back to piston-graphics at this point?

Not at all, if we can manage to improve the efficiency and flexibility of piston-graphics then I'd probably just get to delete some code!

Could you point to these function signatures, just to get a feel of it?

Sure, the idea was that these functions would be feature gated within the conrod::backend module (piston_window is currently the default as it's the only one currently implemented). Here are the piston_window backend functions and here is what it looks like when one is called. Each function for each backend would look different, depending on what is practical/conventional for that backend, i.e. a glium draw function that took a list of primitives might look significantly different from a gfx one, which might look significantly different to one that rendered to SVG or something. Ideally these backend modules wouldn't be necessary if we can enable some more efficiency in piston-graphics.

I can see a wide range of graphics features that conrod needs: different 2D shapes, colors, scissor/ blending modes, maybe some textures. Pretty close to "generic drawing API".

To clarify, in conrod I kind of get this list of drawing Primitives for free, as it's basically a (slightly altered) topological ordering of the widget DAG. I need the widget graph whether or not I use piston-graphics for rendering as I use it for much more than just representing graphics. I also use it for storing state and style for widgets (often custom ones) between updates, widget picking, finding the highest scroll-able container, determining visibility of widgets, keeping track of relative positioning uniquely over both axes and likely more I can't think of right now.

@bvssvni and I discussed abstracting conrod's color module into a general-use piston crate a long time ago. Iirc, we came to the conclusion that every crate in the ecosystem seems to have their own special set of use-cases for colors and that there wasn't really any point in trying to pack them all into one crate. It's usually pretty trivial to write a function that converts between two color types anyway.

We don't provide any support for multiple blend modes ourselves (just an alpha channel in Color). We don't try and abstract over or store any textures. Instead, when Ui::draw is called and the Primitives list is produced, the Image Primitive provides the widget::Id with which it was instantiated so that it can be mapped to some texture that the backend can draw.
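To illustrate the id-to-texture mapping described above, here is a rough sketch under stated assumptions: `WidgetId` stands in for conrod's `widget::Id`, and the reduced `Primitive` enum, `Texture` and `textures_to_draw` are hypothetical names for this example only. The point is that the UI side only yields an id, and the backend owns the map from ids to textures.

```rust
use std::collections::HashMap;

/// Stand-in for conrod's `widget::Id` (hypothetical newtype for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct WidgetId(pub usize);

/// A reduced, hypothetical primitive set. The real primitives carry more data.
pub enum Primitive {
    Rectangle { color: [f32; 4] },
    Image { id: WidgetId },
}

/// Backend-owned texture handle (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Texture {
    pub gl_name: u32,
}

/// Returns the texture to bind for each image primitive, in draw order.
/// Image primitives with no registered texture are skipped.
pub fn textures_to_draw<'a>(
    primitives: &[Primitive],
    textures: &'a HashMap<WidgetId, Texture>,
) -> Vec<&'a Texture> {
    primitives
        .iter()
        .filter_map(|p| match p {
            Primitive::Image { id } => textures.get(id),
            Primitive::Rectangle { .. } => None,
        })
        .collect()
}
```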

My vision for piston-graphics would be to do something similar to conrod, though much simpler as it needn't worry about UI, where calling Rectangle::draw or Image::draw, etc. doesn't immediately draw the primitive to screen, but rather builds one large tree representation. The Tree could later be drawn at once and would allow backends to more easily take advantage of batching, instancing or whatever. The tree would provide a method for iterating over its primitives in order of depth, perhaps yielded as a variant of a drawing Command enum that also provides variants for changes to scissor, rotation and blend-mode. Each of the backends (gfx_graphics, glium_graphics, etc.) would then provide a function that took this list of graphics::Commands along with the necessary parts specific to that backend and drew them to screen in an optimal manner, avoiding a need for wrapper types and traits.

I imagine the piston-graphics immediate API wouldn't have to change too much to support something like this, in the same way that conrod also has an "immediate API" over what is actually a fully retained graph that describes the whole GUI. This way all we'd have to do in conrod when draw is called is translate the primitive widgets into piston-graphics primitives, building the tree in a single loop. The users could then draw the piston-graphics Tree whenever they're ready, whether it needs to be sent to another thread, drawn after something else, etc.
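The retained representation proposed above might be sketched as follows. This is only an illustration: `Command`, `Tree`, `push`, `commands` and `draw_calls_needed` are hypothetical names, the "tree" is flattened to a depth-tagged list for brevity, and no such API exists in piston-graphics. Drawing pushes commands instead of touching the screen, and a backend later walks them in depth order and decides how to batch.

```rust
/// Hypothetical drawing commands yielded by the retained structure.
#[derive(Clone, Debug, PartialEq)]
pub enum Command {
    Scissor(Option<[u32; 4]>),
    Rectangle { rect: [f64; 4], color: [f32; 4] },
    Image { rect: [f64; 4], texture_id: usize },
}

struct Node {
    depth: u32,
    command: Command,
}

/// Flat stand-in for the proposed tree: each entry remembers its depth.
#[derive(Default)]
pub struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called where an immediate-mode API would have drawn to screen.
    pub fn push(&mut self, depth: u32, command: Command) {
        self.nodes.push(Node { depth, command });
    }

    /// Commands in order of depth. The sort is stable, so commands pushed
    /// at the same depth keep their insertion order.
    pub fn commands(&self) -> Vec<&Command> {
        let mut ordered: Vec<&Node> = self.nodes.iter().collect();
        ordered.sort_by_key(|n| n.depth);
        ordered.into_iter().map(|n| &n.command).collect()
    }
}

/// A toy backend pass: consecutive rectangles share one draw call,
/// every image costs one, and a scissor change ends the current run.
pub fn draw_calls_needed(commands: &[&Command]) -> usize {
    let mut calls = 0;
    let mut in_rect_run = false;
    for command in commands {
        match **command {
            Command::Rectangle { .. } => {
                if !in_rect_run {
                    calls += 1;
                    in_rect_run = true;
                }
            }
            Command::Image { .. } => {
                calls += 1;
                in_rect_run = false;
            }
            Command::Scissor(_) => in_rect_run = false,
        }
    }
    calls
}
```

Because the backend sees the whole list at once, it is free to pick a batching strategy that an immediate-mode call sequence would not allow.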

@bvssvni @kvark thoughts?

bvssvni commented 7 years ago

My thoughts:

Ideas I had about UI frameworks that were built into the assumptions of piston-graphics and the piston core, and that today aren't accounted for in the design of Conrod:

Not saying that Conrod should follow these assumptions. It is just something that I originally thought of from my experience in UI programming. I could be wrong about these assumptions, or they may trade off too much against other concerns.

Thoughts about the current direction of Conrod:

One idea I have is to prototype a retained UI framework on top of piston-graphics and see how it performs, and what challenges we would have by moving in that direction.

mitchmindtree commented 7 years ago

@bvssvni You've mentioned a lot of ideas here, a few that might be better in their own issues. I'll try to respond to those that are related to the conrod + piston-graphics discussion, but it would be best if you could open up separate issues about the other stuff (i.e. custom events, diversity in design (not sure what that means), transformed widgets, DSL). It would also be nice to get a more direct response to the ideas I mentioned in my previous comment if you get a chance, as I don't think they sacrifice any of the piston-graphics benefits you just mentioned.

piston-graphics architecture

  • I think the architecture of piston-graphics is the right one for flexible 2D, because it allows extending it with more shapes, using an interface that is easy to support across many backends

In my previous comment I don't mean to suggest re-architecting all of piston-graphics to be like conrod, or that there need to be any major changes to the API. My suggestion is that rather than going directly from

Rectangle::draw -> screen

we do

Rectangle::draw -> intermediate representation (tree of graphics primitives) -> screen

which could allow for some more flexibility for the reasons I mentioned in my previous comment.

  • I want a top-to-bottom coordinate system like in GDI+ ...

Just to clarify, my suggestion would not require this to change at all.

  • A software rasterizer backend for piston-graphics will solve the problems with mostly static UIs

This sounds like it could be interesting, however the main issue I'm personally having with piston-graphics' performance is with dynamic GUIs (that require updating every frame) so I'm not sure how much this would help in my case.

Performance

The performance should be good enough, since I have tested this a lot. Perhaps some improvement for textured rendering, but overall I don't think this should be a problem since GUIs are traditionally using CPUs. A little help from GPUs should be good, but I don't think you need the whole thing running in a shader to make it fast enough.

My generative music framework (my primary project upon which I work every day) has a significantly detailed, highly dynamic GUI. Here is a picture so that you may get an idea of the complexity:

[Image: screenshot, 2016-09-08 at 2:53:20 pm]

Everything you see here is conrod, rendering via piston-graphics in a default piston_window loop (gfx_graphics + glutin_window).

I remember discussing this with you and tomaka on IRC some months ago, and I also raised this in a previous conrod issue, but I will do so again here. GL calls have been showing themselves as the clear bottleneck in my profiling, hands-down, for a long time. For about 6 months from the beginning of this year my GUI was so slow that it was actually unusable, running at about 5 FPS. After a lot of time spent trying to reduce the number of piston-graphics draw calls from the point at which conrod interfaces with piston-graphics (mainly by culling widgets that are hidden off-screen, behind other widgets, or out of range from the current scissor from the render::Primitives list), I've managed to get it to a point where it runs at a barely usable state (~30fps using ~90% of the main thread cpu). Note that, depending on the number of notes generated across different instruments in a phrase and the number of changes in parameter automation, there can be anywhere from 500 to 2000 shapes visible on the screen at any point in time. Here is a screenshot of my most recent attempts (~2 weeks ago) to profile the program (built with --release) using OS X Instruments:

[Image: screenshot, 2016-10-04 at 12:41:50 am]
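As an aside on the culling step mentioned above, a simplified version could look like the sketch below. It is an assumption-laden illustration, not conrod's actual code: `Rect`, `overlaps` and `cull` are hypothetical names, and only the off-screen and out-of-scissor cases are handled (occlusion by other widgets is left out).

```rust
/// Axis-aligned rectangle with a top-left origin (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

impl Rect {
    /// True if the two rectangles share some area. Touching edges do not count.
    pub fn overlaps(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

/// Keep only the primitives that intersect the current scissor rectangle,
/// so nothing outside it is ever sent to the graphics backend.
pub fn cull(primitives: &[Rect], scissor: &Rect) -> Vec<Rect> {
    primitives
        .iter()
        .copied()
        .filter(|r| r.overlaps(scissor))
        .collect()
}
```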

The top timeline shows the CPU usage of the program running over about a minute. For the first two thirds of the duration I'm running the generative music system and thus the playhead is iterating forwards and requires re-drawing the GUI each frame. For the last third (where the CPU usage drops to about 20%) I had the system paused and so no drawing was required. I have selected part of the dynamic range so that we can analyse the performance of this section below. In the section below the timeline, you'll notice I've highlighted 4 processes. These are the processes in which most time is spent on each thread.

  1. The top-most thread is that which generates the music composition (uses very little compared to the other threads)
  2. The second thread is where I instantiate my conrod Ui, Widgets and produce the render::Primitives list which gets sent to the main thread for rendering any time the GUI requires being re-drawn. This is where the whole widget graph is built and maintained. Note that no actual drawing is done here (it's done on the main thread). About 16% of the program time is spent here.
  3. The main thread, where the vast majority of time is spent in gfx_device_gl::Device::submit - 58%. This is what the call looks like expanded with system library calls un-hidden (which include GL calls):

    [Image: screenshot, 2016-10-04 at 1:20:02 am]

    Here we can see that all but ~6% of this call is spent in GL calls.

  4. The audio thread, where songs are synthesized and rendered to my speakers in real-time. 3.6%.

Note that these percentages are in relation to the running time of the program - the actual program sits at about 100-140% CPU in Activity Monitor (the sum of cpu usage across all cores).

This program is very much still a WIP, with many more GUI features and details still to come, despite already reaching a barely usable state of performance. This kind of performance also means that I can't test my GUI without having my macbook pro plugged into a charger for more than 30 minutes, as the poor performance rinses my battery. I imagine that this would be a much more significant issue for users trying to distribute a 2D game, especially for mobile games where energy efficiency is very important.

I hope this helps to clarify why I'm searching for a way to improve upon piston-graphics' performance, whether it be in piston-graphics, its backends or new conrod-specific backends. It's not just because I think it would be nice if conrod GUIs would draw a bit faster - it's because as far as I can tell I need conrod's GUIs to draw significantly faster for the software that I'm trying to grow a business from. Please let me know if you think something is flawed in my profiling, tests or conclusions - I'm doing my best with very limited experience in both profiling and graphics. If my use-case is simply out-of-scope of what piston-graphics was intended for then let me know and I'll go back to trying to achieve this with conrod-specific backends.

Custom rendering

Ideas I had about UI frameworks that were built into the assumptions of piston-graphics and the piston core, and that today aren't accounted for in the design of Conrod:

  • Custom control over rendering in widgets (like in WinForms by Microsoft)

I'm not familiar with WinForms, would you mind providing an example? Conrod does allow for custom rendering of primitives (that's why we changed Ui::draw to return the render::Primitives type) and also allows for rendering specific widgets uniquely (any non-primitive widgets that have no children widgets are yielded by the render::Primitives iterator under the Primitive::Other variant so that they can be drawn by the user).

Thoughts about the current direction of Conrod:

  • Requesting that people share more data about the performance

I often receive issues about conrod CPU usage, but you're right that it would be nice to get more specific data. Unfortunately I can barely get contributions to docs let alone detailed profiling 😸

  • What is the overhead by using a graph for all the widgets? Do we have any data on this?

The best I can give you right now is the data in the performance profiling above - the conrod thread is the second thread listed. The graph at that point contains between 2000-3000 widgets, and despite only drawing about half of them (as many are scrolled off-screen, and only primitives are ever drawn) it seems to me that the GL calls still outweigh maintenance of the widget graph by roughly 3.6 times (58% vs 16%).

bvssvni commented 7 years ago

@mitchmindtree Thanks! This is very helpful. I'll open up an issue about improving the performance. Sorry about being picky about the data, I know the performance can be improved but it's easier when there is a specific use case to test for. Could make an example with lots of widgets to push the limits.

To clarify the point about custom rendering:

primitives vs traits = finite sets vs infinite sets

For example, a deformed grid, although not the most useful for widgets, is a kind of rendering case that isn't efficiently reducible to Conrod's primitives. There is always another higher-level primitive that you can invent that does not factor very well into existing primitives, just like a new letter can be used to create new words. Triangulation is the lowest level that all primitives factorize to. Therefore, exposing the lowest level to widgets allows the largest set. This is what I mean by custom rendering.

WinForms allows you to handle a render event for a control, using a graphics object as parameter. It is similar to &mut G where G: Graphics.
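A rough sketch of that style of custom rendering, under loud assumptions: the `Graphics` trait below is a simplified stand-in with a single hypothetical `triangles` method, not the real piston-graphics trait or its signature, and `Paint`, `DeformedQuad` and `Recording` are likewise invented for this example. The widget gets the graphics object and emits whatever triangles it likes.

```rust
/// Simplified stand-in for a graphics backend trait.
/// NOTE: this is not the real piston-graphics `Graphics` signature.
pub trait Graphics {
    fn triangles(&mut self, color: [f32; 4], vertices: &[[f64; 2]]);
}

/// A widget with custom control over its own rendering,
/// in the spirit of handling a paint event.
pub trait Paint {
    fn paint<G: Graphics>(&self, g: &mut G);
}

/// A quad with arbitrary corners, a shape that a fixed primitive set may not cover.
pub struct DeformedQuad {
    pub corners: [[f64; 2]; 4],
    pub color: [f32; 4],
}

impl Paint for DeformedQuad {
    fn paint<G: Graphics>(&self, g: &mut G) {
        let [a, b, c, d] = self.corners;
        // Split the quad into two triangles: (a, b, c) and (a, c, d).
        g.triangles(self.color, &[a, b, c, a, c, d]);
    }
}

/// A toy backend that just records the vertices it is given.
#[derive(Default)]
pub struct Recording {
    pub vertices: Vec<[f64; 2]>,
}

impl Graphics for Recording {
    fn triangles(&mut self, _color: [f32; 4], vertices: &[[f64; 2]]) {
        self.vertices.extend_from_slice(vertices);
    }
}
```

Since any shape reduces to triangles, a widget written this way is not limited to whatever primitives the UI library happens to enumerate.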

mitchmindtree commented 7 years ago

No problem! You're right, it'd be good to make some examples aimed at demonstrating the limits of performance - something dynamic with lots of widgets.

primitives vs traits = finite sets vs infinite sets

This doesn't have to be the case for piston-graphics.

For example, in conrod we could do an infinite set approach using primitives by having Primitives be types that wrap Box<Widget::State> and Box<Widget::Style> (the data that's stored within the graph), and provide a method on the Primitive that allowed for doing something like:

if let Some(text) = primitive.widget::<Text>() {
    // draw text
} else if let Some(image) = primitive.widget::<Image>() {
    // draw image
} // etc

We don't currently do this in conrod because the majority of cases are covered by the small set, and having a small finite set as an enum (with the compiler enforcing all branches are handled) makes it very clear to other users what they need to implement for their own custom backends to get all of the widgets that come with conrod working. As I mentioned in my previous comment, we do also provide a Primitive::Other variant which provides a type that can be cast to custom widget types (using a method very similar to what I mentioned above) for the odd case where users need to handle some custom widget (i.e. a deformed grid, video playback, or something else). I'm unsure whether we'll switch to the purely infinite-set approach or not eventually, but it could be a good idea as the Other variant does seem like a messy way of remaining open-ended.
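One way the open-ended cast described above could work is with `std::any::Any`. The sketch below is an assumption about the mechanism, not conrod's implementation: `Primitive`, `widget`, `describe` and the three widget structs are hypothetical, and a single boxed state stands in for the separate state and style boxes.

```rust
use std::any::Any;

// Hypothetical widget state types for this sketch.
pub struct Text {
    pub content: String,
}
pub struct Image {
    pub texture_id: usize,
}
pub struct DeformedGrid {
    pub columns: usize,
    pub rows: usize,
}

/// A type-erased primitive that wraps whatever state the widget stored.
pub struct Primitive {
    state: Box<dyn Any>,
}

impl Primitive {
    pub fn new<T: Any>(state: T) -> Self {
        Primitive { state: Box::new(state) }
    }

    /// Returns the state as `T` if this primitive was built from a `T`.
    pub fn widget<T: Any>(&self) -> Option<&T> {
        self.state.downcast_ref::<T>()
    }
}

/// A backend handles the types it knows and ignores the rest.
pub fn describe(primitive: &Primitive) -> String {
    if let Some(text) = primitive.widget::<Text>() {
        format!("text: {}", text.content)
    } else if let Some(image) = primitive.widget::<Image>() {
        format!("image: texture {}", image.texture_id)
    } else {
        "unknown widget, skipped".to_string()
    }
}
```

The trade-off mentioned above shows up in the final `else`: the compiler can no longer tell a backend author which widget types were left unhandled.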

So to clarify, I imagine a simplified version of a piston-graphics' render primitives loop could look something like this:

// Instantiate some graphics.
Rectangle::new(color).draw(rect, &draw_state, transform, &mut tree);

// When the user is ready to draw the graphics to screen.
//
// `primitives` borrows data from the `tree` for efficiency, but could be converted to an
// owned version with a method along the lines of `primitives.to_owned()` if it is
// necessary (i.e. for sending across threads, etc).
let mut primitives = tree.draw();

// We then pass the primitives list to a backend (i.e. gfx_graphics,
// glium_graphics, etc) function for drawing.
while let Some(primitive) = primitives.next() {

    if let Some(text) = primitive.draw::<Text>() {
        // draw text (or cache to a texture, etc - whatever is most efficient for the backend)
    }

    else if let Some(rectangle) = primitive.draw::<Rectangle>() {
        // draw rectangle (or pack vertices or indices to some graphics buffer - whatever is most efficient for the backend)
    }

    else if let Some(image) = primitive.draw::<Image>() {
        // draw image
    }

    else if let Some(grid) = primitive.draw::<DeformedGrid>() {
        // draw grid
    }

    // etc
}

This leaves out scissor, blend-modes, rotation, etc for simplicity. I imagine a more complete loop that included these might look something like:

let mut primitives = graphics_tree.draw();
while let Some(Primitive { kind, radians, scissor, blend }) = primitives.next() {
    if let Some(text) = kind.draw::<Text>() {
        // draw text
    }
    // etc
}

This is just a rough sketch, but hopefully it clarifies what I mean a little. Let me know your thoughts @bvssvni, I might get a chance to put together a draft later today or tomorrow if you're interested.

Boscop commented 7 years ago

I also have issues with performance. I made a custom widget to display MIDI songs, and it makes the GUI so slow that it's unusable: it takes ~30-50 seconds to react to any events in a release build. Btw, this is what it looks like: https://i.imgur.com/IbTi3vo.png I draw a rectangle for each note. The IDs are generated like this:

if state.ids.notes.len() < data.cache.notes.len() {
    state.update(|state| state.ids.notes.resize(data.cache.notes.len(), &mut ui.widget_id_generator()));
}

I printed the number of IDs and MIDI notes: 11599 notes in one, 6640 in the other. 11599 + 6640 = 18239 note IDs, and 18482 - 18239 = 243 non-note widget IDs.

The widget still isn't finished, I still want to draw a grid in the background, like in my glium gui. This will add more than 100 more IDs, especially if I also render rectangles for the piano colored background layer. But with performance as it is right now, it's way too slow... :( Btw, the way I rendered the song view in glium is, one shader that draws the grid, and one shader for drawing rectangles, then batch all the rectangles into one vertex buffer. It's very fast. But I would prefer to use conrod.
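The rectangle batching described above can be sketched on the CPU side as follows. This is not the actual glium code from that GUI: `Vertex`, `NoteRect` and `build_vertex_buffer` are hypothetical names, and uploading the resulting `Vec` to the GPU is left out. Every note contributes six vertices (two triangles) to one shared buffer, so the whole song can be drawn with a single call.

```rust
/// One vertex of the shared buffer (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vertex {
    pub position: [f32; 2],
    pub color: [f32; 4],
}

/// One note drawn as a rectangle, with a top-left origin.
#[derive(Clone, Copy)]
pub struct NoteRect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
    pub color: [f32; 4],
}

/// Pack all note rectangles into one triangle-list vertex buffer.
pub fn build_vertex_buffer(notes: &[NoteRect]) -> Vec<Vertex> {
    let mut vertices = Vec::with_capacity(notes.len() * 6);
    for n in notes {
        let (l, r, t, b) = (n.x, n.x + n.w, n.y, n.y + n.h);
        // Two triangles per rectangle: (lt, rt, rb) and (lt, rb, lb).
        for position in [[l, t], [r, t], [r, b], [l, t], [r, b], [l, b]] {
            vertices.push(Vertex { position, color: n.color });
        }
    }
    vertices
}
```

For the 18239 notes counted above this gives 109434 vertices in one buffer, instead of one draw call per note.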

Maybe instead of giving every primitive an ID and checking if it has to be redrawn, it would make sense to allow the client code to group all notes into one parent ID or something. That way, not every note ID would have to be checked for whether it should be redrawn. But I'm not sure where the actual bottleneck is here.

mitchmindtree commented 7 years ago

Yeah, it looks like you're doing the ID generation fine, so I don't think the issue is there.

Rendering them using your shader might be the best bet until we can work out how to enable shader optimisations for drawing, as from my experience, there's a good chance 18000+ GL calls might be your bottleneck. However, I'm no graphics expert and like I mentioned over e-mail, you'll have to actually use a profiler to know for sure where the bottleneck is.

That said, you still have a lot of music notes there! The widget_graph itself may still struggle to maintain such a large number of widgets even if you can get past any potential graphics bottlenecks. It tracks a lot of information per-widget - a lot of which probably isn't necessary for each individual note. I like your idea of providing a widget type that allows for instantiating one big list of graphics primitives that doesn't actually require an individual widget slot for each one. I've opened up an issue for this here.