PistonDevelopers / graphics

A library for 2D graphics, written in Rust, that works with multiple back-ends
MIT License

Draw calls performance bottleneck #1026

Open bvssvni opened 8 years ago

bvssvni commented 8 years ago

This issue is meant to help people understand what causes the performance bottleneck in piston-graphics, and what the plan is to fix it.

For each draw call, the CPU needs to send data to the GPU. The GPU is often very fast at rendering. If this capacity is not fully used, the GPU sits idle waiting for more input from the CPU.

GPUs are designed for handling massive amounts of data with limited variation. What the GPU does is controlled through a shader language. For OpenGL the shader language is GLSL.

When you render a rectangle, this is what happens:

  1. Transformed triangles are created on the CPU in chunks and sent to the graphics backend.
  2. The backend writes the received data to dynamic buffers.
  3. The graphics driver tells the GPU to render using the updated buffers.
  4. The GPU renders using a precompiled shader and paints pixels in the frame buffer.
  5. The frame buffer is swapped with the current one to update the display.

Steps 1-4 happen repeatedly when drawing many objects for each frame.
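To make step 1 concrete, here is a minimal sketch of triangulating a rectangle on the CPU. It is not the actual piston-graphics code: `transform_point` and `triangulate_rect` are hypothetical names, and the `[[f64; 3]; 2]` matrix layout is an assumption chosen to resemble a 2x3 affine transform.

```rust
/// Assumed layout: a 2x3 affine matrix stored as two rows of three scalars.
type Matrix2d = [[f64; 3]; 2];

/// Applies the affine transform to one point and narrows it to `f32`,
/// the precision typically used for vertex buffers.
fn transform_point(m: Matrix2d, x: f64, y: f64) -> [f32; 2] {
    [
        (m[0][0] * x + m[0][1] * y + m[0][2]) as f32,
        (m[1][0] * x + m[1][1] * y + m[1][2]) as f32,
    ]
}

/// Turns `[x, y, width, height]` into two triangles (six vertices),
/// already transformed, ready to be handed to a backend.
fn triangulate_rect(m: Matrix2d, rect: [f64; 4]) -> [[f32; 2]; 6] {
    let [x, y, w, h] = rect;
    let (x2, y2) = (x + w, y + h);
    [
        // First triangle: top-left, top-right, bottom-left.
        transform_point(m, x, y),
        transform_point(m, x2, y),
        transform_point(m, x, y2),
        // Second triangle: top-right, bottom-right, bottom-left.
        transform_point(m, x2, y),
        transform_point(m, x2, y2),
        transform_point(m, x, y2),
    ]
}
```

Because the output is plain vertex data in one coordinate space, nothing ties it to a single shape, which is what makes packing several shapes into one buffer possible later on.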

In the Gfx backend the draw commands are collected upfront and given to the driver at the same time. However, from the graphics driver's side, the instructions seem similar to the ones generated by the OpenGL backend (except for changes made to the draw state).

The 1st step is done by piston-graphics's design. Reasons to triangulate on the CPU:

Some questions one might ask:

Before making changes to the design, one might consider using the strengths it offers to fix the problem. It seems the largest overhead is the number of draw calls, and since reducing the number of draw calls will lead to less overhead, we should look for ways to do that first. This happens in the 2nd step, not the 1st!

Batch, batch, batch!

The key insight here is that since piston-graphics triangulates on the CPU, we could pack multiple shapes into the same buffer in the backend. This leads to fewer draw calls when:

One downside is that many backend instances lead to higher memory usage. Based on experience so far, most applications only use one instance, so I do not think this is a problem.

For example, in Conrod a lot of solid colored shapes are rendered, then some textured shapes (text), and then more solid colored shapes, etc. Currently the CharacterCache backends rasterize glyphs using FreeType, with each character in a separate texture. This means we can reduce the number of draw calls for solid shapes, but not for text.
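The effect can be sketched with a small batcher that only counts draw calls. This is an illustration under stated assumptions, not the backend implementation: `Material`, `Batcher`, `push`, and `flush` are hypothetical names, and a real backend would upload the buffer and issue the draw call where the counter is incremented.

```rust
/// What a batch is drawn with. Shapes can share a draw call only if
/// they share the same material.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Material {
    Solid,
    Textured(u32), // hypothetical texture id
}

struct Batcher {
    material: Option<Material>,
    vertices: Vec<[f32; 2]>,
    capacity: usize,
    draw_calls: usize,
}

impl Batcher {
    fn new(capacity: usize) -> Self {
        Batcher { material: None, vertices: Vec::new(), capacity, draw_calls: 0 }
    }

    /// Appends triangles, flushing first if the material changes or the
    /// buffer would grow past `capacity`.
    fn push(&mut self, material: Material, tris: &[[f32; 2]]) {
        let material_changed = self.material != Some(material);
        let would_overflow = self.vertices.len() + tris.len() > self.capacity;
        if material_changed || would_overflow {
            self.flush();
        }
        self.material = Some(material);
        self.vertices.extend_from_slice(tris);
    }

    /// Stands in for "upload buffer, issue one draw call".
    fn flush(&mut self) {
        if self.vertices.is_empty() {
            return;
        }
        self.draw_calls += 1;
        self.vertices.clear();
    }
}
```

With a capacity of 1024 vertices, pushing 100 solid quads, then three glyphs that each use their own texture, then 50 more solid quads, and finally calling `flush` ends with `draw_calls == 5`: one per solid run and one per glyph. The solid shapes collapse into two calls, while the text still costs one call per character.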

In the case of text, we could try two different approaches:

  1. Pack glyphs in a single image and update a texture
  2. Since glyphs are often of similar size, consider using texture arrays
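For the first approach, the core piece is an allocator that decides where each glyph lands in the shared image. Below is a minimal sketch using a simple shelf (row) packing strategy. This is one common technique chosen for illustration, not what any existing Piston or rusttype cache is known to do, and `Slot` and `ShelfPacker` are hypothetical names.

```rust
/// Region of the atlas assigned to one glyph, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Slot {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Packs rectangles left to right in rows ("shelves"); a new shelf is
/// started below the current one when a row is full.
struct ShelfPacker {
    width: u32,
    height: u32,
    cursor_x: u32,
    shelf_y: u32,
    shelf_h: u32,
}

impl ShelfPacker {
    fn new(width: u32, height: u32) -> Self {
        ShelfPacker { width, height, cursor_x: 0, shelf_y: 0, shelf_h: 0 }
    }

    /// Returns the slot for a `w` x `h` glyph, or `None` if it does not fit.
    fn alloc(&mut self, w: u32, h: u32) -> Option<Slot> {
        if w > self.width || h > self.height {
            return None;
        }
        // Row is full: move down by the tallest glyph on the current shelf.
        if self.cursor_x + w > self.width {
            self.shelf_y += self.shelf_h;
            self.cursor_x = 0;
            self.shelf_h = 0;
        }
        if self.shelf_y + h > self.height {
            return None;
        }
        let slot = Slot { x: self.cursor_x, y: self.shelf_y, w, h };
        self.cursor_x += w;
        self.shelf_h = self.shelf_h.max(h);
        Some(slot)
    }
}
```

Once every glyph lives in the same texture, consecutive characters share one material, so a batching backend can draw a whole run of text in a single call. A `None` result is the point where a real cache would have to grow the atlas or evict old glyphs.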

Number one seems sensible to test first because it would benefit from the same reduction of draw calls. However, it requires some changes:

By organizing graphic primitives into a tree structure, one can traverse it and optimize the draw calls.

While this would be very interesting to work on, there are some major obstacles/unknowns:

I believe this plan requires the least effort and the fewest breaking changes. We keep the same overall design of piston-graphics and its existing benefits.

mitchmindtree commented 8 years ago

Sounds great :+1:

bvssvni commented 8 years ago

This also requires changing the shaders from using a uniform color to one color per vertex. Triangles from different shapes get packed into the same buffer, so each vertex must carry its own color.
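A rough sketch of what that means for the vertex data, assuming a hypothetical `ColoredVertex` layout and `append_shape` helper (neither is taken from the backends):

```rust
/// One vertex as it would sit in the shared buffer. `#[repr(C)]` keeps
/// the field order predictable for upload to the GPU.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColoredVertex {
    pos: [f32; 2],
    color: [f32; 4], // RGBA, replaces the per-draw-call uniform color
}

/// Appends an already triangulated shape, stamping its color onto
/// every vertex so shapes with different colors can share one buffer.
fn append_shape(buffer: &mut Vec<ColoredVertex>, positions: &[[f32; 2]], color: [f32; 4]) {
    buffer.extend(positions.iter().map(|&pos| ColoredVertex { pos, color }));
}
```

The shader would then read the color as a vertex attribute instead of a uniform. The trade-off is four extra floats per vertex in exchange for not having to split the batch every time the color changes.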

crumblingstatue commented 8 years ago

I absolutely love https://love2d.org/wiki/SpriteBatch. It would be nice to have a similar feature in Piston. It's kind of off-putting when your Rust game runs slower than Lua because of the drawing overhead.

bvssvni commented 8 years ago

@crumblingstatue Can you open a new issue about it? Thanks!

crumblingstatue commented 8 years ago

> @crumblingstatue Can you open a new issue about it? Thanks!

Alright, I opened #1041.

ishitatsuyuki commented 7 years ago

I've found the text renderer horrible. The minimal overhead is about 23 calls/frame (rusttype's gpu_cache example).

However, Piston doesn't batch it at all and does many context switches, like enabling and disabling scissors. This resulted in 1000 calls/frame (and due to the Text implementation, it can increase further with more characters).

This is a 50x slowdown. Not really affordable.

bvssvni commented 7 years ago

@ishitatsuyuki Yeah, text rendering is really bad right now.

KongouDesu commented 6 years ago

What's the current state of this issue, especially in regard to text rendering?

bvssvni commented 6 years ago

Texture rendering is now significantly faster for the OpenGL backend, but the glyph cache implementation must be changed to take advantage of this optimization.