dividuum / info-beamer

The Multimedia Presenter for Lua (for commercial projects, use info-beamer pi instead)
https://info-beamer.com/

use core profile OpenGL #29

Open vliaskov opened 9 years ago

vliaskov commented 9 years ago

This is more of a question than an actual issue / bug report:

I've noticed that immediate mode / compatibility profile OpenGL is used in the code (e.g. glVertex in the image.c and video.c rendering). Can these calls be replaced with core profile OpenGL, e.g. VBOs? Do you expect a performance increase if this is attempted, i.e. would a version using non-compatibility OpenGL 3.0 offer better performance than the current GitHub version (on PC hardware)?

Also: In general it would be useful for volunteers to have a TODO file for possible performance optimizations (such as this question) and for new features.

dividuum commented 9 years ago

I'm not sure if VBOs would give a noticeable performance boost for images and videos. The Lua code that controls the output can be very dynamic and unpredictable: in one frame it draws something, in the next frame it doesn't. So caching VBOs between frames might be difficult. Assuming VBOs only get used for drawing a single frame, you run into the problem that it gets tricky when multiple textures are used. I guess it would complicate the code a lot for little benefit. But I'd love to be proven wrong :-)

One thing that is optimizable is font rendering. On the PI, the font engine creates a font atlas and uses VBOs. It caches rendered strings, so in the optimal case drawing a string is just a single call to glDrawArrays. That change made a difference. You can read more about it in a blog post I wrote.

I'm curious what kind of visualization you're building where these changes might make a difference. Before using the PI I did some projects using the PC version and never ran into performance problems.

Regarding features: I'm always open to suggestions. One thing I want to avoid is feature creep inside info-beamer for things that might as well be done using an outside tool. So there's (for example) never going to be an embedded HTTP client.

vliaskov commented 9 years ago

Sorry for the delay. Thanks for the feedback and for the blog post about performance, it's a very useful read.

Perhaps freetype-gl (https://github.com/rougier/freetype-gl) can be used instead of ftgl so that we can create an atlas and use VBOs. Did you implement your own caching on a per string/text basis, or is there some glyph cache provided by a library you use for font rendering in the PI version?

I also tried the latest font rendering commit edf6d83c (using texturefonts instead of polygonfonts). There is a difference in CPU load (much less load with texturefonts). FPS is the same, about 60 fps on high-end PCs (i.e. the framerate was good even before the change, but maybe that's because I am testing simple nodes with not much text). By the way, do you measure FPS from Lua and/or Lua callbacks, or in another way? E.g. I've added code to font_write for font framerate calculations, but maybe info-beamer has a generic way of measuring it?

Some of the x86_64 systems I will be testing on have mobile graphics with no dedicated GPU memory, so perhaps optimizing GPU transfers would make a difference.

If the other 2 optimizations in the performance blog post are relevant to the github PC version (optimizing MVP matrix and rendering directly to screen instead of an intermediate texture), let me know.

dividuum commented 9 years ago

I just took a look at freetype-gl. It sounds very interesting and looks solid. The problem with those nice libraries always seems to be that they are not readily available in major distributions, so the user of info-beamer has to install those dependencies by hand. I personally find that annoying, which is why avoiding it ended up in the project philosophy: https://info-beamer.com/doc/info-beamer#projectphilosophy. So I have mixed feelings about this one. But it would probably give a big boost if added to info-beamer.

The PI version uses a dynamically growing texture atlas that is resized/purged when it gets too full. VBOs are FIFO cached on a per-string basis. Fonts that are used multiple times share the same cache/atlas.
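To make the per-string FIFO caching idea concrete, here is a minimal sketch in plain C. It is not the PI version's actual code: the names (`string_cache`, `cache_get`), the slot count and the integer `vbo` handle are all hypothetical stand-ins, and a real implementation would create and delete GL buffers where this sketch only hands out fake handles.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 4   /* illustrative size, not the real one */
#define MAX_TEXT    64  /* illustrative limit on cached string length */

/* One cached string and the (fake) handle of the VBO built for it. */
typedef struct {
    char text[MAX_TEXT];
    unsigned vbo;       /* stand-in for a GL buffer handle */
    int used;
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
    int next;              /* slot that will be overwritten next (the oldest) */
    unsigned next_handle;  /* fake handle generator */
    unsigned builds;       /* how often a VBO had to be (re)built */
} string_cache;

void cache_init(string_cache *c) {
    memset(c, 0, sizeof *c);
    c->next_handle = 1;
}

/* Return the handle for `text`, building a new entry on a miss.
 * On a miss the oldest entry is evicted, regardless of how recently
 * it was used: that is what makes this FIFO rather than LRU. */
unsigned cache_get(string_cache *c, const char *text) {
    assert(strlen(text) < MAX_TEXT);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].used && strcmp(c->slots[i].text, text) == 0)
            return c->slots[i].vbo;          /* hit: nothing to rebuild */
    }
    cache_entry *e = &c->slots[c->next];     /* miss: reuse oldest slot */
    c->next = (c->next + 1) % CACHE_SLOTS;
    strcpy(e->text, text);
    e->vbo = c->next_handle++;               /* real code: fill a GL buffer */
    e->used = 1;
    c->builds++;
    return e->vbo;
}
```

With a cache like this, a string that is drawn unchanged every frame only costs a lookup after the first frame, while strings that change constantly simply push older entries out.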

FPS is measured by just counting the number of calls to node.render (on the C side) and dividing that by the total time between stats outputs. Stats are output every 10 seconds by default, unless the SPACE key is pressed. I guess it's the easiest way to do that.
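The counting approach can be sketched in a few lines of C. This is only an illustration of the idea, not info-beamer's actual code: the `fps_stats` struct and the function names are made up, and the caller is assumed to pass in timestamps in seconds from whatever clock it uses.

```c
#include <assert.h>

/* Hypothetical stats accumulator; names are illustrative only. */
typedef struct {
    unsigned long frames;  /* render calls since the last stats output */
    double window_start;   /* time (seconds) of the last stats output */
} fps_stats;

void fps_init(fps_stats *s, double now) {
    s->frames = 0;
    s->window_start = now;
}

/* Call once for every rendered frame. */
void fps_count_frame(fps_stats *s) {
    s->frames++;
}

/* Return the average FPS since the last report and start a new window.
 * Would be called every 10 seconds, or earlier on a key press. */
double fps_report(fps_stats *s, double now) {
    double elapsed = now - s->window_start;
    assert(elapsed >= 0.0);
    double fps = elapsed > 0.0 ? (double)s->frames / elapsed : 0.0;
    s->frames = 0;
    s->window_start = now;
    return fps;
}
```

Because the result is an average over the whole window, short stalls are smoothed out; measuring individual frame times would need extra bookkeeping.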

Minimizing GPU transfers sounds like a good idea. Getting rid of the top-level texture and directly rendering to the screen might make a difference on lower performance devices. The PI made this change a bit easier, since its GL surface can be scaled up with the help of a hardware video scaler (see https://info-beamer.com/blog/raspberry-pi-hardware-video-scaler). So I ended up rendering to the requested resolution (the one set by gl.setup) and then using the HVS to scale the result to the screen resolution. On a desktop system this might be trickier. But I might be wrong here.

The MVP matrix optimization isn't really necessary, since that's an OpenGL ES artifact. The fixed function pipeline that the desktop version uses handles matrices internally, so there isn't really anything to optimize.