NiLuJe / FBInk

FrameBuffer eInker, a small tool & library to print text & images to an eInk Linux framebuffer
https://www.mobileread.com/forums/showthread.php?t=299110
GNU General Public License v3.0

Some minor performance improvements #35

Closed NiLuJe closed 5 years ago

NiLuJe commented 5 years ago

This initially started as an experiment in replacing the FBInkColor struct (which was basically just a triplet of uint8_t for R, G, B) with a magical union (FBInkPixel) that can hold a pixel representation, properly packed for our target bitdepths (i.e., g8, rgb565, bgra). My train of thought was that, on non-trivial bitdepths (i.e., >= 16bpp), we were storing/using pen-colors as a single g8 value, and repacking it for every single pixel.
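To make the idea concrete, here is a minimal sketch of packing a pen color once, up front, instead of once per pixel. The `Pixel` union and `pack_pen` helper are hypothetical names made up for illustration; they are not FBInk's actual definitions, and the 32bpp layout assumes a little-endian BGRA framebuffer.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for a packed-pixel union; field names and layout
// are illustrative assumptions, not FBInk's actual FBInkPixel.
typedef union
{
    uint8_t  gray8;     // 8bpp: the gray value itself
    uint16_t rgb565;    // 16bpp: 5 bits red, 6 bits green, 5 bits blue
    uint32_t bgra;      // 32bpp: 0xAARRGGBB as a word (B, G, R, A in memory on little-endian)
} Pixel;

// Pack a gray pen color once for the given bitdepth (8, 16 or 32).
static Pixel pack_pen(uint8_t gray, unsigned int bpp)
{
    assert(bpp == 8U || bpp == 16U || bpp == 32U);
    Pixel px;
    px.bgra = 0U;    // zero the widest member first
    if (bpp == 8U) {
        px.gray8 = gray;
    } else if (bpp == 16U) {
        // Drop the low bits of each channel to fit 5/6/5
        px.rgb565 = (uint16_t) (((gray >> 3U) << 11U) | ((gray >> 2U) << 5U) | (gray >> 3U));
    } else {
        // Opaque alpha, same gray value in R, G and B
        px.bgra = 0xFF000000U | ((uint32_t) gray << 16U) | ((uint32_t) gray << 8U) | gray;
    }
    return px;
}
```

The drawing loop would then just copy the already-packed member that matches the framebuffer's bitdepth.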

It turned out that the overhead of doing that wasn't as expensive as I would have thought, so, that bit of the PR actually has a minimal impact on performance ;p. It does make most things slightly more sensible to read (except maybe button_scan), so, I'll take that ;p.

What did noticeably affect performance is switching the scaling in the fixed-cell rendering from naively plotting each scaled rectangle (well, square) pixel by pixel to using fill_rect instead, because, duh. And since fill_rect is (usually) using memset, zoom! This uncovered a sneaky issue with fill_rect @ 16bpp, because nothing except black & white actually packs into two identical bytes @ RGB565, so a byte-wise memset can't reproduce any other color. So, fixed that by falling back to a naive pixel plotting routine when needed.
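A rough sketch of the 16bpp problem and the fallback, under stated assumptions: `fill_rect_rgb565` is a hypothetical helper written for this example (not FBInk's actual fill_rect), and `stride` is assumed to be the scanline length in pixels. memset writes the same byte everywhere, so it is only usable when both bytes of the packed color are equal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: fill a w x h rectangle at (x, y) in a 16bpp buffer.
// `stride` is the scanline length in pixels (an assumption of this sketch).
static void fill_rect_rgb565(uint16_t* fb, size_t stride, size_t x, size_t y, size_t w, size_t h, uint16_t color)
{
    const uint8_t hi = (uint8_t) (color >> 8U);
    const uint8_t lo = (uint8_t) (color & 0xFFU);

    for (size_t j = y; j < y + h; j++) {
        uint16_t* row = fb + (j * stride) + x;
        if (hi == lo) {
            // Both bytes match (e.g., 0x0000 or 0xFFFF): a byte-wise fill is exact
            memset(row, lo, w * sizeof(*row));
        } else {
            // Bytes differ (e.g., 0x8410): plot pixel by pixel instead
            for (size_t i = 0; i < w; i++) {
                row[i] = color;
            }
        }
    }
}
```

For example, mid-gray packs to 0x8410 (bytes 0x84 and 0x10), so it takes the slow path, while white (0xFFFF) keeps the memset fast path.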

What also very, very, very moderately affected performance is switching from the Get/Put Pixel function pointer to an if ladder in get/put_pixel, probably because of the CPU's branch predictor. That's not the first time I've run this experiment, except I usually tried it with a switch: turns out that was noticeably worse here.
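For readers unfamiliar with the pattern, an if ladder of this kind could look roughly like the sketch below. The `Fb` struct and this `put_pixel` signature are assumptions made up for the example, not FBInk's actual code; the pixel is assumed to be already packed for the target bitdepth and passed in the low bits of `packed`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical framebuffer description, for illustration only.
typedef struct
{
    uint8_t*     mem;            // start of the framebuffer
    size_t       line_length;    // bytes per scanline
    unsigned int bpp;            // 8, 16 or 32
} Fb;

// Dispatch on the bitdepth with an if ladder instead of calling through
// a function pointer chosen at init time.
static void put_pixel(const Fb* fb, size_t x, size_t y, uint32_t packed)
{
    uint8_t* line = fb->mem + (y * fb->line_length);

    if (fb->bpp == 8U) {
        line[x] = (uint8_t) packed;
    } else if (fb->bpp == 16U) {
        const uint16_t v = (uint16_t) packed;
        memcpy(line + (x * 2U), &v, sizeof(v));    // memcpy sidesteps alignment/aliasing concerns
    } else if (fb->bpp == 32U) {
        memcpy(line + (x * 4U), &packed, sizeof(packed));
    }
}
```

Since the bitdepth never changes during a run, every call takes the same branch, which fits the branch predictor guess above. Whether this beats a function pointer or a switch on a given device is something to measure rather than assume.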

NiLuJe commented 5 years ago

As expected, this has almost no impact on 8bpp, because you can't beat that for simplicity ;).

The main aim was 32bpp, but it turns out it noticeably helped 4bpp fbs, for some mysterious reason. 16bpp remains terrible, because RGB565, but it does help a bit there, too.

NiLuJe commented 5 years ago

Also uncovered some more 4bpp corner-cases that I won't bother to fix, because dealing with nibbles gives me headaches -_-".

(In short: using an odd scaling factor in the fixed-cell rendering codepath will lead to artefacting on the edges, especially in overlay mode).

(There are workarounds in place to deal with the same kind of issues in the OT codepath, as well as in dump/restore. AFAICT, this would be trickier to do here.)
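As background on why nibbles are awkward: at 4bpp, two pixels share each byte, so a single pixel can only be written with a read-modify-write of its half of the byte, and a run with an odd width always ends in the middle of a byte. The sketch below shows such a nibble write. `put_pixel_gray4` is a hypothetical helper, and the convention that even x coordinates live in the high nibble is an assumption of this example; the comments above don't spell out the exact cause of the artefacts, so treat this as one plausible illustration only.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical 4bpp pixel write. Assumption: even x lives in the high
// nibble of its byte, odd x in the low nibble.
static void put_pixel_gray4(uint8_t* line, size_t x, uint8_t gray4)
{
    uint8_t* byte = line + (x >> 1U);    // two pixels per byte

    if ((x & 1U) == 0U) {
        // Keep the neighbour in the low nibble, replace the high nibble
        *byte = (uint8_t) ((*byte & 0x0FU) | ((gray4 & 0x0FU) << 4U));
    } else {
        // Keep the neighbour in the high nibble, replace the low nibble
        *byte = (uint8_t) ((*byte & 0xF0U) | (gray4 & 0x0FU));
    }
}
```

Any routine that writes whole bytes at a time (memset-style fills, for instance) has to special-case the edges of an odd-width run, or it will clobber the neighbouring pixel that shares the byte.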