dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses

Multithreaded render #155

Closed dankamongmen closed 4 years ago

dankamongmen commented 4 years ago

We could start writing the buffer pretty much immediately if we were to have a second thread picking up at explicit fflush() points, ideally following BUFSIZ-size chunkouts. This ought only be done if there are at least two schedulable units -- the primary render process is entirely computation- and memory-access-bound (so hypercores are fine).

dankamongmen commented 4 years ago

perf indicates that dig_visible_cell() dominates our runtime, so this could actually be quite useful, especially once O(1) damage maps come into play.

dankamongmen commented 4 years ago

With O(1) damage having been merged last evening, perf indicates that the vast majority of our time is now being spent in the rectilinear sweep of dig_visible_cell(). This is embarrassingly parallel, and throwing another thread at it could probably cut our rendering latency down by anywhere from ~20% to ~40%, I'd think. Throw two at the render, one at the top and one at the middle, transition the blocking_write() from a single buffer to a struct iovec scatter-gather, gate initiation of the writev() on the first render completing...HOLY SHIT, we can relax that false constraint, too -- if the bottom render finished significantly prior to the top render, just throw a cursor move in that sumbitch, boom motherfuckers! w00000t this seems very promising indeed, and probably especially useful on low-frequency multicores (think Raspberry Pi and similar ARM pootwahcores).
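A rough sketch of the top/middle split described above, to make the shape concrete. This is not the notcurses code: `half_job`, `render_half()`, `render_frame()`, `COLS`, and `HALF_BUF` are hypothetical names, and the per-row "render" is a stand-in that just fills each row with a letter. It assumes each half fits in a fixed buffer and that one writev() call drains both halves (a real implementation would loop on short writes).

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define COLS 8          // hypothetical frame width
#define HALF_BUF 1024   // hypothetical per-half output buffer size

// Hypothetical job: render rows [ybegin, yend) into a private buffer.
struct half_job {
  int ybegin, yend;
  char buf[HALF_BUF];
  size_t len;
};

// Stand-in for the real per-cell sweep: one line of COLS letters per row.
static void* render_half(void* vjob){
  struct half_job* job = vjob;
  job->len = 0;
  for(int y = job->ybegin ; y < job->yend ; ++y){
    memset(job->buf + job->len, 'a' + (y % 26), COLS);
    job->len += COLS;
    job->buf[job->len++] = '\n';
  }
  return NULL;
}

// Render the bottom half on a helper thread and the top half on the caller,
// then emit both buffers, in order, with a single scatter-gather write.
static ssize_t render_frame(int fd, int rows){
  int mid = rows / 2;
  assert(rows >= 2);
  assert((size_t)(rows - mid) * (COLS + 1) <= HALF_BUF);
  struct half_job top = { .ybegin = 0, .yend = mid };
  struct half_job bottom = { .ybegin = mid, .yend = rows };
  pthread_t tid;
  int threaded = (pthread_create(&tid, NULL, render_half, &bottom) == 0);
  render_half(&top);
  if(threaded){
    pthread_join(tid, NULL);
  }else{
    render_half(&bottom); // no helper thread available: render serially
  }
  struct iovec iov[2] = {
    { .iov_base = top.buf, .iov_len = top.len },
    { .iov_base = bottom.buf, .iov_len = bottom.len },
  };
  return writev(fd, iov, 2);
}
```

Calling `render_frame(STDOUT_FILENO, 4)` would emit four rows. The sketch waits for both halves before writing; the relaxation mentioned above (writing whichever half finishes first, preceded by a cursor move) is not shown.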

dankamongmen commented 4 years ago

I've broken up rasterizing and rendering, and we actually appear to have picked up a small win from it, perhaps due to better use of cache, not sure. But excellent. I can now toss in a thread and split up at least the rendering step. As noted above, an iovec would then provide an easy path to parallel rasterizing.

dankamongmen commented 4 years ago

Keep a stat on thread-assisted renders if we do this.

dankamongmen commented 4 years ago

I got a version working in the dankamongmen/threaded-render branch. Doing so required--something I didn't realize initially--locking nc->pool to work properly. Once this (contested) lock was added, the optimal case of the threaded version--running with no delay, with threads--saw a 12s runtime grow to 17s. Ugh. This was a 104x78 geometry.

I enlarged the geometry to a full screen (382x78), and 43s went to 70s.

We'd need to eliminate the lock on nc->pool to possibly get a win off this, and I just don't care to put more time into this until we're actually slow enough to not exceed all known frame rates by a factor of 10. Sigh.
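For illustration only, one way the lock might be avoided is to give each render thread a private pool, so nothing is shared during the sweep. The sketch below assumes the pool behaves like an append-only byte store; `struct pool`, `pool_stash()`, `struct worker`, and `run_two_workers()` are hypothetical and are not the library's types or API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define POOL_CAP 4096 // hypothetical fixed capacity

// Hypothetical append-only byte pool; NOT the library's real pool type.
struct pool {
  char data[POOL_CAP];
  size_t used;
};

// Append len bytes; return the offset they were stored at, or -1 if full.
static long pool_stash(struct pool* p, const char* s, size_t len){
  if(len > POOL_CAP - p->used){
    return -1;
  }
  memcpy(p->data + p->used, s, len);
  long off = (long)p->used;
  p->used += len;
  return off;
}

struct worker {
  struct pool local; // private to one thread, so no mutex is taken
  int stashes;
};

static void* worker_main(void* v){
  struct worker* w = v;
  for(int i = 0 ; i < w->stashes ; ++i){
    pool_stash(&w->local, "\xe2\x96\x88", 3); // U+2588 FULL BLOCK, UTF-8
  }
  return NULL;
}

// Run two workers, each against its own pool. Returns the total number of
// bytes stashed across both pools, or -1 if a thread could not be started.
static long run_two_workers(int stashes_each){
  assert(stashes_each >= 0);
  struct worker w[2];
  pthread_t tids[2];
  for(int i = 0 ; i < 2 ; ++i){
    w[i].local.used = 0;
    w[i].stashes = stashes_each;
  }
  if(pthread_create(&tids[0], NULL, worker_main, &w[0])){
    return -1;
  }
  if(pthread_create(&tids[1], NULL, worker_main, &w[1])){
    pthread_join(tids[0], NULL);
    return -1;
  }
  pthread_join(tids[0], NULL);
  pthread_join(tids[1], NULL);
  return (long)(w[0].local.used + w[1].local.used);
}
```

The catch is that offsets are only meaningful relative to the pool that issued them, so whatever refers to pooled bytes would have to record which pool it came from, or the private pools would have to be merged after the join. Whether that bookkeeping costs less than the contested lock is untested here.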