JoelOtter / termloop

Terminal-based game engine for Go, built on top of Termbox

Skip draw when nothing has changed #19

Closed aquilax closed 8 years ago

aquilax commented 8 years ago

Since drawing to the canvas looks to be the bottleneck, this change keeps the last canvas in the Screen structure and checks whether the canvas has changed between draws. If nothing has changed, the draw is skipped.
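A minimal sketch of the idea, not the actual code from this PR: the `Cell` fields, `NewCanvas`, the `render` callback and the `flushes` counter are hypothetical stand-ins, and only the name `equals` is taken from the profile output further down. The screen renders into a fresh canvas, compares it with the one it flushed last, and skips the flush when they match.

```go
package main

import "fmt"

// Cell is a simplified stand-in for a terminal cell (hypothetical fields).
type Cell struct {
	Ch     rune
	Fg, Bg int
}

// Canvas is a 2D grid of cells, indexed [x][y].
type Canvas [][]Cell

// NewCanvas allocates an empty w-by-h canvas.
func NewCanvas(w, h int) Canvas {
	c := make(Canvas, w)
	for i := range c {
		c[i] = make([]Cell, h)
	}
	return c
}

// equals reports whether both canvases have the same size and cells.
func (c Canvas) equals(other Canvas) bool {
	if len(c) != len(other) {
		return false
	}
	for i := range c {
		if len(c[i]) != len(other[i]) {
			return false
		}
		for j := range c[i] {
			if c[i][j] != other[i][j] {
				return false
			}
		}
	}
	return true
}

// Screen remembers the last canvas it flushed.
type Screen struct {
	width, height int
	last          Canvas
	flushes       int // counts simulated flushes to the terminal
}

// Draw renders a new frame and flushes it only if it differs from the
// previous one. It reports whether a flush happened.
func (s *Screen) Draw(render func(Canvas)) bool {
	next := NewCanvas(s.width, s.height)
	render(next)
	if s.last != nil && next.equals(s.last) {
		return false // nothing changed, skip the expensive flush
	}
	s.flushes++ // stand-in for the real terminal flush
	s.last = next
	return true
}

func main() {
	s := &Screen{width: 4, height: 2}
	x := 0
	render := func(c Canvas) { c[x][0] = Cell{Ch: '@'} }

	fmt.Println(s.Draw(render)) // first frame is always flushed
	fmt.Println(s.Draw(render)) // identical frame is skipped
	x = 1
	fmt.Println(s.Draw(render)) // moved entity forces a flush
	fmt.Println(s.flushes)
}
```

The trade-off is one extra pass over the canvas on every draw in exchange for skipping the flush whenever the frame is unchanged.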

This is a rather crude approach, but the performance gain can be significant, depending on the usage:

from:

(pprof) top10
55s of 57.30s total (95.99%)
Dropped 126 nodes (cum <= 0.29s)
Showing top 10 nodes out of 29 (cum >= 0.79s)
      flat  flat%   sum%        cum   cum%
    48.07s 83.89% 83.89%     48.07s 83.89%  github.com/mattn/go-runewidth.(*Condition).RuneWidth
     2.36s  4.12% 88.01%     51.24s 89.42%  github.com/nsf/termbox-go.Flush
     1.75s  3.05% 91.06%     55.38s 96.65%  github.com/JoelOtter/termloop.(*Screen).Draw
     0.74s  1.29% 92.36%     48.81s 85.18%  github.com/mattn/go-runewidth.RuneWidth
     0.61s  1.06% 93.42%      0.61s  1.06%  github.com/JoelOtter/termloop.(*Screen).RenderCell
     0.55s  0.96% 94.38%      0.55s  0.96%  github.com/JoelOtter/termloop.(*BaseLevel).DrawBackground
     0.45s  0.79% 95.17%      0.45s  0.79%  runtime.futex
     0.24s  0.42% 95.58%      0.75s  1.31%  github.com/JoelOtter/termloop.(*Rectangle).Draw
     0.13s  0.23% 95.81%      0.87s  1.52%  runtime.makeslice
     0.10s  0.17% 95.99%      0.79s  1.38%  runtime.newarray

to:

(pprof) top10
8090ms of 11500ms total (70.35%)
Dropped 87 nodes (cum <= 57.50ms)
Showing top 10 nodes out of 95 (cum >= 370ms)
      flat  flat%   sum%        cum   cum%
    1910ms 16.61% 16.61%     1910ms 16.61%  github.com/aquilax/termloop.(*Canvas).equals
    1480ms 12.87% 29.48%     1480ms 12.87%  github.com/mattn/go-runewidth.(*Condition).RuneWidth
    1460ms 12.70% 42.17%     1460ms 12.70%  github.com/aquilax/termloop.(*Screen).RenderCell
     860ms  7.48% 49.65%      860ms  7.48%  github.com/aquilax/termloop.(*BaseLevel).DrawBackground
     670ms  5.83% 55.48%     1730ms 15.04%  github.com/aquilax/termloop.(*Rectangle).Draw
     450ms  3.91% 59.39%      450ms  3.91%  runtime.futex
     410ms  3.57% 62.96%      410ms  3.57%  runtime.memclr
     380ms  3.30% 66.26%      450ms  3.91%  runtime.scanblock
     260ms  2.26% 68.52%     1310ms 11.39%  runtime.mallocgc
     210ms  1.83% 70.35%      370ms  3.22%  runtime.scanobject

NOTE: The total running times of the two profiles differ, so compare the percentages rather than the absolute times.

A better approach could be to set a dirty flag whenever something changes. That would eliminate the need to loop through the canvas.
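A rough sketch of what such a dirty flag could look like, assuming every state change goes through a setter that can notify the screen. The `Entity` type, `SetPosition`, `MarkDirty` and the `flushes` counter are all hypothetical names used for illustration and are not part of termloop's API.

```go
package main

import "fmt"

// Screen flushes only when something has marked it dirty.
type Screen struct {
	dirty   bool
	flushes int // counts simulated flushes to the terminal
}

// MarkDirty records that the next Draw must flush.
func (s *Screen) MarkDirty() { s.dirty = true }

// Draw flushes if the screen is dirty and reports whether it did.
func (s *Screen) Draw() bool {
	if !s.dirty {
		return false // nothing changed since the last flush
	}
	s.flushes++ // stand-in for the real terminal flush
	s.dirty = false
	return true
}

// Entity is a hypothetical drawable that reports its own changes.
type Entity struct {
	x, y   int
	screen *Screen
}

// SetPosition moves the entity and marks the screen dirty only when the
// position actually changes.
func (e *Entity) SetPosition(x, y int) {
	if e.x == x && e.y == y {
		return
	}
	e.x, e.y = x, y
	e.screen.MarkDirty()
}

func main() {
	s := &Screen{dirty: true} // the first frame must always be drawn
	e := &Entity{screen: s}

	fmt.Println(s.Draw()) // initial flush
	fmt.Println(s.Draw()) // skipped, nothing changed
	e.SetPosition(1, 0)
	fmt.Println(s.Draw()) // flushed after the move
}
```

This only works if every mutation is routed through code that knows about the flag, which is the assumption the sketch makes.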

JoelOtter commented 8 years ago

Thanks for looking into this. The code looks good, but I have some concerns about comparing the two canvases on every draw. I think you're right that using a dirty bit will be much more efficient! I'll have a look into it.

JoelOtter commented 8 years ago

As discussed on Gitter, a dirty bit isn't going to work easily. The Draws happen independently of each other, so when rendering a cell there's no way to know whether that's the last time it will be rendered to. Because we always render cells on top of cells, even if they're empty or background, there's no way to know whether the dirty bit should be set; in effect, it always is.
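To illustrate the problem, here is a simplified one-dimensional sketch, assuming a dirty bit that is set inside the cell-rendering call whenever the written value differs from what the cell currently holds. The `Canvas` layout and `drawFrame` helper are hypothetical; only the name `RenderCell` comes from the profile output. Two identical frames still end up dirty, because the background pass first overwrites the entity's cell and the entity is then drawn back on top.

```go
package main

import "fmt"

// Canvas is a one-row canvas with a naive write-time dirty bit.
type Canvas struct {
	cells []rune
	dirty bool
}

// RenderCell writes ch into cell i and sets the dirty bit if the value
// differs from what the cell held at the moment of writing.
func (c *Canvas) RenderCell(i int, ch rune) {
	if c.cells[i] != ch {
		c.dirty = true
	}
	c.cells[i] = ch
}

// drawFrame mimics a frame: the background is drawn over every cell,
// then an entity is drawn on top at position pos.
func drawFrame(c *Canvas, pos int) {
	c.dirty = false
	for i := range c.cells {
		c.RenderCell(i, '.')
	}
	c.RenderCell(pos, '@')
}

func main() {
	c := &Canvas{cells: make([]rune, 5)}

	drawFrame(c, 2)
	first := string(c.cells)

	drawFrame(c, 2) // same scene, nothing moved
	second := string(c.cells)

	fmt.Println(first == second) // the final frames are identical
	fmt.Println(c.dirty)         // yet the dirty bit is set anyway
}
```

Comparing the finished canvases avoids this, because it looks only at the final state of each cell rather than at intermediate writes.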

With this in mind, I'm going to go ahead and merge this for now. I think it's unlikely we'll have a situation where the screen is different on every Draw (as that would involve some insanely fast-moving objects), so the extra pass over the canvas is unlikely to degrade performance overall. Thanks @aquilax!

JoelOtter commented 8 years ago

Tagging issue #18