emfcamp / badge-2024-software


Improve app rendering performance #156

Closed mbooth101 closed 3 months ago

mbooth101 commented 3 months ago

Description

Since firmware 1.7.0 I have noticed a performance regression. I wrote a simple app that does nothing except show the framerate and the time-per-frame on screen. Compare firmware 1.6.0 with firmware 1.7.0 in the footage below.

As you can see from the videos, I've experienced a drop from around 10 fps to around 8 fps, which is a regression of roughly 20%.

Whilst investigating this, I found one potential piece of very low-hanging fruit: avoid rendering all the applications in the foreground stack. Since the first thing every app does at the start of the render loop is clear the framebuffer, the top-most app in the stack will overwrite everything rendered by all the other apps below it in the stack.

The change I'm proposing in this PR renders only the top-most app in the foreground apps stack. This completely eliminates the time spent rendering the launcher (and any other app that happens to be in the stack) and more or less halves the time taken for the end_frame step because the ctx drawlists are much shorter. No change is made to the rendering behaviour of the always-on-top stack of apps.
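To illustrate the idea, here is a minimal sketch that runs on a desktop Python rather than on the badge. `FakeCtx`, `FakeApp`, `draw_all` and `draw_top_only` are hypothetical names invented for this example; they are not the firmware's scheduler code, and the real change may be structured differently. The sketch only counts how many draw operations would be queued in each case.

```python
class FakeCtx:
    """Stand-in for the badge's ctx drawing context; it only records operations."""

    def __init__(self):
        self.calls = []


class FakeApp:
    """Hypothetical app: optionally clears the screen, then draws its content."""

    def __init__(self, name, clears=True):
        self.name = name
        self.clears = clears

    def draw(self, ctx):
        if self.clears:
            # Foreground apps start by clearing the framebuffer, which hides
            # everything drawn by the apps below them in the stack.
            ctx.calls.append(f"{self.name}: clear")
        ctx.calls.append(f"{self.name}: content")


def draw_all(foreground_stack, on_top_stack, ctx):
    """Draw every foreground app from bottom to top, then the on-top apps."""
    for a in foreground_stack + on_top_stack:
        a.draw(ctx)


def draw_top_only(foreground_stack, on_top_stack, ctx):
    """Draw only the top-most foreground app, then the on-top apps as before."""
    if foreground_stack:
        foreground_stack[-1].draw(ctx)
    for a in on_top_stack:
        a.draw(ctx)


launcher = FakeApp("launcher")
game = FakeApp("game")
overlay = FakeApp("overlay", clears=False)  # assumed: on-top apps do not clear

before = FakeCtx()
draw_all([launcher, game], [overlay], before)

after = FakeCtx()
draw_top_only([launcher, game], [overlay], after)

print(len(before.calls), "operations when drawing every foreground app")
print(len(after.calls), "operations when drawing only the top-most app")
```

In this toy model the launcher's two operations disappear from the draw list while the visible result (the game plus the overlay) is unchanged, which is the effect described above.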

This yields a noticeable improvement in the framerate of Tildagon apps with realtime graphics. You can see from the Firmware 1.7.0-Patched footage below that the test app now consistently achieves around 14 fps.
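For a rough sense of scale, the short sketch below converts the approximate framerates quoted in this description into time per frame and relative change. The fps values are the rounded readings mentioned above, not precise measurements, and `frame_time_ms` and `percent_change` are helper names made up for this example.

```python
def frame_time_ms(fps):
    """Time budget per frame in milliseconds for a given framerate."""
    return 1000.0 / fps


def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0


# Approximate framerates quoted in the description above
readings = (("1.6.0", 10), ("1.7.0", 8), ("1.7.0-Patched", 14))

for label, fps in readings:
    print(f"{label}: ~{fps} fps is about {frame_time_ms(fps):.1f} ms per frame")

print(f"1.6.0 to 1.7.0: {percent_change(10, 8):.0f}%")
print(f"1.7.0 to 1.7.0-Patched: {percent_change(8, 14):+.0f}%")
```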

Firmware 1.6.0

Video of the app running:

https://github.com/emfcamp/badge-2024-software/assets/597661/af60f584-b4cb-4052-bb02-2889086d781f

Here's a screenshot in case the video doesn't play in your browser

image

Firmware 1.7.0

Video of the app running:

https://github.com/emfcamp/badge-2024-software/assets/597661/95ca5a27-7670-4e0c-9b9c-e669440a8869

Here's a screenshot in case the video doesn't play in your browser

image

Firmware 1.7.0-Patched

Video of the app running:

https://github.com/emfcamp/badge-2024-software/assets/597661/9795e896-618c-41cb-9152-0991a6f689ed

Here's a screenshot in case the video doesn't play in your browser

image

Test App Source

For reference, this is the source of the app I am using to show the framerate:

import app
import asyncio
import ota
import time

from app_components import tokens

from system.eventbus import eventbus
from system.patterndisplay.events import *
from system.scheduler.events import *

# Firmware version
FW_VER = ota.get_version()

# Font sizes
PERF_FONT = 6 * tokens.one_pt

class TestApp(app.App):

    def __init__(self):
        # Performance metrics
        self.current_t = 0
        self.last_t = 0
        self.accumulated_t = 0
        self.sample_idx = 0
        self.frametime_samples = [0,0,0,0,0,0,0,0]
        self.frametime = 0
        self.framerate_samples = [0,0,0,0,0,0,0,0]
        self.framerate = 0

        eventbus.on_async(RequestForegroundPushEvent, self._resume, self)
        eventbus.on_async(RequestForegroundPopEvent, self._pause, self)
        eventbus.emit(PatternDisable())

    async def _resume(self, event: RequestForegroundPushEvent):
        # Disable firmware led pattern when foregrounded
        eventbus.emit(PatternDisable())

    async def _pause(self, event: RequestForegroundPopEvent):
        # Re-enable firmware LED pattern when backgrounded
        eventbus.emit(PatternEnable())

    async def run(self, render_update):
        self.last_t = time.ticks_us()
        while True:
            # Calculate time since last frame
            self.current_t = time.ticks_us()
            delta_t = time.ticks_diff(self.current_t, self.last_t)
            self.accumulated_t = self.accumulated_t + delta_t
            self.last_t = self.current_t

            # Calculate some performance metrics
            self.frametime_samples[self.sample_idx] = delta_t
            self.framerate_samples[self.sample_idx] = 1_000_000 / delta_t
            self.sample_idx = (self.sample_idx + 1) % 8
            if self.accumulated_t > 250_000:
                self.accumulated_t = self.accumulated_t - 250_000
                self.frametime = int(sum(self.frametime_samples) / 8)
                self.framerate = sum(self.framerate_samples) / 8

            # Perform the update
            if self.update(delta_t) is not False:
                await render_update()
            else:
                await asyncio.sleep(0.05)

    def update(self, delta_t):
        pass

    def draw(self, ctx):
        ctx.text_align = ctx.CENTER
        ctx.font_size = PERF_FONT
        ctx.rgb(0,0,0).rectangle(-120,-120,240,240).fill().rgb(1, 1, 1)
        ctx.move_to(0, -80).text(f"{self.framerate:.2f} fps")
        ctx.move_to(0, -60).text(f"{self.frametime} us")
        ctx.move_to(0, -40).text(f"FW: {FW_VER}")

# Set the entrypoint for the app launcher
__app_export__ = TestApp

MatthewWilkes commented 3 months ago

Hi @mbooth101!

#147 does this also, as well as reducing the duplication with the on-top stack. Would you be happy with that implementation, or is there a benefit to this one? I'm happy to merge either.

MatthewWilkes commented 3 months ago

Thank you for the test app, also. This is really handy :)

mbooth101 commented 3 months ago

Hi, I didn't notice there was already a PR for this, sorry for the noise!

Yes, your implementation is fine -- I didn't want to mess with the on_top stack in case there was a reason it was separate. Feel free to merge your change and close this one :-)

MatthewWilkes commented 3 months ago

Will do. I've also applied a change to the framebuffer RAM location that gets us another 18% or so.