Roger-random / ESP_8_BIT_composite

Color composite video code from ESP_8_BIT as an Arduino library
MIT License

Add some way to see performance metrics #4

Closed: Roger-random closed this issue 3 years ago

Roger-random commented 3 years ago

The typical pattern is to call waitForFrame() before drawing, to minimize visible tearing as we draw the next frame. But as drawing tasks get more complex, they will eventually take too long and spill over into rendering time for the next frame.

When writing a sketch, right now there's no way to tell how close we are to this limit.

Enhancement: track a few numbers from a high-resolution timing mechanism so we can calculate how much processing time the drawing routine consumed compared to the time available between frame renders.

Precedent: the ESP_8_BIT project (from which this project was derived) had performance metric mechanisms using numbers from xthal_get_ccount(). But that didn't translate easily into something general enough that also conforms to the Arduino API Style Guide, so it did not make the transition for v1.0.0. The topic is worth revisiting, though.

Roger-random commented 3 years ago

Performance measurement has been implemented. It is a lightweight system, and its numbers come with caveats on how to interpret them. For now these are documented in the header comments; I will also put them somewhere more readable later (the Wiki of this project, perhaps?).

Copy/pasted from ESP_8_BIT_GFX.h

/////////////////////////////////////////////////////////////////////////
//
//  Performance metric data
//
//  The Tensilica core in an ESP32 keeps a count of clock cycles, which
//  can be read via xthal_get_ccount(). This is only a 32-bit unsigned
//  value, so when the core is running at 240MHz the count overflows in
//  just under 18 seconds (2^32 / 240,000,000 = ~17.9 seconds).
//
//  Rather than make error-prone and expensive calculations to account
//  for clock count overflows, this performance tracking is divided into
//  sessions. Each time the clock count overflows (every ~18 seconds),
//  we start a new session. Performance data in the gaps between
//  sessions is lost.
//
//  Each session retrieves two pieces of data from the underlying
//  rendering class: the number of frames rendered to screen and the
//  number of buffer swaps performed. Both are uint32_t. If either
//  overflows, the frame count statistics for that session will be
//  nonsensical. The values should make sense again in the following
//  session.
//
//  Performance data is gathered only during waitForFrame(), which
//  assumes the application calls waitForFrame() at a high rate so we can
//  sample performance data. Applications that do not call waitForFrame()
//  frequently may see large session gaps of lost data. If waitForFrame()
//  is not called for more than 18 seconds, the data will be nonsensical.
//  Fortunately, applications that do not make frequent frame updates are
//  probably not concerned with performance data anyway.
//
//  Each core keeps its own clock cycle count, and the counts are not
//  synchronized across the ESP32's cores. Calculations that mix cycle
//  counts from different cores will produce nonsensical data. This is
//  usually not a problem, as typical usage has the Arduino runtime
//  pinned to a single core.
//
//  These metrics track the number of clock cycles we spend waiting, but
//  that includes both idle cycles and cycles consumed by other code,
//  including our underlying rendering class! The percentage is valid for
//  relative comparisons: "Algorithm A leaves a lower percentage waiting
//  than B, so B is faster" is a valid conclusion. However, inferences
//  from absolute numbers are not valid. For example, "We wait 50% of the
//  time, so we have enough power for twice the work" would be wrong,
//  because some of that 50% wait time is used by other code and is not
//  free for use.
//
//  The tradeoff for the limitations above is that we have a very
//  lightweight performance tracker that imposes minimal overhead. But
//  take care when interpreting its numbers!