MarlinFirmware / U8glib-HAL

Customized U8glib for use in Marlin 2.0
Other
45 stars 33 forks source link

Experimental full buffer in SSD1309 #31

Closed dbuezas closed 7 months ago

dbuezas commented 8 months ago

This draft PR is a working proof of concept that improves screen refresh rate. The impact is most relevant when combined with the recent PR that adds SW I2C support #26433 to Marlin.

Working principle

This is a hack

This is a draft proof of concept because the correct way would be to handle the full buffer directly:

Update: Not a hack in the end, this is also the way u8glib handles double page buffers in e.g SSD1306

Video comparisons

Make sure to enable sound, the audio feedback I added on each line change makes the difference easier to appreciate.

Original

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/cc458db1-6388-437a-a969-d9dd0f5a7a87

This PR:

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/77cbb03d-9ecd-45d6-bcaf-ca74fb5b08b1



Addendum

Max clock speed

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/f47a1dd0-f25f-4ded-924c-5000a31b5f95

No delays

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/ff73f1a4-739e-40fd-9173-bbd1c5ee7ea4

ALL TOGETHER

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/ccca0e6b-78d8-4409-a0bf-af40c5d41bfd


Test setup

dbuezas commented 8 months ago

This is the OPTIONAL modified SoftWire.cpp to achieve 260kHz clock with software i2c: https://gist.github.com/dbuezas/e4a7b8afddcee868a09bf014034fc37d

The patch in this experimental PR is not specific to SW i2c, but it is where it is most needed.

thinkyhead commented 8 months ago

Seems like a fine option when the board has enough SRAM. The fewer stripes we need to draw, the better. We already avoid drawing elements that only exist within a single stripe more than once, but overlapping items do get drawn two or more times, with clipping. The most important consideration is that each call to the display update handler should take as little time as possible before returning so that the main loop gets processed and the planner buffer remains as full as possible. To the extent that this shortens the time spent in the display update routine it's a winner.

dbuezas commented 8 months ago

If the screen was a single page, then even the double drawing could be avoided. I'll try this a bit more, but I think u8glib expects a page column to fit in one byte. As you note, this as is should already save lots of cpu time waiting for the i2c bus.

By the way, if you merge this, the we should add a flag in Marlin to switch between normal and full page modes.

thinkyhead commented 8 months ago

This change will make a Melzi with SSD1309 a better choice, since some of those boards have 16K of SRAM, most of which ends up unused. (The "compact bootscreen" option may eat up 1K of that.)

Accumulates all pages in a new buffer (8kB of RAM)

Is this a B/W screen with 128x64? That should only be 128 x 64 ÷ 8 == 1KB. Or is this screen using one byte per pixel even though it's only displaying one color? If that is the case, I wonder if the display has some special one-bit color mode for situations like this?

dbuezas commented 8 months ago

Oh, you are right it is 1kB, I counted each bit as a byte.

Note that this doesn't only not send unchanged pages, but it also sends all changed pages at once. This makes the screen a lot faster even when the amount of data transferred is the same. My hypothesis is that this is because the screen doesn't re-paint after each page, which may also be making the mcu wait between paints.

dbuezas commented 8 months ago

We could also get silly and send only the changed columns within each page. I took a look at the ssd1309 datasheet, and started to keep track of the first and last changed bit of each page.

Sending this to the lcd works fine:

u8g_WriteByte(u8g, dev, 0x000 | (start & 0b1111)); /* col start low */
u8g_WriteByte(u8g, dev, 0x010 | (start >> 4)); /* col start high */
[...]
u8g_WriteSequence(u8g, dev, end-start, full_buffer[page]+start);
dbuezas commented 8 months ago

It's even faster. Should I update this PR? It uses another 17 bytes:

thinkyhead commented 8 months ago

Should I update this PR?

Sure, go ahead. It seems like a pretty decent extra optimization for most cases. It could slow things down a little in cases where most of the screen is changed, but not enough to negate the other optimizations.

dbuezas commented 8 months ago

Here's a test visualisation. The colors are inverted in all reused parts of the screen.

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/6f61276f-8e69-4895-a4ba-63918e46da5b

Note: The test is actually slow because I'm rendering the screen twice, it just shows proves everything works as intended.

dbuezas commented 8 months ago

To test try it, change add _F in marlinui.DOGM

// #define U8G_CLASS U8GLIB_SSD1309_128X64
   #define U8G_CLASS U8GLIB_SSD1309_128X64_F
dbuezas commented 8 months ago

Sorry I just accidentally forced push on top of your changes, I'll fix that

update: fixed that

thinkyhead commented 8 months ago

I've made the minor adjustment to cache the number of columns instead of the end column + 1.

So the only thing to do now is to run some print tests to make sure this can't lead to planner starvation if, for example, the G-code arriving on the USB/Serial port contains many small segments and the screen is being frequently updated by navigating menus, scrolling, and generally causing the contents of the LCD to change while the print is underway. A "rocket vase" sliced without G2/G3 and with small segments (e.g., 0.25mm) should provide the kind of conditions that contend with LCD updates.

dbuezas commented 8 months ago

Good idea storing width instead of end.

Can you do that test? With software i2c and my beefy skr3 I don't think my setup would mean much. Plus I'm fairly new to this so I'm not quite sure I can confidently say if there is starvation or not.

My bet is that this PR makes it harder to happen, since there's less total transmission, and no waiting for the screen to finish partial updates anymore. Btw, my running assumption is that full buffer updates are faster (even when sending all pages and columns) because without it the lcd takes time to render each page independently and makes the mcu wait after each page, do you know if that's true?

dbuezas commented 8 months ago

Let me know if there's something I can help with

dbuezas commented 7 months ago

I lowered LCD_UPDATE_INTERVAL from 100 to 10 to make an experiment and OH BOY, it feels SO responsive! This is 100fps, which is an overkill, but going from 10 to 30 sounds reasonable to me and it makes quite a difference in how the UI feels.

I also noted no issues during prints, I'll make a small PR to marlin to set LCD_UPDATE_INTERVAL in configuration_adv.h

This is software I2C with all the modifications mentioned in this PR.

https://github.com/MarlinFirmware/U8glib-HAL/assets/777196/f342f559-ede3-4bde-9732-d2180ac75abb

thinkyhead commented 7 months ago

Thanks for the extra feedback and trying out the different refresh rate. I've been busy chasing down other things and preparing patches for older Marlin branches to keep up with platform changes.

It's a good idea to get that balance of refresh and responsiveness. Although there is a single call to the LCD update routine at high frequency, it's mostly to check the encoder, and will only update the display if the flag has been set to indicate that there's something new to draw. So as long as we call the routine frequently enough to get reliable encoder feedback we're golden. Maybe in the future we'll further decouple the encoder from the display so we can check it at an independent frequency, and we'll see if it helps with responsiveness.

If this is reasonably stable by your estimation then I'm happy to go ahead and merge it forthwith.

dbuezas commented 7 months ago

TL; DR it is stable, let's merge!

Oh yes, I know, your work is very visible :)

The modifications in this PR have been rock solid for me. On top of that, it is only active if the _F variant is selected, so it shouldn't affect anyone yet.

Once merged and released I'll add a flag to marlin to select it (or we could make it default if we're confident it is always better than by page)

Regarding encoder updates, I saw that there is a call to ui.update_buttons in temperature.cpp, so it seems to me like encoders are read at a faster rate (500hz). I actually changed it to run at 1khz as I have a snappy board.

I do still get some encoder jumps every so often, I assume due missed steps since they aren't read by pin change interrupts. TBH, the encoder felt more precise in Marlin 1.x on an atmega, is there any specific reason for them to be read at an interval instead of using pin change interrupts?

Thank you and congrats for your work in Marlin, I get a bit lost with the macros, but it is an amazing code base.

dbuezas commented 7 months ago

@thinkyhead in case this was forgotten, it is ready to be merged

thinkyhead commented 7 months ago

Thanks for the heads-up! I'll get to that shortly.

thisiskeithb commented 7 months ago

This breaks compiling for several LCDs (issue with Mini 12864 was reported on Discord):