adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
Other
4.11k stars 1.22k forks source link

ILI9341 display glitches #4775

Open ladyada opened 3 years ago

ladyada commented 3 years ago

@kmatch98 & @FoamyGuy i think y'all did some hacking on dirty rectangle checking for circuitpy? we're seeing 'speckles' left over on the funhouse demo, it has a lot of text updates! https://forums.adafruit.com/viewtopic.php?f=4&t=179310

kmatch98 commented 3 years ago

I see this uses the portalbase library. I took a quick look at the add_text and set_text functions. It has its own functions for displaying text and doesn’t use the bitmap_label or label libraries. Maybe something in the custom add_text and set_text didn’t have all the updates from the bitmap_label and/or label libraries.

Link to portalbase set text https://github.com/adafruit/Adafruit_CircuitPython_PortalBase/blob/16f68f1987ecf72c4dccbdcf2289970c85c42f9a/adafruit_portalbase/__init__.py#L216

I’m away from my computer now but if this is still open can take more detailed look this afternoon.

FoamyGuy commented 3 years ago

Very strange behavior in the images.

It looks like portalbase is using Label internally in add_text / set_text.

I will be able to dig in a bit more this evening. Hoping to try to isolate the issue into a smaller example maybe without portalbase if possible.

kevinjwalters commented 3 years ago

Does CircuitPython have a screen dump feature now? Would that help a little to narrow down where the problem lies? Ah, I think this is what I was thinking of, it's actually a Bitmap dumper: https://circuitpython.readthedocs.io/projects/bitmapsaver/en/latest/api.html

kevinjwalters commented 3 years ago

If this is an issue with the SPI writes to the screen, there's another ESP32-S2 mystery in #3835 about glitches perhaps related to timing for NeoPixel protocol

kmatch98 commented 3 years ago

@foamyguy, thanks for the correction about using bitmap_label.


I modified the code to run on the PyPortal and am unable to replicate the issue.

As @kevinjwalters suggests, is it possibly an SPI writing issue? Looking at the image, it looks like text is being written in the locations where text is never requested (for example to the right of "Slider" or left of "SENSOR1").

->>>> If the SPI speed is adjustable, can someone with the FunHouse hardware try the example with a slower SPI baud rate?

From the forum post, here are locations where text should never be written:

image

I have a PyPortal for testing, running:

Adafruit CircuitPython 6.2.0 on 2021-04-05; Adafruit PyPortal with samd51j20

I ran the following code on the PyPortal and so far unable to replicate the issue. I hacked the FunHouse example to create random values for all sensor inputs.

https://user-images.githubusercontent.com/33587466/118713774-f3c47900-b7e7-11eb-9c98-350021ed621d.mov

Note: I had to hack the PyPortal library to add the scale property, similar to what is used in the FunHouse, here is the gist of my PyPortal library with scale added.

# SPDX-FileCopyrightText: 2017 Scott Shawcroft, written for Adafruit Industries
# SPDX-FileCopyrightText: Copyright (c) 2021 Melissa LeBlanc-Williams for Adafruit Industries
#
# SPDX-License-Identifier: Unlicense
import board
from digitalio import DigitalInOut, Direction, Pull
# from adafruit_funhouse import FunHouse
from adafruit_pyportal import PyPortal
import random

# funhouse = FunHouse(
#     default_bg=0x0F0F00,
#     scale=2,
# )

funhouse = PyPortal(
    default_bg=0x0F0F00,
    scale=2,
)

# funhouse.peripherals.set_dotstars(0x800000, 0x808000, 0x008000, 0x000080, 0x800080)

# sensor setup
sensors = []
# for p in (board.A0, board.A1, board.A2):
#     sensor = DigitalInOut(p)
#     sensor.direction = Direction.INPUT
#     sensor.pull = Pull.DOWN
#     sensors.append(sensor)

def set_label_color(conditional, index, on_color):
    if conditional:
        funhouse.set_text_color(on_color, index)
    else:
        funhouse.set_text_color(0x606060, index)

# Create the labels
funhouse.display.show(None)
slider_label = funhouse.add_text(
    text="Slider:", text_position=(50, 30), text_color=0x606060,
)
capright_label = funhouse.add_text(
    text="Touch", text_position=(85, 10), text_color=0x606060,
)
pir_label = funhouse.add_text(text="PIR", text_position=(60, 10), text_color=0x606060,
    )
capleft_label = funhouse.add_text(
    text="Touch", text_position=(25, 10), text_color=0x606060,
)
onoff_label = funhouse.add_text(text="OFF", text_position=(10, 25), text_color=0x606060,
    )
up_label = funhouse.add_text(text="UP", text_position=(10, 10), text_color=0x606060,
    )
sel_label = funhouse.add_text(text="SEL", text_position=(10, 60), text_color=0x606060,
    )
down_label = funhouse.add_text(
    text="DOWN", text_position=(10, 100), text_color=0x606060,
)
jst1_label = funhouse.add_text(
    text="SENSOR 1", text_position=(40, 80), text_color=0x606060,
)
jst2_label = funhouse.add_text(
    text="SENSOR 2", text_position=(40, 95), text_color=0x606060,
)
jst3_label = funhouse.add_text(
    text="SENSOR 3", text_position=(40, 110), text_color=0x606060,
)
temp_label = funhouse.add_text(
    text="Temp:", text_position=(50, 45), text_color=0xFF00FF,
)
pres_label = funhouse.add_text(
    text="Pres:", text_position=(50, 60), text_color=0xFF00FF,
)
funhouse.display.show(funhouse.splash)

while True:
    funhouse.set_text("Temp %0.1F" % random.uniform(0,50), temp_label)
    funhouse.set_text("Pres %d" % random.uniform(0,50), pres_label)

    # print(funhouse.peripherals.temperature, funhouse.peripherals.relative_humidity)
    set_label_color(random.choice([False, True]), onoff_label, 0x00FF00)
    set_label_color(random.choice([False, True]), capleft_label, 0x00FF00)
    set_label_color(random.choice([False, True]), capright_label, 0x00FF00)

    # slider = funhouse.peripherals.slider
    # if slider is not None:
    #     funhouse.peripherals.dotstars.brightness = slider
    #     funhouse.set_text("Slider: %1.1f" % slider, slider_label)
    set_label_color(random.choice([False, True]), slider_label, 0xFFFF00)

    set_label_color(random.choice([False, True]), up_label, 0xFF0000)
    set_label_color(random.choice([False, True]), sel_label, 0xFFFF00)
    set_label_color(random.choice([False, True]), down_label, 0x00FF00)

    set_label_color(random.choice([False, True]), pir_label, 0xFF0000)
    set_label_color(random.choice([False, True]), jst1_label, 0xFFFFFF)
    set_label_color(random.choice([False, True]), jst2_label, 0xFFFFFF)
    set_label_color(random.choice([False, True]), jst3_label, 0xFFFFFF)
ladyada commented 3 years ago

hmm @makermelissa can you reproduce the issue? maybe we need to slow down the TFT display?

gamblor21 commented 3 years ago

I have it fairly reliably glitching on my funhouse, I will try to replicate it and look at the SPI speed later today.

kevinjwalters commented 3 years ago

The default_bg=0x0F0F00 is useful in this case as it reveals the mostly-black pair of vertical lines on the left side. That's suspicious and adds weight to it not being just some minor errors in font rendering.

gamblor21 commented 3 years ago

Couple things I have discovered:

  1. Changed the SPI baud rate ranging from 30Mhz to 60Mhz - no effect

  2. I can see the vertical lines draw on the screen while the background is filling in (used 0x0F0F00 as a BG color). The glitch always seems to be a black line with bright white pixels in a few locations. As the display is rotated 270 by default this is really is a horizontal line on the physical display.

  3. Labels drawn after the background overwrite the "glitch".

Given this it seems to be more a rendering error or an error with a large write to the display.

IMG_5285

kmatch98 commented 3 years ago

I replicated the issue on an ESP32-S2 Saola with a generic ILI9341 display (320x240) running either CircuitPython v6.2 or 6.1.

I disconnected from my PC and ran with directly from a power adapter and same display glitch was observed.

Note: It takes a while for it to get severe, so you may need to wait several minutes for it to show up distinctly. While running, I observed that some of the redraws of the text are only partially refreshed, but then were overdrawn correctly when the next text update occurs.

I ran full screen refreshes with alternating solid color fills. So far, the full screen redraws look clean to me (but perhaps it is possible I am missing an incorrect redraw, since the next full redraw would cover over it).

Also, I see that the FunHouse uses an ST7789 display driver. I observe the same issue with this display with an ILI9341 display, suggesting that it may be a pure SPI signal issue. I'll speculate that if some (non-USB) housekeeping occurs during a redraw that somehow the timing of the SPI glitches. Is it possible that a redraw can get interrupted by other housekeeping chores?

gamblor21 commented 3 years ago

Update: I tried slowing the SPI bus down even further and noticed the rate of corruption lowered. At 5Mhz after running all night there are no problems on the screen.

The display seems to draw at the same speed as when the bus was set at 60Mhz but I haven't done any timings. At the moment I'm at a dead-end as to why this would fix the problem but it would be a potential short-term fix if required.

The line I changed to 5000000 is: https://github.com/adafruit/circuitpython/blob/8c0f35858399088f3b5a5100909644cea377de14/ports/esp32s2/boards/adafruit_funhouse/board.c#L74

kmatch98 commented 3 years ago

@gamblor21 good find.

I replicated this success with an ESP32-S2 Saola with an ILI9341 display, by changing to baudrate=5_000_000 in the displayio.FourWire setup:

        spi = busio.SPI(clock=board.IO1, MOSI=board.IO2, MISO=board.IO3)

        display_bus = displayio.FourWire(
                                    spi,
                                    command=board.IO4,
                                    chip_select=board.IO5,
                                    reset=board.IO6,
                                    baudrate=5_000_000,
                                    )

        self.display = adafruit_ili9341.ILI9341(display_bus,
                                    width=320,
                                    height=240,
                                    rotation=90)

It's been running about 10 minutes with no observed glitches.


Things that don't work: spi.configure

I tried setting the baudrate using the .configure baudrate option of busio, following this guide. It had absolutely no effect on the display errors.

On my Saola, it reports that the spi.frequency is default of 250,000. I changed the value using spi.configure(baudrate=100_000) and several other values. Reprinting spi.frequency indicates that the value was changed, but I saw no effect on display speed or anything. I suspect that it's not actually doing anything.

gamblor21 commented 3 years ago

I checked the Arduino ESP32S2 library and noticed the default SPI bus is set at 1Mhz. I didn't have time to do any extensive testing (there doesn't seem to be a public method to change the bus frequency in Arduino). But may explain why it seems normal on Arduino but not on CP.

kmatch98 commented 3 years ago

Based on the notes from @gamblor21 on the funhouse and my testing on the Saola, this suggests that changing the default SPI baud rate to 5 MHz or less will solve this issue for all ESP32-S2 boards.

Is there one file we can update to make the default SPI baud rate consistent across all the ESP32-S2 boards? If someone points me in the right direction I can make a PR.

kevinjwalters commented 3 years ago

@kmatch98 Is this a workaround or a fix? How fast do other non-ESP32-S2 boards send over SPI to the screen like Gizmo or CLUE?

kmatch98 commented 3 years ago

I suspect this is a workaround, but maybe a fix is warranted. I'm not well-versed enough to understand how to fix. If there is urgent timing for deploying a temporary solution, then maybe a workaround is a start?


In the code, the displayio.FourWire for the FunHouse defaults to 60 MHz. I don't see this in other boards (without displays), so I'm unsure what the default SPI baudrate value is.

Also, please note that the busio configure option doesn't seem to do anything whenever the displayio.FourWire is used. The busio .configure defaults to 250 kHz, but it seems to be overridden by the FourWire definition. It this a bug, a feature or a separate issue?


Here are a few cursory items I found in the ESP32-S2 notes related to SPI capabilities and settings (Note: If CircuitPython is actually using bitbanging, then ignore this.)

In my testing on the Saola, I observe glitches at >= 8 MHz (updated)

gamblor21 commented 3 years ago

When I was working on ParallelBus for RP2040 I ran into a speed problem as well, and could not find any limitation in the product sheets for the displays so I'm not really sure. I found the same thing @kmatch98 has that the ESP32S2 supports SPI at 80Mhz. And I'm at a loss (or missed something) as to why the M4 Hallowing seems okay with 60Mhz.

I do know the official ESP32S2 Arduino library has SPI default to 1 Mhz and that is what the displays will start as.

I would suggest submit a PR lowering it for the Funhouse for now at least, and maybe someone who knows more about displays can weigh in if we should keep poking at it. I think either @kmatch98 or myself can put one in.

anecdata commented 3 years ago

This all may not be new info, but... my understanding is that setting and checking SPI frequency has no effect unless you do it within the context of the object. Each SPI device runs with its own frequency when it controls the bus. For example, you can check actual ESP32SPI speed with:

with esp._spi_device as espspi:
    print('{:,}'.format(espspi.frequency), "Hz")

But Fourwire has no such method that I know of. I get no immediate error setting it to 80MHz. Default looks to be 24MHz: https://github.com/adafruit/circuitpython/blob/f5aa55c247561e3802b9895e586528fe09e8d7d9/shared-bindings/displayio/FourWire.c#L70 Actual SPI frequencies may also vary from set SPI frequencies by port based on timers and divisors, etc.

After a hard reset, the ESP32-S2 will show SPI frequency of 250,000 Hz, but after setting Fourwire to 80MHz, even a check out of context after that shows 80MHz, so it's sticky (and survives reloads) but I wouldn't count on it being accurate for in-context speed, particularly if there is more than one SPI device.

I seem to recall that Scott addressed the 40MHz vs. 80MHz issue in an early ESP32-S2 Deep Dive, but I don't recall the outcome. The Espressif-specific settings for higher speeds could be useful.

gamblor21 commented 3 years ago

As referenced in PR #4793 I will take a look at an API change to allow separate data and command baud rates for the displays.

gamblor21 commented 3 years ago

Summing up my investigation from the weekend, I'll try to add and make the CP meeting Monday as I think it would be a good in the weeds topic.

With all that said I have the following questions:

  1. Is it worth changing the API? The chips in use seem to have one frequency for both data and commands and slowing the commands to 1Mhz while keeping the data at 10Mhz did not solve the problem.
  2. I don't have the equipment to check but I wonder if the non-DMA arduino gets slowed down unintentionally? Maybe not?
  3. Has anyone tested ESP32-S2 high speed SPI with a non-display use case? Trying to isolate this to a SPI or display problem.
kevinjwalters commented 3 years ago

@gamblor21 For 3 can that easily be done to another non-ESP32S2 chip using short wires with same size data to see if it arrives ok? I.e. send data burst and receive and checksum it on other chip, repeat (with a pause) for many tests looking for rx errors or wrong checksums. A logic analyser capture would be useful but sounds a bit fiddly physically to obtain? And tedious if most xfers are good!

kmatch98 commented 3 years ago

There are some cryptic notes in the IDF documents about the SPI settings. There is one function that is supposed to evaluate whether the frequency can be met https://docs.espressif.com/projects/esp-idf/en/latest/esp32s2/api-reference/peripherals/spi_master.html#_CPPv418spi_get_freq_limitbi

This function to configure the SPI speed has a cryptic message for the input_delay_ns parameter “For better timing performance at high frequency (over 8MHz), it’s suggest to have the right value.”

Is it possible that some other SPI configuration adjustments are required to run at >= 8MHz? Or is it just coincidence that 5MHz works ok and then it glitches at 8MHz?

gamblor21 commented 3 years ago

So on a hunch I forced DMA to not be enabled (in the esp32s2 SPI.C). I bumped the frequency back to 60Mhz and it has been running flawlessly for almost 30 minutes, a lot longer then ever before without artifacts.

kmatch98 commented 3 years ago

@gamblor21 Awesome. I replicated your results on my Saola Wrover demo board. I build a "non-DMA-SPI" version using CircuitPython 6.3.x. It's run for a few minutes so far with no glitches. I'll run it overnight and report if I see anything in the morning.


I hacked the SPI.c file in the ports/esp32s2/common-hal/busio file to turn off DMA for the

I edited the function common_hal_busio_spi_transfer so that it never uses DMA. Here's my hackety-hack to ensure the else is always used:

    size_t burst_length;
    // If both of the incoming pointers are DMA capable then use DMA. Otherwise, do
    // bursts the size of the SPI data buffer without DMA.
    if (0) /* ((data_out == NULL || esp_ptr_dma_capable(data_out)) &&
        (data_in == NULL || esp_ptr_dma_capable(data_out)))  */
        {
        hal->dma_enabled = 1;
        burst_length = LLDESC_MAX_NUM_PER_DESC;
    } else {
        hal->dma_enabled = 0;
        burst_length = sizeof(hal->hw->data_buf);
        // When switching to non-DMA, we need to make sure DMA is off. Otherwise,
        // the S2 will transmit zeroes instead of our data.
        hal->hw->dma_out_link.dma_tx_ena = 0;
        hal->hw->dma_out_link.stop = 1;
    }
kevinjwalters commented 3 years ago

@tannewt I don't think it's necessarily relevant here but is there a cut and paste error in that code:

https://github.com/adafruit/circuitpython/blob/1a0b4193b7539899a41952769e7b566be7580d6a/ports/esp32s2/common-hal/busio/SPI.c#L360-L363

Is the second esp_ptr_dma_capable(data_out) supposed to be esp_ptr_dma_capable(data_in)?

kmatch98 commented 3 years ago

Ran overnight with the hack above and screen looks correct. I suspect this will be an “in the weeds” at today’s CircuitPython call to discuss on which way to go with the associated PR.

tannewt commented 3 years ago

@tannewt I don't think it's necessarily relevant here but is there a cut and paste error in that code:

https://github.com/adafruit/circuitpython/blob/1a0b4193b7539899a41952769e7b566be7580d6a/ports/esp32s2/common-hal/busio/SPI.c#L360-L363

Is the second esp_ptr_dma_capable(data_out) supposed to be esp_ptr_dma_capable(data_in)?

Ya, looks like a bug.

So on a hunch I forced DMA to not be enabled (in the esp32s2 SPI.C). I bumped the frequency back to 60Mhz and it has been running flawlessly for almost 30 minutes, a lot longer then ever before without artifacts.

I wouldn't necessarily take this to mean DMA is the issue. Manually feeding in bytes can cause gaps in the the SPI transmission that could avoid the time critical part on the display side. The issue may be the byte-to-byte speed.

  1. Is it worth changing the API? The chips in use seem to have one frequency for both data and commands and slowing the commands to 1Mhz while keeping the data at 10Mhz did not solve the problem.

No, I don't think so. You showed that wouldn't help.

gamblor21 commented 3 years ago

I have something that needs more investigation still but wanted to post what I did find. I got a Saleae and was trying to capture the glitch. I finally managed to get a few instances where an entire line was blank. I noticed that the timing between the column select command and the data was ~36 microseconds.

1 2

Everyone other case the command to data is about 9 microseconds.

3

The following is a guess but if for some reason this delay is causing the column select command to time out (and maybe to 0?) it would result in a blank line. It is still a guess but the fact I have observed it in the same spot as a glitch occurred 3 times now leads me to believe it has something to do with the problem.

tannewt commented 3 years ago

Is this still an issue?

gamblor21 commented 3 years ago

It is still an issue (as far as I know). I haven't had any time to look at it lately though.

What I do know:

  1. The behavior is what would be expected if the column select command is sent 0 instead of a real value. Verified this by sending 0 on purpose.
  2. The logic analyzer shows the correct value being sent, but the display renders otherwise. I'm not sure why the discrepancy.
  3. I changed the flow at one point to send each column select command byte one by one (ignoring DMA). In that situation I was able to see an incorrect value sent (0 instead of the number it should have been). I'm not sure why this happened but is likely linked to the original problem.
dhalbert commented 3 years ago

The immediate problem has been fixed by #4793, but the reason for this issue stll remains undiagnosed. So we'll leave this issue open.

Loopbackconnector commented 3 years ago

I wanted to add my findings to this. So I am having the same issues, running a circuit python app (specifically https://learn.adafruit.com/using-the-adafruit-funhouse-with-home-assistant ) the bigger issue I have is it seems to crash communications as well (or it is a leak that finally ends it) but usually my funhouse goes offline from a getting updates standpoint. I have tried both python 6 and 7. It seems to run longer when I run it on 7 but ultimately both crater.

image
dhalbert commented 3 years ago

I have tried both python 6 and 7. It seems to run longer when I run it on 7 but ultimately both crater.

@Loopbackconnector Just confirming, have you tried 7.0.0-alpha.5? #4793 was incorporated into alpha.5, but is not in alpha.4.

Loopbackconnector commented 3 years ago

@dhalbert The version that I was running in that picture was alpha.4. I will update to alpha.5 to see how it goes.

Loopbackconnector commented 3 years ago

12 + hours after update and display still looks good!