SebiTimeWaster / ICN2053_ESP32_LedWall

An Arduino implementation to drive ICN2053 based LED wall segments running on the ESP32 platform
MIT License

Add ESP32 DMA Driving? #3

Open embedded-creations opened 2 years ago

embedded-creations commented 2 years ago

I was forwarded a link to your project by someone that's trying to get ICN2053 panels working in SmartMatrix Library (my project). I see you ruled out using PWM to drive these panels, but have you looked into using DMA to toggle GPIO pins to update the panels? That's what SmartMatrix Library and other libraries like this one use to continuously refresh the panel without using the CPU. I haven't looked into the details of how the ICN2053 needs to be refreshed so I may be missing some reason why DMA driving GPIO wouldn't work.

SebiTimeWaster commented 2 years ago

hi, the thing that took the most time by far was the clock cycle generation, since the color information is latched out pretty efficiently (the color data is pre-calculated by core 1) by writing 8 bits (or was it 16 bits?) of information in one command to an IO register. when i wrote this i looked into all kinds of ways to create the clock via some form of hardware acceleration (short of writing assembler code for the co-processor of the ESP32, which could probably be a viable way to speed things up but is way outside of my expertise). i found nothing that works, since these chips require very precise input (for instance exactly 138 clock cycles). nothing was able to produce this exactly; even hardware PWM turned out not to be precisely controllable (the number of pulses was not guaranteed and depended on what the background processes were doing at that time. remember the ESP32 runs a realtime OS in the background, so the code depends on RTOS processes).

but it could be that i overlooked something. if you think so, feel free to open a pull request :) hm... maybe it's possible to just put a series of 0s and 1s into memory that represent the clock cycle and latch that out via DMA?

SebiTimeWaster commented 2 years ago

I wanted to build a piece of 74HC logic that would generate the clock signal, decoupling this task completely from the ESP32, but never got around to doing it. here is what i came up with so far:

image

it's a LogiSim file: external_clock.circ.zip

embedded-creations commented 2 years ago

hm... maybe it's possible to just put a series of 0s and 1s into memory that represent the clock cycle and latch that out via DMA?

Yes, that's close to what we do with our libraries, though the clock is generated by the I2S peripheral. There are more details, including a link to the original ESP32 example code that shows how to shift data from memory to GPIO via the I2S bus, here:

https://github.com/pixelmatix/SmartMatrix/wiki/ESP32-Port

I don't know what you need to shift out to the ICN2053 chip specifically that is different from "normal" HUB75 panels, but as long as there's no need for the data/address lines to change without the clock, it should be straightforward to adapt the existing I2S DMA code to drive these panels. (The I2S peripheral creates a clock edge for each GPIO update; you can't update the GPIO without the clock. You could ignore the actual I2S clock signal and create the clock manually by alternating 0/1 on a data pin, at the cost of twice the memory.)
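As a rough illustration of the "alternate 0/1 manually" idea, here is a minimal sketch that expands a buffer of parallel output words into a buffer twice as long, with a software-generated clock bit. The function name `interleave_clock` and the choice of bit 15 for CLK are assumptions made for the example, not something taken from SmartMatrix or this project.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical position of the CLK line within each parallel output word.
constexpr uint16_t CLK_BIT = static_cast<uint16_t>(1u << 15);

// Expand each data word into two DMA samples: first the data with CLK low
// (data settles), then the same data with CLK high (rising edge clocks it in).
// The output buffer is exactly twice as long as the input.
std::vector<uint16_t> interleave_clock(const std::vector<uint16_t>& data) {
    std::vector<uint16_t> out;
    out.reserve(data.size() * 2);
    for (uint16_t word : data) {
        // Clear whatever was in the CLK position so the clock is fully ours.
        uint16_t bits = static_cast<uint16_t>(word & static_cast<uint16_t>(~CLK_BIT));
        out.push_back(bits);
        out.push_back(static_cast<uint16_t>(bits | CLK_BIT));
    }
    return out;
}
```

Because the clock is now just another data bit, any sample can also be emitted without a clock edge, which is the one thing the hardware I2S clock cannot do.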

daveythacher commented 2 years ago

Summary of the difference: These panels work differently, and the ICN2053 is one of the two GCLK driver types. OE is the critical deadline in BCM under RPI and SmartMatrix. For ICN2053 panels, it is GCLK that is critical. The CLK operation is low priority and just needs to meet timing. GCLK takes the place of OE on these panels, but it is a clock driving a state machine, not an output enable/load signal. I will ignore the differences between the two GCLK driver types.

Comment on request: Moving to DMA for CLK will not accomplish much by itself. You would need to automate GCLK and multiplexing. Note the refresh rate is much higher with GCLK drivers. This will more than likely increase the interrupt rate. There is a synchronization point for GCLK and CLK at VSYNC. Currently this is handled by the super loop.

There is an advantage to doing this. It could allow for higher refresh rates or higher clock depth. If you have more RAM, it could allow for larger chains. However, the primary benefit would be getting that CPU back. Everything else is probably not as important. You have to switch to an async thread model. SmartMatrix does not support multicore, so this is the model it would use.

Speculation of 138: The 138 is likely 128+10. The 10 is used as a delay for the anti-ghosting. It is referred to as deadtime and is required to change row. The original code base did not use the anti-ghosting. If you increase the 10, it will assume you are in the next line. You are supposed to clock a single cycle here. However, it looks like the counter is reset over 10 clock pulses, so it never causes an issue.

You would want to use SPI, UART, etc. as a serializer. The RPI has a serializer in its PWM peripheral. The RP2040 has PIO. That should give you the control you are looking for. Configuration is documented in the receiver cards, kind of.
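To make the serializer idea concrete, here is a minimal sketch of encoding an exact number of GCLK pulses as bytes for an SPI data line. It assumes MSB-first shifting and a data line that idles low between transfers; `encode_pulses_spi` is a hypothetical helper, not part of any library mentioned in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode `pulses` clock pulses as bytes for an MSB-first SPI data line.
// Each pulse is the bit pair "10" (high, then low), so a byte of 0xAA
// carries four pulses. Unused trailing bits stay 0 so the line ends low.
std::vector<uint8_t> encode_pulses_spi(std::size_t pulses) {
    std::vector<uint8_t> out((pulses + 3) / 4, 0x00);
    for (std::size_t i = 0; i < pulses; ++i) {
        // Pulse i occupies bits 7, 5, 3 or 1 of byte i / 4.
        out[i / 4] |= static_cast<uint8_t>(0x80u >> (2 * (i % 4)));
    }
    return out;
}
```

For the 138 pulses discussed above this yields 35 bytes: 34 bytes of 0xAA (136 pulses) followed by 0xA0 (the last 2 pulses). The pulse rate is half the SPI bit rate, and the count is fixed by the buffer length rather than by CPU timing.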

daveythacher commented 2 years ago

Comment on conversation: Automating the super loop is possible by doing something like what is shown above. However, it will have limits. CLK will likely have to be no faster than 6-12MHz. (A guess based on gated clock propagation delay.) This will limit the refresh rate, but it could be better than the current logic. This will increase the hardware cost. The benefit of this approach is lower software complexity. However, this approach will not support anti-ghosting. This approach is also tied to the ICN2053. You will need a small FPGA like an ICE40/MachXO2 to make this work for other drivers. (You need to be able to program the NAND gate. There could also be an issue with LAT commands.)

This will not work on the RPI. The software would be more like the current super loop. Again, using SPI, UART, an external microcontroller, etc. would allow you to move in parallel. Set up a serial peripheral for GCLK, then shift out on CLK. When done with CLK, block until GCLK is done. Aka make a new PinPulser. This could be done here, without the external hardware. You can do all kinds of clock tricks too. For UART you have to factor in start and stop bits.
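As an example of factoring in the start and stop bits, here is a minimal sketch that encodes a pulse count as UART bytes. It assumes an 8N1 frame (start bit low, 8 data bits LSB first, stop bit high) on a line that idles high, so the pulses are active-low and may need inverting for a real GCLK input. `encode_pulses_uart` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode `pulses` active-low pulses as 8N1 UART bytes.
// The start bit is the first low pulse of every frame; clearing data bits
// 1, 3, 5 and 7 adds up to four more, so 0x55 gives five pulses per frame
// and the stop bit supplies the final high phase.
std::vector<uint8_t> encode_pulses_uart(std::size_t pulses) {
    std::vector<uint8_t> out;
    out.reserve((pulses + 4) / 5);
    while (pulses > 0) {
        std::size_t k = pulses < 5 ? pulses : 5;  // pulses in this frame
        uint8_t byte = 0xFF;                      // start bit alone = 1 pulse
        for (std::size_t p = 1; p < k; ++p) {
            byte &= static_cast<uint8_t>(~(1u << (2 * p - 1)));
        }
        out.push_back(byte);
        pulses -= k;
    }
    return out;
}
```

With back-to-back frames, 138 pulses become 27 bytes of 0x55 (135 pulses) plus one byte of 0xF5 (3 pulses), and the pulse rate is half the baud rate.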

Now there is a wrinkle here. This applies to not just my suggestion but everyone's. You are changing the algorithm fundamentally to async thread model. There will be concurrency here, which will force you to consider the VSYNC sync point. (You need to be able to stop GCLK.) Working around this is possible in both approaches. (To the best of my knowledge at least.)

For the hardware circuit shown above you will need to use a GPIO instead of LAT?

Edit: Using a serializer would allow anti-ghosting to be supported. Again, clocking tricks.

Edit 2: It looks like you guys are more of a fan of external PinPulser via discrete logic, microcontroller or FPGA. This is simpler (kind of) but almost everything else is a downside. You are likely better off avoiding discrete logic. I actually considered this a few times over the past 5-6 years.

Edit 3: I moved away from external PinPulser due to complexity. I figured out a few cool ideas for it but none really fixed the complexity. The async version is harder in general. Another driver's app note mentions another way to drive them.

LAutour commented 2 years ago

ESP32 + icn2053(FM6353) + I2S DMA :) https://github.com/LAutour/ESP32-HUB75-MatrixPanel-I2S-DMA-icn2053

daveythacher commented 1 year ago

Random thought: The command protocol within the LAT is basically right justified PWM. You just change the duty cycle. You could also just use parallel shift registers with FlexIO or SPI(s).
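One way to picture the "right justified PWM" view is as a LAT level per CLK cycle, high only for the last few cycles of a transfer. The sketch below assumes the command is selected by how many CLK cycles LAT stays high; `lat_pattern` is a hypothetical helper, and the actual widths for each command have to come from the driver's datasheet.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LAT level for each of `total_clocks` CLK cycles: low at first, then high
// for the final `lat_clocks` cycles. This is a right-justified pulse, like
// one PWM period whose duty cycle selects the command.
std::vector<uint8_t> lat_pattern(std::size_t total_clocks, std::size_t lat_clocks) {
    if (lat_clocks > total_clocks) {
        lat_clocks = total_clocks;  // cannot be high longer than the transfer
    }
    std::vector<uint8_t> lat(total_clocks, 0);
    for (std::size_t i = total_clocks - lat_clocks; i < total_clocks; ++i) {
        lat[i] = 1;
    }
    return lat;
}
```

The resulting vector can be merged into the same parallel buffer as the color bits, so LAT becomes one more bit of the shifted-out word.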

Technically you just block until VSYNC. A flag would be set by the data process. The multiplex logic would stop on that flag when it reaches the VSYNC line and clear the flag. This would release the data process to start the VSYNC process. (There may need to be some GCLK operations too.) VSYNC forces serialization due to shared internal state. A fully concurrent process is not supported to the best of my knowledge. (This would create tearing anyhow.)

Edit: Many S-PWM drivers use a command protocol in LAT. This means these are effectively a 7-bit synchronous bus. The GCLK is mostly a data stream also, however ghosting and multiplexing complicate this. The disadvantage of a high refresh rate is a high interrupt rate. (Trigger transport?)
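The flag handshake described above could be sketched roughly as follows. `VsyncGate` and all of its members are hypothetical names. The `resume()` step is an assumption, since the description does not say how multiplexing restarts after the VSYNC process.

```cpp
#include <atomic>

class VsyncGate {
public:
    // Data process: ask the multiplex logic to stop at the VSYNC line.
    void request() { requested_.store(true); }

    // Data process: true once the multiplex logic has stopped and cleared
    // the flag, meaning the VSYNC process may start.
    bool released() const { return parked_.load(); }

    // Data process: let multiplexing continue after VSYNC (assumed step).
    void resume() { parked_.store(false); }

    // Multiplex logic: call at the start of every row. Returns true if
    // scanning may continue, false while it must stay stopped.
    bool on_row(int row, int vsync_row) {
        if (parked_.load()) {
            return false;
        }
        // Only stop on the VSYNC line, and only if a request is pending.
        // exchange(false) clears the flag as part of the same check.
        if (row == vsync_row && requested_.exchange(false)) {
            parked_.store(true);
            return false;
        }
        return true;
    }

private:
    std::atomic<bool> requested_{false};
    std::atomic<bool> parked_{false};
};
```

The data process would call `request()`, wait until `released()` returns true, run the VSYNC process, and then call `resume()`. The multiplex side only ever calls `on_row()`.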