Makuna / NeoPixelBus

An Arduino NeoPixel support library supporting a large variety of individually addressable LEDs. Please refer to the Wiki for more details. Please use the GitHub Discussions to ask questions as the GitHub Issues feature is used for bug tracking.
GNU Lesser General Public License v3.0

ESP32 I2S Parallel driving external shift register #648

Open Blake-Ballew opened 1 year ago

Blake-Ballew commented 1 year ago


Is your feature request related to a problem? Please describe.
Looking for a way to expand the number of strips available when constrained on free GPIO pins.

Describe the solution you'd like
Creating an ESP32 I2S driver to enable multiple outputs over a single pin through a shift register as described in this article: https://hackaday.com/2019/05/07/lots-of-blinky-esp32-drives-20000-ws2812-leds/

Describe alternatives you've considered
I am unaware of any other options to achieve multiple strips running off of a single data pin.

Additional context
I'm unfamiliar with this codebase and I2S driver implementation in general, but I'm currently trying to understand both and would be willing to help in some capacity.

Makuna commented 1 year ago

NOTE: Shift registers require three pins: data, clock, and latch. If more than one external shift register is used, then they can reuse the clock and latch between them; but they will each require their own data line.

As noted in the linked article, it will use the parallel feature of the I2S (newly added to NeoPixelBus) to provide the latch and 7 data I/O lines (8-bit parallel) or 15 data lines (16-bit parallel). I believe the ESP32 I2S has the ability to have a separate dedicated clock line but a little investigation can prove this and worst case another channel can be used for the clock.

Blake-Ballew commented 1 year ago

> I believe the ESP32 I2S has the ability to have a separate dedicated clock line but a little investigation can prove this and worst case another channel can be used for the clock.

From the FastLED virtual driver line 707:

template<int *Pins,int CLOCK_PIN,int LATCH_PIN, EOrder RGB_ORDER = GRB>
class ClocklessController : public CPixelLEDController<RGB_ORDER>
{
    //int *Pins;
    const int deviceBaseIndex[2] = {I2S0O_DATA_OUT0_IDX, I2S1O_DATA_OUT0_IDX};
    const int deviceClockIndex[2] = {I2S0O_BCK_OUT_IDX, I2S1O_BCK_OUT_IDX};
    const int deviceWordSelectIndex[2] = {I2S0O_WS_OUT_IDX, I2S1O_WS_OUT_IDX};
    const periph_module_t deviceModule[2] = {PERIPH_I2S0_MODULE, PERIPH_I2S1_MODULE};

public:
    void init()
    {
        // brigthness=10;
        //Serial.printf("%d %d %d\n",CLOCK_PIN,LATCH_PIN,NUM_LED_PER_STRIP);

        for (int i = 0; i < NBIS2SERIALPINS; i++)
            if (Pins[i] > -1)
            {
                PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[Pins[i]], PIN_FUNC_GPIO);
                gpio_set_direction((gpio_num_t)Pins[i], (gpio_mode_t)GPIO_MODE_DEF_OUTPUT);
                pinMode(Pins[i],OUTPUT);
                gpio_matrix_out(Pins[i], deviceBaseIndex[I2S_DEVICE] + i+8, false, false);
            }

        //latch pin
        PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[LATCH_PIN], PIN_FUNC_GPIO);
        gpio_set_direction((gpio_num_t)LATCH_PIN, (gpio_mode_t)GPIO_MODE_DEF_OUTPUT);
        pinMode(LATCH_PIN,OUTPUT);
        gpio_matrix_out(LATCH_PIN, deviceBaseIndex[I2S_DEVICE] + NBIS2SERIALPINS+8, false, false);
        //if (baseClock > -1)
        //clock pin
        gpio_matrix_out(CLOCK_PIN, deviceClockIndex[I2S_DEVICE], false, false);

It seems like the clock is coming from a dedicated channel for each I2S bus if I'm reading this correctly.

Makuna commented 1 year ago

Actually, I think the clock is separate, as I2S supports not only the data lines but also clock and word select (WS) pins.
But it does look like the latch is using one of the data channels, since it is indexed from base_; probably because the WS pin is held high for all the data bits of a word, then toggled for the next word (left and right audio data channel selection).
But if the I2S is set to mono mode, does the WS stay high, or will it dip for a short period between word sends? If it does dip, it could also be used for the latch, saving another data channel (AND saving the memory used to represent it). The docs imply it can alternatively be used for frame sync.

https://en.wikipedia.org/wiki/I%C2%B2S

From gpio_sig_map.h:
I2S1I_BCK_OUT_IDX
I2S1I_WS_OUT_IDX
I2S1O_DATA_OUT0_IDX // start of the parallel data mux pin selection.

Makuna commented 1 year ago

The shift register pin naming is a little messy, so I thought I would include a map of the ones we care about ;-)
SRCLK - (Shift Register Clock) basically the data clock; reacts on the positive edge
SER - basically the serial data
RCLK - (Storage Register Clock) basically the data latch; updates the storage registers from the shift registers on the positive edge
SRCLR - (Shift Register Clear) basically tied to VCC to keep it high
OE - (Output Enable) basically tied to GND to keep it low and the outputs enabled
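
To make the pin roles above concrete, here is a minimal software model of a 74HC595-style shift register. It is only a sketch: the type `ShiftRegister595` and the helper `shiftOutByte` are hypothetical names for illustration, not part of NeoPixelBus or any driver, and the model assumes SRCLR is held high and OE is held low as described.

```cpp
#include <cstdint>

// Minimal software model of a 74HC595-style shift register.
// Not driver code: it only illustrates how SER, SRCLK, and RCLK interact.
struct ShiftRegister595
{
    uint8_t shiftStage = 0;   // internal shift stage, not visible on the outputs
    uint8_t outputs = 0;      // storage register driving the output pins

    // Rising edge on SRCLK: shift everything one place and take in SER.
    // Bit 0 of shiftStage holds the most recently shifted bit.
    void clockIn(bool ser)
    {
        shiftStage = static_cast<uint8_t>((shiftStage << 1) | (ser ? 1 : 0));
    }

    // Rising edge on RCLK: copy the shift stage to the outputs.
    void latch()
    {
        outputs = shiftStage;
    }
};

// Shift in eight bits, most significant first, then latch once.
inline uint8_t shiftOutByte(ShiftRegister595& sr, uint8_t value)
{
    for (int bit = 7; bit >= 0; --bit)
    {
        sr.clockIn(((value >> bit) & 1) != 0);
    }
    sr.latch();
    return sr.outputs;
}
```

The point of the model is that the outputs do not change while bits are being shifted in; they only change on the latch edge, which is why the latch timing matters so much in the rest of this thread.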

Makuna commented 1 year ago

So the I2S does provide both CLK and WS pin outputs. But as suspected, the WS isn't useful: in non-PDM mode the WS matches the frequency. BUT, the clock signal is useful, as captured here with two channels of x8 parallel NeoPixel pulses. To output N channels to the shift register on a single data pin, the frequency will need to be increased by a factor of N.

image

Makuna commented 1 year ago

Ramped up the speed by 8x (needed for an 8-bit shift register) and, while a little messy, it does seem to provide correct data that a shift register could use.

hpwit commented 1 year ago

@Makuna Do you intend to implement my method of driving virtual pins in your library? @blakeb130 the library you're referring to is the old version of the virtual driver, written after I added the I2S driver to FastLED at least 3 years ago. You'll find on my GitHub a more refined version that can drive 8 virtual pins per ESP32 pin; the one I added at the time drove only 5 virtual pins per ESP32 pin. Indeed, I use the clock of the I2S driver as the base clock. I do not use the WS, for a reason I don't remember exactly ;) I did not really change it after that, as driving 120 parallel strips is already a lot.

I could rewrite it to push through only 7 pins instead of 15; this way you'll save 384 bytes of memory.

Getting it to work with the 8 virtual pins was a bit of a challenge because the ESP32 clock above 20 MHz tends to skip bits from time to time. That is no big deal for its intended usage, sound, but for LEDs it is.
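
The clock scaling discussed in this thread can be checked with a little arithmetic. The sketch below assumes an 800 kHz LED bit rate and 3 time slots per LED bit; both values, and the function names, are illustrative assumptions, and a driver that uses a different number of slots per bit will get a different answer.

```cpp
#include <cassert>
#include <cstdint>

// Clock needed when each LED bit is split into `slotsPerBit` time slots and
// every slot has to carry one value for each of `srOutputs` register outputs.
constexpr uint32_t requiredClockHz(uint32_t ledBitRateHz, uint32_t slotsPerBit, uint32_t srOutputs)
{
    return ledBitRateHz * slotsPerBit * srOutputs;
}

// Largest number of shift register outputs that stays at or below a clock limit.
inline uint32_t maxOutputsForClock(uint32_t clockLimitHz, uint32_t ledBitRateHz, uint32_t slotsPerBit)
{
    assert(ledBitRateHz > 0 && slotsPerBit > 0);
    return clockLimitHz / (ledBitRateHz * slotsPerBit);
}
```

Under these assumed values, `requiredClockHz(800000, 3, 8)` gives 19.2 MHz, just under the 20 MHz figure mentioned above, while 4 slots per bit would need 25.6 MHz for the same 8 outputs.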

Makuna commented 1 year ago

@hpwit So, it was 7 pins + WS that worked ok? My experiments showed the max was 8 also. The i2s parallel work was done in a previous check-in. I am now writing the encoding code that will merge the SR channels into the parallel channels to form one stream. What a mind-bending experience.

    // latch must be first mux channel, i2sX sends bits as LSB order
    // 
    // Order of bits in i2s data stream to support latch and x shift registers
    //      (i2sx16)  (i2sx8)
    //                     S 
    //                     R 
    // mux- edcba987 6543210L
    // P0b1 11111111 11111111
    // P1b1 11111111 11111110
    // P2b1 11111111 11111110
    // P3b1 11111111 11111110
    // P4b1 11111111 11111110
    // P5b1 11111111 11111110
    // P6b1 11111111 11111110
    // P7b1 11111111 11111110
    // P0b2 11111111 11111111
    // P1b2 11111111 11111110
    // P2b2 11111111 11111110
    // P3b2 11111111 11111110
    // P4b2 11111111 11111110
    // P5b2 11111111 11111110
    // P6b2 11111111 11111110
    // P7b2 11111111 11111110 ...
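
The bit layout in the comment above can be sketched as a small encoder for the x8 case. This is an illustration of the layout only, not the NeoPixelBus implementation: `encodeSlot`, `kSrOutputs`, and `kSrLanes` are hypothetical names, and the mapping of bit p to position Pp is an assumption.

```cpp
#include <array>
#include <cstdint>

constexpr int kSrOutputs = 8;   // outputs per shift register (P0..P7)
constexpr int kSrLanes = 7;     // x8 parallel: 7 data lanes plus the latch lane

// laneStates[lane] holds the eight output levels that shift register `lane`
// should present for one time slot; bit p is the level for position Pp.
// Returns the eight bytes to emit, one per position.
inline std::array<uint8_t, kSrOutputs> encodeSlot(const std::array<uint8_t, kSrLanes>& laneStates)
{
    std::array<uint8_t, kSrOutputs> words{};
    for (int p = 0; p < kSrOutputs; ++p)
    {
        uint8_t word = (p == 0) ? 0x01 : 0x00;   // latch lane (bit 0) high on P0 only
        for (int lane = 0; lane < kSrLanes; ++lane)
        {
            const uint8_t level = (laneStates[lane] >> p) & 1;
            word = static_cast<uint8_t>(word | (level << (lane + 1)));
        }
        words[p] = word;
    }
    return words;
}
```

Feeding it all-ones lane states yields 0xFF for P0 and 0xFE for P1 through P7, matching the low byte of the rows in the table.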

Design notes:
Removing the need to call Show() on each NeoPixelBus in the collective; instead, just call a new method ShowAll() on any single bus in the collective to trigger the encoding and then send the data stream. This will also bypass the dirty flag mechanism of each of the individual busses.
Long term: Could we use the WS for latch (with small external components) to remove the need for parallel mode altogether but still give us 8 channels on the shift register? This would be a big memory savings, as having all 56 channels (7 parallel x 8 shift register outputs) is overkill but is also the minimum size of the collective.
Note, the DMA send buffer and its twin non-dma memory edit buffer must be full size; while individual BUS edit buffers are only allocated if a bus is instanced.

hpwit commented 1 year ago

@Makuna I have just looked at the WS signal for the latch. It does not work: the WS signal does not come after 8 or 16 serial bits. As for the code that transforms serial to parallel, you cannot really get around it. What do you mean by this?

> This will also bypass the dirty flag mechanism of each of the individual busses.

You need to use the parallel mode to send to each of the 7 shift registers at the same time, even if you could use the WS signal as the latch.

> Note, the DMA send buffer and its twin non-dma memory edit buffer must be full size; while individual BUS edit buffers are only allocated if a bus is instanced.

I do not understand your point, sorry.

Makuna commented 1 year ago

Using a single shift register on the WS output today (which cycles on a single bit, or half the speed of the clock), you could divide it by 8 and thus create a latch output at the correct timing to use on the other shift registers. Shift registers can be used as dividers. The only issue I could see was triggering it at the correct time, as the I2S seems to send the WS continuously when we would need it to start at the same time as the first data bit. Could the data pin be used as the latch/clear/enable for the WS divider shift register?
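
The divider idea above can be modelled in a few lines. This is a behavioral sketch only: `LatchDivider` is a hypothetical name, the `sync` input stands in for whatever signal would restart the count at the first data bit, and propagation delays in real parts are ignored.

```cpp
#include <cassert>

// Model of a divide-by-N counter clocked by the bit clock. It raises its
// output for one clock period every `divisor` rising edges, and a sync input
// restarts the count so the pulse lines up with the first data bit.
class LatchDivider
{
public:
    explicit LatchDivider(int divisor) : _divisor(divisor), _count(0)
    {
        assert(divisor > 0);
    }

    // Call once per rising clock edge. Returns true on the edge where the
    // latch pulse should occur (the synced edge, then every Nth edge after).
    bool tick(bool sync)
    {
        if (sync)
        {
            _count = 0;
        }
        const bool latch = (_count == 0);
        _count = (_count + 1) % _divisor;
        return latch;
    }

private:
    int _divisor;
    int _count;
};
```

With a divisor of 8, the model pulses on the synced edge and then again 8 edges later, which is the alignment the latch would need; whether real hardware can hold that alignment at these speeds is exactly the open question here.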

Those comments at the end (dirty flag and DMA memory size) were not targeted at you. They were just capturing details that the I2S parallel shift register version will have.

hpwit commented 1 year ago

You would then need to use D flip-flops to divide the frequency, as it will not work with an HC595, which needs a latch to 'show' the input on the output pins. In theory you could then use the clock signal to do so. It needs to be studied, because you need to take into account the propagation delay of the signal through the gates. Indeed, we are talking about around 50 ns between signals. A normal IC could introduce a shift in the signal. It would have to be tested.

Why do you need to not have the latch in the data stream? Of course you can drive one less shift register, but as you said 7x8 is 56 strips, which is already a lot. And 15x8 = 120, which is more.

Makuna commented 1 year ago

It's more about having a low-memory and low-pin-count solution. If you could get the RW/Clock latch to work, then the non-parallel I2S could support a series of shift registers, getting x8 and even x16 support using only three pins. And the x8 would only need memory for x8, not the x64 it needs with this solution. x8 SR would use the same frequency we already know is supportable.

hpwit commented 1 year ago

You want to use serial I2S to drive 1 shift register, because with serial I2S you will not be able to drive more than one shift register due to I2S speed limitations. Then maybe using the WS will work. In parallel mode, if you want to drive 8 SRs you need 16 bits per bit, not 64. Otherwise I don't see how you want to implement the driver.

hpwit commented 1 year ago

@Makuna I have tried going with SPI, which can go faster than 20 MHz; on my oscilloscope I can see a good signal up to 80 MHz. But the issue is that a 'classic' 74HC595 can only go up to 25 MHz, as can any 'classic' frequency divider. I will see if I can find high-speed ICs. => Possibility of driving only 1 SR unless high-speed ICs are used. Otherwise, the solution using WS will need to be tested. Let me know if you want to go in that direction; we could work together.

Makuna commented 1 year ago

SPI will require the DMA core to not have pauses. I have yet to confirm the new DMA SPI method for DotStars doesn't also have pauses (SPI spec doesn't state that it can't stretch and pause, making it more resilient on longer wire lengths).

Note that when i2s is used in the non-parallel mode, the WS timing is correct i2s timing (a word of data). But it seems when in single channel 16 bit mode, they still send a normal balanced pulse even though there should be only one channel (WS is a sort of channel select, right and left is + and - on).

Here is a capture (note, WS is inverted).

image

The primary thing I wanted to prove with this capture was that WS could be used to "sync" the data with the latch. While it still can't be used directly as the latch (without a 4x multiplier, is this a thing?), when inverted as the capture above, on its rising edge it could be used to sync a divider of the real clock that is tied to the latch. The latch would need a div 4 from the clock that is synchronized starting at the edge of the WS.

Blake-Ballew commented 1 year ago

> I will look if i can find high speed ICs

Thoughts on the 74VHC164? Seems to run considerably faster than the 74hc595. Not sure if there's any other electrical/signal properties I should be comparing too.