focalintent / FastLED-Sparkcore

SparkCore specific port of FastLED
MIT License

spi signaling errors #6

Closed Hotaman closed 8 years ago

Hotaman commented 8 years ago

I'm using APA102 LEDs (50 total) on Photon hardware SPI port pins. I believe the library is using software SPI. It always clocks at 8MHz no matter what speed setting I use, so there is perhaps a problem there. I'm working on getting hardware SPI working, as I have clocked these LEDs at 50MHz on the Photon without issue.

The problem is that there is faint red flickering where LEDs should be off (sending all 00s each frame). The LED data is correct, and looking at the interface with my logic analyzer there appear to be some signaling errors. There are sporadic highs on the data line and I must be losing sync on the clock. My scope PS is currently waiting on parts so I can't get more detail. Red is the last byte and the flickering is faint, so the errors are occurring on the final bits of each pixel.

The first 10-15 pixels don't appear to ever flicker. The problem gets worse as you get closer to the last (50th) LED (brighter red). The pattern originates from the end and moves towards the first over several 'frames' and flashes (appears) randomly at varying brightnesses.

The red will flicker randomly for 1-3 seconds, then it will not show for 4-8 seconds
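For readers following the "red is the last byte" reasoning, here is a minimal host-side sketch of one APA102 LED frame. It assumes the commonly documented wire layout (a header byte of 0b111 plus 5 brightness bits, then blue, green, red, each sent MSB-first). The function names are hypothetical and are not part of FastLED.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One APA102 LED frame in wire order (assumed datasheet layout):
// 0b111 marker plus 5-bit global brightness, then blue, green, red.
std::array<std::uint8_t, 4> apa102_led_frame(std::uint8_t brightness5, std::uint8_t r,
                                             std::uint8_t g, std::uint8_t b) {
    assert(brightness5 <= 31);  // only 5 bits are available for brightness
    return {static_cast<std::uint8_t>(0xE0 | brightness5), b, g, r};
}

// Hypothetical helper: flip the last bit clocked out for a pixel, imitating a
// single signaling error at the very end of the frame.
std::array<std::uint8_t, 4> flip_last_bit(std::array<std::uint8_t, 4> frame) {
    frame[3] ^= 0x01;  // byte 3 is red; its bit 0 is sent last when MSB-first
    return frame;
}
```

Under these assumptions, an "off" pixel with one corrupted trailing bit becomes a pixel with red = 1, which would match a faint red flicker.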

focalintent commented 8 years ago

I have not yet implemented hardware SPI for the Photon/Spark platform.

focalintent commented 8 years ago

Also, software SPI won't get much faster than 8MHz - you could get it slower (but because it's software, the mapping from desired data rate to actual is a bit loose).

Hotaman commented 8 years ago

@focalintent, I saw that when I went through the code. I opened this issue because there are still some signaling issues in the software SPI implementation for Particle devices. I'm putting together the hardware support and looking at the software implementation as well. I'll give you a hand on the Particle version. Sounds like your plate is pretty full at the moment. Mine too, but I do have a few hours free on the weekends at the moment for Particle work. I'm using it to test the Electron and help keep libs updated. Once I have it solid on a Photon, I'll also make sure it works on the Electron. I'm hoping to make some headway this coming weekend now that I have my scope running again. It's currently tripping up my protocol analyzer as well as the APA102s so it's not very helpful other than showing there are some problems.

focalintent commented 8 years ago

Is there too little of a gap between when the data line is changed and when the clock line is strobed? I'd be curious to see what the output looks like on a scope.

One thing I saw with another piece of hardware was that different pins had different timings in terms of how quickly they would go from low to high. If you look at fastspi_bitbang.h:120 - I'm setting the data pin and then immediately, with no nop, I'm strobing the clock line high, and then back low again. If the clock pin reaches high faster (or if the APA102 chip registers the clock line as strobing at a lower voltage than the data line requires to be treated as a 1, which may be possible), it's possible the APA102 would pull the wrong value off the data pin.

Try this - again, around line 120 in fastspi_bitbang.h - replace the static void writeBit(uint8_t b) function there with this:

        // write the BIT'th bit out via spi, setting the data pin then strobing the clock
        template <uint8_t BIT> __attribute__((always_inline, hot)) inline static void writeBit(uint8_t b) {
                if(b & (1 << BIT)) {
                        FastPin<DATA_PIN>::hi(); SPI_DELAY_HALF;
                        FastPin<CLOCK_PIN>::hi(); SPI_DELAY;
                        FastPin<CLOCK_PIN>::lo(); SPI_DELAY_HALF;
                } else {
                        FastPin<DATA_PIN>::lo(); SPI_DELAY_HALF;
                        FastPin<CLOCK_PIN>::hi(); SPI_DELAY;
                        FastPin<CLOCK_PIN>::lo(); SPI_DELAY_HALF;
                }   
        }   

It puts a little bit more of a delay (at least one clock cycle) between changing the data pin and changing the clock pin. (SPI peripherals generally seem to be much better in their timing of things, and I don't think they have the same internal timing overhead as GPIO bit twiddling.)
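The ordering in that writeBit variant (set data, let it settle, then strobe the clock) can be modeled on a desktop machine. The sketch below is only an illustration, not FastLED code: the `Event` enum and both functions are hypothetical, and it assumes the receiver samples the data line on the rising clock edge with bits sent MSB-first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Event : std::uint8_t { DataHi, DataLo, Settle, ClockHi, ClockLo };

// Record the pin operations for one byte: data first, a settle delay,
// then a clock strobe, repeated for each bit from MSB to LSB.
std::vector<Event> bitbang_byte(std::uint8_t b) {
    std::vector<Event> log;
    for (int bit = 7; bit >= 0; --bit) {
        log.push_back((b & (1u << bit)) ? Event::DataHi : Event::DataLo);
        log.push_back(Event::Settle);
        log.push_back(Event::ClockHi);
        log.push_back(Event::ClockLo);
    }
    assert(log.size() == 32);  // 8 bits x 4 events
    return log;
}

// Idealized receiver: latch the current data level on every rising clock edge.
std::uint8_t sample_on_rising_edge(const std::vector<Event>& log) {
    std::uint8_t value = 0;
    bool data = false;
    for (Event e : log) {
        if (e == Event::DataHi) data = true;
        else if (e == Event::DataLo) data = false;
        else if (e == Event::ClockHi)
            value = static_cast<std::uint8_t>((value << 1) | (data ? 1 : 0));
    }
    return value;
}
```

In this idealized model the byte always round-trips. On real pins the `Settle` step is what gives the data line time to reach a valid level before the clock edge arrives.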

focalintent commented 8 years ago

(It looks like I may have run into this before, because the implementation of writeBit that is used in one variation of the bitbang'd SPI output already has that splitting of the delay. However, for a variety of reasons, that code is not used by APA102, in part because of the APA102's header.) I may go ahead and make this change and check it in anyway, just to bring everything in line.

focalintent commented 8 years ago

Yup - checked in - try repulling and see if that helps with the signaling issues at all.

(I've done tests where I've set two pins high at the same time (by making a single write to their port register) and the curves from 0->3.3v for the two pins were wildly different - annoyingly, this killed an optimization that would've made the bitbang'd spi output stupidly fast)

focalintent commented 8 years ago

Oh - here's another question - I'm getting reports from people about APA102s not being quite as 3.3v friendly as they often seem out of the gate. Out of curiosity/for laughs - what happens when you toss a 74HCT245 between the Photon pins and the LEDs?

Jerware commented 8 years ago

I've been running 256 APA102C LEDs using FastLED & a Photon for several days straight without issue. Both the data & clock pins are run through a 74HCT245.

Hotaman commented 8 years ago

@Jerware Can you try continuously updating the strip with all 00's in a dark room to see if you see any sporadic flashing? I didn't notice the problem until I started playing with patterns that had a lot of 'off' pixels.

@focalintent Thanks for the info! I'll be working on this Saturday. I'm in the middle of moving a mission critical system to Docker at work so I can't afford to let my mind stray too much, too many moving parts at the moment. I'll scope both versions to see the diff.

I might agree with the voltage issue, but the Photon is only driving the first LED in the strip. All following LEDs are being driven at 5v by the previous LED's output. This is why making super long strips even works. If it was a single wire driving all LED inputs there would be horrible capacitance problems on the line, especially at the speeds we are running them at.

The problem may very well be caused by part/pin response times. I'll also test with several different Photons to see if the results vary by part. I currently have ~9 meters of 60/m APA102s that I can test with. We will figure this out!

Hotaman commented 8 years ago

@focalintent First off, revert your changes to fastspi_bitbang.h, they broke it bad!

Per the STM32F1(2)xxx programmer's manual, section 2.2.4, "Software Ordering Of Memory Access": the processor can 'optimize' memory access across instructions! To ensure the hardware follows the software, you must use one of the following instructions after a memory access: DMB, DSB, or ISB. That makes the hardware actually do what you tell it to do, when you tell it to!

So far I made the following changes to fastpin_arm_stm32.h:

#if defined(STM32F2XX)
  inline static void hi() __attribute__ ((always_inline)) { _GPIO::r()->BSRRL = _MASK; asm("dsb"); }
  inline static void lo() __attribute__ ((always_inline)) { _GPIO::r()->BSRRH = _MASK; asm("dsb"); }
#else
  inline static void hi() __attribute__ ((always_inline)) { _GPIO::r()->BSRR = _MASK; asm("dsb"); }
  inline static void lo() __attribute__ ((always_inline)) { _GPIO::r()->BRR = _MASK; asm("dsb"); }
  // inline static void lo() __attribute__ ((always_inline)) { _GPIO::r()->BSRR = (_MASK<<16); }
#endif

Looks like there might be a couple other places to sprinkle DSB commands to get it completely cleaned up, but the above changes corrected the red flashing and cleaned up the signals a lot.

Basically, the DSB command forces any pending memory/register access to complete before the next instruction is executed (makes it work like your normal everyday processor :)
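As a portable illustration of the "write, then barrier" pattern, here is a minimal sketch. It assumes a GCC or Clang toolchain (GNU inline asm) and uses hypothetical helper names. On non-ARM hosts it falls back to a compiler-only barrier so it can be compiled and tried on a desktop. Whether the barrier itself or merely the extra delay cured the flicker is questioned later in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Data synchronization barrier on ARMv7 and newer; elsewhere only a compiler
// barrier that stops the compiler from reordering memory accesses.
inline void io_barrier() {
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

// Hypothetical helper: write a mask to a (memory-mapped) register and wait
// for the write to complete before continuing.
inline void write_reg_then_barrier(volatile std::uint32_t& reg, std::uint32_t mask) {
    assert(mask != 0);  // writing an empty mask would change no pins
    reg = mask;
    io_barrier();
}
```

On a host machine the "register" can simply be a local `volatile std::uint32_t` variable, which is enough to check that the helper compiles and performs the write.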

Now that this mystery is solved, I'm moving on to adding hardware SPI support, I feel the need for speed!

To anyone trying to get this to run on a Particle device: Make sure you start your .ino file like this...

#ifndef STM32F2XX
  #define STM32F2XX
#endif

#include "FastLED.h"
FASTLED_USING_NAMESPACE;

// For led chips like Neopixels, which have a data line, ground, and power, you just
// need to define DATA_PIN.  For led chipsets that are SPI based (four wires - data, clock,
// ground, and power), like the LPD8806 define both DATA_PIN and CLOCK_PIN
#define DATA_PIN A5
#define CLOCK_PIN A3

#define NUM_LEDS    50

#define BRIGHTNESS  200
#define FRAMES_PER_SECOND 30

CRGB leds[NUM_LEDS];

void setup() {
  delay(3000); // sanity delay
  // working addLeds() for APA102s
  // note that the chipset brightness feature is not supported
  // You can set the max brightness in chipsets.h
  FastLED.addLeds<DOTSTAR, DATA_PIN, CLOCK_PIN, BGR, 2>(leds, NUM_LEDS).setCorrection( TypicalLEDStrip );
  FastLED.setBrightness( BRIGHTNESS );
}

The STM32F2XX (Photon/Electron/P0/P1) define is the key to getting the right code compiled! Define STM32F10X_MD for the Core. Define ONE of these before including the FastLED library. Perhaps these are currently set by the default system include, but I wouldn't depend on it.

Enjoy!

Hotaman commented 8 years ago

@focalintent Those instructions do have an impact on speed. Perhaps similar to what you saw when using the bitband registers? The software SPI speed is just under 6MHz with the DSB instructions added.

Average is around 5.75MHz. The clock pulse width is 40ns.
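To put those two measurements side by side, a small arithmetic sketch (hypothetical helper names) converts the clock frequency to a period and expresses the 40ns high pulse as a fraction of it.

```cpp
#include <cassert>

// Period of one clock cycle in nanoseconds for a frequency given in MHz.
double period_ns(double freq_mhz) {
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

// Fraction of each clock period that the line spends high.
double high_fraction(double high_ns, double freq_mhz) {
    return high_ns / period_ns(freq_mhz);
}
```

For 5.75MHz the period is about 174ns, so a 40ns pulse means the clock is high for roughly 23% of each cycle and low for the remaining ~134ns.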

Interesting, I have removed your delays in bitbang.h with no appreciable change to the speed or waveforms. Need to do more digging on this to optimize the bitbang speed on the ARM.

focalintent commented 8 years ago

I am traveling right now and won't be able to look at anything until tomorrow or Tuesday night. The delay values are probably pretty close to 0 or 1 with APA102 clock speeds, which may be why you aren't seeing a whole lot of change.

As for DSB - it basically forces/waits for a flush to memory so I'm also not surprised that it slows things down.

Jerware commented 8 years ago

@hotaman I'm not clear on whether you've tried a 74HCT245. Inconsistent 3.3V compatibility is a well known issue with APA102 and WS2812B. That's why the Teensy LC includes one onboard, for instance.

tenaciousRas commented 8 years ago

Did that code get committed for an unverified and un-reproduced issue, sort of a "shot in the dark", so to speak? Unless commit 0d0e971... was tested on hardware, it should probably be reverted.

@Hotaman Looks like STM32F2XX is defined in core-firmware/platform/MCU/STM32F2xx/CMSIS/Device/ST/Include/stm32f2xx.h around line 68.

I tried the DSB instructions but they did not help my LEDWax code, where I'm seeing some flickering, but this could be caused by hardware since my fixture is suspect. The flickering is not a show-stopper but it will have to be fixed. I know my hardware is suspect because I'm experiencing verifiable problems with voltage-level translation with a 2811(A) strip and a Photon, like many people. An oscope is necessary to debug and tweak this code, anything else is guesswork.

I didn't know this fork only has software SPI. FWIW my preference for a lib like this is SPI peripheral (h/w SPI) support rather than support for a massively diverse set of MCUs and LED chipsets. Is it more valuable to spend time getting SPI peripherals (h/w SPI) to work or tweaking software SPI timings? I'd vote for the former.

As for software timings, how can that be done properly - for the speed and accuracy being discussed - without counting MCU ticks on the generated assembly code? Is that already the process for development?

focalintent commented 8 years ago

> I didn't know this fork only has software SPI. FWIW my preference for a lib like this is SPI peripheral (h/w SPI) support rather than support for a massively diverse set of MCUs and LED chipsets.

You're welcome to your preference - however, the fact of the matter is if I was going to scale back the number of MCUs I supported, the STM32s would be the first to be dropped :)

This platform is enough of a pain in the ass to work on that I haven't had the time to dig in the hardware spi support for it (and the stm32 is a relatively minor platform in terms of who uses the library and numbers).

Software SPI doesn't really need timings unless you want to accurately clock below 4-8MHz, which is why they're hand waved for the bitbang'd SPI output. (What does need accurate timing are the 3-wire LED chipsets like the WS2812 and friends - and FastLED's only asm code is for those chipsets, and some high performance math stuff.)

focalintent commented 8 years ago

(Also the SPI timing changes are for bitbang'd SPI output, and since that's the same on every platform I did run it on other hardware, since I don't have Photon/Spark Core test rigs up and running right now - so I'm mostly working with folks who are actively using them - and the change was as much to bring all the bitbang SPI timings in line with each other as it was for this issue)

focalintent commented 8 years ago

> As for software timings, how can that be done properly - for the speed and accuracy being discussed - without counting MCU ticks on the generated assembly code? Is that already the process for development?

And yes - for the WS2811 style chipsets on AVR and Cortex M0 based ARM CPUs, that's exactly what I did (or, rather, it's hand written assembly, clock counting the instruction cycles). For other ARM CPUs where I have access to a clock cycle counter and higher clock rates, I can do it in C/C++ by comparing against clock registers.
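The "compare against a cycle counter" approach can be sketched in a hardware-independent way. The function below is hypothetical and is not FastLED's implementation. `read_cycles` stands in for whatever free-running 32-bit counter the platform offers (on Cortex-M3/M4 parts that is commonly the DWT cycle counter, which is an assumption here, not something stated in this thread).

```cpp
#include <cassert>
#include <cstdint>

// Busy-wait until at least `cycles` ticks have elapsed since `start`.
// `read_cycles` is any callable returning a free-running 32-bit cycle count.
template <typename ReadCycles>
std::uint32_t wait_cycles(ReadCycles read_cycles, std::uint32_t start, std::uint32_t cycles) {
    assert(cycles < 0x80000000u);  // keep requests well inside the counter range
    std::uint32_t now = read_cycles();
    // Unsigned subtraction stays correct when the counter wraps past zero.
    while (static_cast<std::uint32_t>(now - start) < cycles) {
        now = read_cycles();
    }
    return now;  // counter value at which the wait ended
}
```

Passing the counter as a callable keeps the sketch testable on a desktop: a lambda that increments a local variable can play the role of the hardware counter, including the wraparound case.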

(SPI-style chipsets don't have timing requirements, it's one of the things i like about them - outside of a) making sure that they don't clock too quickly, and b) the problem we're tracking here, where we have to make sure that pins change when we expect them to, not when they feel like it :)

Hotaman commented 8 years ago

@Jerware On my test rig, no I do not use a 245. With wires < 6" long it is not required. Final design will include a 245 to allow for the longer wire runs needed in 'production' ;).

@focalintent is spot on. I had not taken the time to read the STM32XXX programmer's manual because I wasn't doing ASM on it. I have to admit the section I noted was quite a surprise to me! This little beast actually tries to outguess the programmer on hardware access! Not the way I would have gone on a microcontroller which will be doing heavy IO. On the STM32, I would stick with hardware peripherals if you need any speed at all. Looks like it was designed with this assumption.

The 3-wire LEDs run at 800KHz max so bit banging these should work fine, and it does. I'm more interested in the SPI style LEDs simply because they run at vastly faster speeds allowing much higher LED counts per interface.
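To make the speed gap concrete, here is a small arithmetic sketch (hypothetical helper name) for how long a 3-wire strip takes to refresh. It assumes 24 data bits per LED and ignores the reset/latch gap between frames.

```cpp
#include <cassert>
#include <cstdint>

// Time in microseconds to clock out one frame on a 3-wire strip,
// ignoring the reset/latch gap between frames.
double three_wire_frame_us(std::uint32_t leds, double bit_rate_khz) {
    assert(bit_rate_khz > 0.0);
    const double bits = static_cast<double>(leds) * 24.0;  // 8 bits each for 3 colors
    return bits * 1000.0 / bit_rate_khz;                   // kHz -> microseconds per bit
}
```

At 800kHz this works out to 30us per LED, so 50 LEDs take about 1500us and 128 LEDs about 3840us per frame.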

I'm mulling over the idea of helping you out by taking over the version of the lib for Particle devices. My current go to micros are Photon, Electron, Teensy, and ESP8266. What do you think about it?

focalintent commented 8 years ago

The problem is that my bit banging SPI code is too fast. It takes 100ns for a pin to go from low to fully high. When the APA102 bit banging output is running at full speed (~10-12MHz clock rate, basically no NOPs between the clock strobing), the clock is only high for 40ns.

To put another way, the clock line was barely making it above 1.2v. It's a wonder that it worked at all :)

Of course, this wasn't ever a problem on the Spark Core because while the 100ns time for a pin to go high appears to be about par for the course for all ARM platforms, no matter what clock rate they're running at, the slower clock speed means that a pin can get from 0 to high in less time than it takes to do two adjacent GPIO writes. With the Photon's 120MHz clock, however, a single clock is 8ns - or way way less than the 120ns it takes for the pin to go high - and while these chips are slow to drift their voltage high, they like to slam back down to 0 right quick.

So - I'm currently modifying the bit-bang SPI output to ensure that there's a minimum of 50-100ns between GPIO writes (I'll juggle that timing a bit) - which should take care of the issue described here (unclear whether or not I'll get to the hardware SPI tonight, but maybe this weekend). My guess is that this is why the dsb opcode "worked" for you - it slowed down the clock transitions enough to allow high to get to a high enough voltage to register.
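The conversion from a minimum gap in nanoseconds to CPU cycles is simple enough to sketch. This is a hypothetical helper, not the code that went into FastLED. It rounds up so that the resulting delay is never shorter than requested.

```cpp
#include <cassert>
#include <cstdint>

// Number of CPU cycles needed to cover at least `ns` nanoseconds.
std::uint32_t ns_to_cycles(std::uint32_t ns, std::uint32_t cpu_hz) {
    assert(cpu_hz > 0);
    // 64-bit intermediate avoids overflow; adding (1e9 - 1) rounds up.
    const std::uint64_t scaled = static_cast<std::uint64_t>(ns) * cpu_hz;
    return static_cast<std::uint32_t>((scaled + 999999999ull) / 1000000000ull);
}
```

At the Photon's 120MHz, a 50ns gap is 6 cycles and a 100ns gap is 12 cycles, which shows how many idle cycles have to sit between two back-to-back GPIO writes.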

(It dawns on me that what may be happening is i'm running into some resolution limits on my scope - however the clock line was consistently never making it above 1.5v - so that was a thing that I needed to get fixed at these higher clock rates)

However, there's a different problem that I'm having which is a bit mind boggling - which is that the data line is randomly getting pulled low - even when it should be high for multiple SPI clock pulses in a row. (I'm also seeing it randomly get pulled high as well when it should be low). Note that even with tossing the dsb opcodes in, I'm still getting the semi-random toggles on the data line.

I think I need to dig through the data sheet a bit more - it's possible the registers being used for GPIO access aren't ideal. (The downside to taking port code from someone else).

Hotaman commented 8 years ago

Good catch! One of my scope probes went tits up while I was testing so I didn't get to go real in-depth last weekend. A new set should be at my front door as I write this so I can continue this weekend but I have two Pinewood derby races to run on Sat. so not sure how much time I will have. Depends on how much repair is needed on the track and my timer after storage for the last year. I will at least have time to do more reading in the programmers manual on this beast. I'll keep your findings in mind as I read and let you know if I find anything that can help. A 100ns rise time is pretty sucky for such a high performance part, there must be a trick somewhere to improve that I would hope, but maybe not. I'll keep digging as well.

What scope do you use? I'm using a Fluke 99B 100MHz portable storage scope and the 8 channel Logic logic analyzer.

focalintent commented 8 years ago

Right now just using the scope capabilities in the Logic Pro 8 - but it's only 50MS/s for analog. I'm going to be ordering a 4 channel, 100MHz / 1GSa/s Rigol I think, which I should have waiting for me by the time I get home on Tuesday.

I also still have this weird pin flipping thing I need to debug.

Hotaman commented 8 years ago

Nice! I got my Logic when they first came out. I love it, but I usually have to slow things down a bit to use it these days (only 24MS/s). That Rigol is sweet! I'd love to have one but my old dinosaurs just won't die so I can replace them :(

I thought I saw something similar going on right before my probe died. I just figured it was my probe. Maybe it was just some glitches. The signals are a bit messy at full speed and I don't see anything obvious in the code that would cause it.

Good luck

focalintent commented 8 years ago

Ok - the DSB is not enough to make the glitching go away - even though adding just that drops the max data rate I can push out the SPI ports down to 5MHz (down from a max of 14 if I get rid of all of the delays).

The weird thing is if I re-enable some of the delay code that I have, I cap out at 11MHz, but I also have no glitching in the output (this is without DSBs in the pin setting code).

So - what I'm seeing is an almost complete disconnect between the actual clock rate that I'm pushing data out with and the amount of glitching going on.

This is beginning to smell suspiciously of some compiler weirdness (as in, the compiler deciding to re-order/remove things that it really shouldn't, which the right combination of delays prevents from happening).

Still doing more digging

focalintent commented 8 years ago

It's not the bandwidth - it's the width of the clock pulse. When the clock pulse is around 10ns, glitch city. If the clock pulse width is over 30ns - no glitching.

Or not -- I have a run with 74ns clock widths, and a 3.7MHz data rate, that is glitching like all hell.

(Also curious, if I basically make it so there's no delays between the clock hi and clock lo, the first 7 bits of each byte have a pulse width of 24ns (or, roughly, 3 clocks - each clock being around 8ns). However, the 8th bit, for reasons I'm not entirely clear on, is only 8ns. I really wish I could grab the object files from the build so that I can disassemble them and see what gcc is doing with the code here.)

focalintent commented 8 years ago

Ok - grab master@HEAD.

There were a couple of things going on here.

One is that I need to make sure that the clock pulse is a minimum of 30ns (I set up the code for 35ns, just to be safe). The other is making sure that the high pulse of the clock is never longer than the low pulse. Violating either of these seems to cause glitching (which is why I was seeing glitching at what were nominally lower clock rates - because of that ratio thing).
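Those two constraints (a minimum high pulse, and high never longer than low) can be expressed as a small calculation. The sketch below is hypothetical and is not the computation used in FastLED. It splits a requested bit period into high and low cycle counts and stretches them when the request is faster than the constraints allow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct ClockSplit {
    std::uint32_t high_cycles;  // CPU cycles to hold the clock high
    std::uint32_t low_cycles;   // CPU cycles to hold the clock low
};

// Round-up division, so delays are never shorter than requested.
std::uint32_t ceil_div(std::uint64_t num, std::uint64_t den) {
    return static_cast<std::uint32_t>((num + den - 1) / den);
}

ClockSplit split_clock(std::uint32_t cpu_hz, std::uint32_t data_hz, std::uint32_t min_high_ns) {
    assert(cpu_hz > 0 && data_hz > 0);
    const std::uint32_t period = ceil_div(cpu_hz, data_hz);  // cycles per bit
    const std::uint32_t min_high =
        ceil_div(static_cast<std::uint64_t>(min_high_ns) * cpu_hz, 1000000000ull);
    // High pulse: half the period, but never below the minimum pulse width.
    const std::uint32_t high = std::max(period / 2, min_high);
    // Low pulse: the rest of the period, but never shorter than the high pulse.
    const std::uint32_t rest = period > high ? period - high : 0;
    const std::uint32_t low = std::max(rest, high);
    return {high, low};
}
```

With a 120MHz CPU and a 35ns minimum, a 24MHz request comes out as 5 cycles high and 5 cycles low, so in this model the request is clamped to 10 cycles per bit. An 8MHz request yields 7 high and 8 low.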

Also, while in there I tweaked the delay time computation to bring the bit-bang'd timing closer to what a user's code asks for (with the obvious caveat that there will be a max data rate that can be handled by the device).

I'm now driving a 144 LED APA102 strip, bit bang'd, with DATA_RATE_MHZ(24), with the clock pulse at 13MHz while writing out a byte (the gaps between bytes keep the overall average clock rate for a frame down a bit lower).

Let me know if you're still seeing problems on your end - I'll have my coworker's Photon for a few more days, then I'll be traveling again. If it looks good to you, I'll cycle a new rev up to the web IDE.

Hotaman commented 8 years ago

Wow, that's some crazy shit going on! Nice work!

Better take a break and let that flat spot on your head (from banging it against the wall) go back to normal :)

I'll take a look tonight and let you know what I see on my end. First race starts in 30 min. Woohoo!

kasperkamperman commented 8 years ago

Nice work! Finally got started experimenting with my Photon. 128 LEDs at 16MHz (24 didn't make a difference) get updated within 494us. Actually, using the hardware SPI pins seems to slow it down by about 100us (I've measured 590us). (To compare, on my Teensy 3.0 an update takes 300us on hardware SPI.)
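Those update times can be turned into an effective data rate with a little arithmetic. The helper below is hypothetical. It assumes 4 bytes per APA102 LED and ignores the start and end frames, so the result slightly underestimates the true rate.

```cpp
#include <cassert>
#include <cstdint>

// Effective payload throughput in Mbit/s for one strip update.
double effective_mbit_per_s(std::uint32_t leds, std::uint32_t bytes_per_led, double update_us) {
    assert(update_us > 0.0);
    const double bits = static_cast<double>(leds) * bytes_per_led * 8.0;
    return bits / update_us;  // bits per microsecond equals Mbit/s
}
```

For 128 LEDs, 494us corresponds to roughly 8.3 Mbit/s, 590us to roughly 6.9 Mbit/s, and the Teensy's 300us to roughly 13.7 Mbit/s.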

Pretty neat speed with software SPI. I've tested the Adafruit Dotstar port and that had 800us with Hardware SPI (vs around 1600us with softspi).

focalintent commented 8 years ago

3.1.5 has been released into particle's build system - including both this fix, as well as timing fixes for large numbers of WS2812 leds off of a photon.