dbuezas / lgt8fx

Board Package for Logic Green LGT8F328P LGT8F328D and LGT8F88D
361 stars 89 forks source link

SoftwareSerial conflict with other lib that disables interrupts #28

Closed seisfeld closed 3 years ago

seisfeld commented 4 years ago

Hi,

the new implementation of the SoftwareSerial library introduced by @jg1uaa in v1.0.6 via #26 seems not reliable. I updated my core this morning and ran our application with it. We use this library to talk to an MP3 decoder chip at 9600bps using SoftwareSerial. We run the LGT at 16 MHz using an external clock. The mentioned library and the chip uses checksumming to make sure the communication is reliable. With v1.0.6 of the core we get packet header and packet checksum errors which we don't get with v1.0.5.

jg1uaa commented 4 years ago

which side does not work correctly, LGT->MP3 chip (Tx) or MP3 chip->LGT (Rx), or both? In LGT8Fx some instructions are faster than original AVR chip, so we have to reconstruct the formula of _tx_delay, _rx_delay_centering, _rx_delay_intrabit and _rx_delay_stopbit. But it looks not easy.

And these formula is affected by version of GCC. My Arduino IDE (1.8.13) uses AVR-GCC 7.3.0, please tell me if the version is different.

seisfeld commented 4 years ago

The messages I see are from MP3 chip->LGT (Rx). Unfortunately have no insight in the LGT->MP3 chip (Tx) direction. I am using Arduino IDE (1.8.13) on macOS. AVR-GCC is:

/Applications/Arduino.app/Contents/Java/hardware/tools/avr/bin/avr-gcc --version avr-gcc (GCC) 7.3.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

seisfeld commented 4 years ago

While trying various things to debug the issue I noticed the following warning during compiling:

/Users/stephan/Library/Arduino15/packages/LGT8fx Boards/hardware/avr/1.0.6/libraries/SoftwareSerial/SoftwareSerial.cpp:57:59: warning: cannot declare member function 'static void SoftwareSerial::_delay_loop_2(uint16_t)' to have static linkage [-fpermissive]
 static void SoftwareSerial::_delay_loop_2(uint16_t __count)
                                                           ^

@dbuezas gave me quick fix for that which looks like this:

diff backup/SoftwareSerial.cpp SoftwareSerial.cpp
57c57
< static void SoftwareSerial::_delay_loop_2(uint16_t __count)
---
> inline void SoftwareSerial::_delay_loop_2(uint16_t __count)

and

diff backup/SoftwareSerial.h SoftwareSerial.h
88c88
<   static void _delay_loop_2(uint16_t __count);
---
>   static inline void _delay_loop_2(uint16_t __count);

This fixes the warning but unfortunately it does not fix the issue.

dbuezas commented 4 years ago

I made some tests at 9600 bauds with using the internal 16MHz clock:

  1. Send via SoftwareSerial to another board which echoes back with its own hardware Serial. -> No Rx or Tx errors for 3 continuous hours.
  2. Test Tx: Measure with the oscilloscope the width of each bit sent via Software Tx. Compared with HW serial -> Timing looks perfect, 104us per bit (1s/104,166us = 9600 bauds )
  3. Test Rx: Plot the signal from a trusted HW Serial (Atmel based nano board) and on top of that I made the LGT328p board toggle a pin when is reading each bit -> It reads them in the middle, exactly where it should.

image

Yellow -> True 9600 baud signal sent by an Arduino Nano Pink -> Toggles when it is about to read a bit. It toggles is in the middle of the square signal Green -> Wait to get out of the stop bit.

I can't find any problem, everything looks absolutely perfect I'm really mystified by this. You are 100% sure you selected the right clock speed for your external oscillator, right?

seisfeld commented 4 years ago

You are 100% sure you selected the right clock speed for your external oscillator, right?

Yes. 100%. I even tried it with the internal clock. Both cases 16MHz. I never use 32MHz to be most compatible to the Arduino world.

seisfeld commented 4 years ago

I think I found the issue. The new implementation of SoftwareSerial does not like it when you frequently disable interrupts. We do this to drive WS2811 LEDs. When I disable that part in the code the new SoftwareSerial works fine. I don't think there is much one can do except to stick with the old SoftwareSerial right?

Never the less, I still suggest to merge the change by @dbuezas mentioned in https://github.com/dbuezas/lgt8fx/issues/28#issuecomment-661271278 to get rid of the warning when compiling, if this has no other side effects.

What else can we do to keep the old SoftwareSerial around? It seems to be a bit more robust when it comes to disabled interrupts.

dbuezas commented 4 years ago

That's quite interesting. It does explain why only some packages were corrupted (i.e. only when interrupts were disabled during communication)

I couldn't find any obvious change in the code that would make it less tolerant to delays. Maybe the old version was a bit too fast and that compensated for it?

Here's an quick experiment: Try some baud rates between 9600 and 10181. (e.g. 9700, 9800, 9900, 10000 and 10100) One nice thing about the new lib is that one can chose any arbitrary baud rate and the delays are computed on the fly :)

Reasoning: 1) 1 second / 9600 bauds = 104.16us per bit 2) 8 bits + 1 stop bit per packet = 9 bits per packet 3) In 9 bits less than half pulse of delay will still work = -52.08us max of accumulated offset. 4) -52.08/9 = -5.78 us offset per bit (maximum) 5) 104-5.78 = 98.22us pulse width (minimum) 6) 1s / 98.22 us = 10181.22 bauds (maximum)

I fine tuned the old library by finding min and max delays that worked and then taking the value in the middle. This probably included some slack ~generated by the millis timer~ (nah, that's too fast) so it was very likely a bit faster that what it should have been. Disabling interrupts for some time in your code may have added enough delay to the new library for it to fall over the right edge. (i.e, the the last knee of the pink line would be to the right of the right knee of the yellow line in the oscilloscope pic above).

I bet that your code is disabling the interrupts for more than 52.08 us. :)

jg1uaa commented 4 years ago

sorry to @seisfeld please give me a time to fix.

I am disassembling the object of Software.cpp and it is quite different between gcc-4.8.x and gcc-7.3.0. Maybe LTO (link-time-optimization) makes thing worse. I think some values about timing needs to be tuned, but I have to know how to determine them. Measuring clocks between delay_loop()s? or writing GPIO register? which? And clocks of CALL/RET instructioins should not be ignored (original code does not consider)... if there is enough storage, delay_loop() will be inline.

LaZsolt commented 4 years ago

LTO also has effect on delayMicroseconds() https://github.com/MCUdude/MiniCore/issues/130#issuecomment-623488286

Anyway WS2811 LED using a special serial communication. I think it is some kind of special software serial. So strongly recommended to avoid receive any data with SoftwareSerial while sending data to LEDs.

seisfeld commented 4 years ago

sorry to @seisfeld please give me a time to fix.

@jg1uaa, mate, no problem. We appreciate any help and input. :)

Anyway WS2811 LED using a special serial communication. I think it is some kind of special software serial.

@LaZsolt We use this library. I had to tweak some values (loop cycles) to accomodate for the faster execution time of certain instructions. We only use 6 LEDs and and run the LGT at 16MHz, which works fine with the mentioned modifications. I recon you guys have fairly good skills in assembler, maybe you know how to really fix the library for the LGT. It is very small. Would really appreciate it. 😇

So strongly recommended to avoid receive any data with SoftwareSerial while sending data to LEDs.

I will try to reduce the amount of updates of the LEDs. Maybe that improves the situation a bit.

jg1uaa commented 3 years ago

LGT8x's push/pop function is faster than AVR's. I think _rx_delay_centering value for AVR is not suitable for LGT. Here is calculated clock (branch taken/not taken) with new delay code. LGT is less clock so we have to increase _rx_delay_centering.

To check my calculate method is appropriate or not, I put the result at https://pastebin.com/raw/HnKQwr8L (this is very long and difficult to paste here).

Numbers under AVR and LGT are clocks. and the case of branch taken/not taken.

(postscript) testing code diff is at https://pastebin.com/EiEeuPKa simply coded and I checked compile passed. communication test is not yet.

dbuezas commented 3 years ago

@jg1uaa That's some serious digging you did there! :) Push and pull happen only at the beginning of the interrupt though, so this may account for some fraction of a microsecond, it should really make no difference at these speeds.

If you look at the oscilloscope pic I uploaded here, you can see that the timings are actually really really good.

I tested the old lib with the oscilloscope and guess what: it was barely working at all! It was at the absolute limit, but in the other direction than I thought. This doesn't explain why you were not getting CRC errors before, but maybe the packets were discarded all together.

OLD library

jg1uaa commented 3 years ago

I enabled debugging option by #define _DEBUG 1 with current code and disassembled the object. DebugPulse() is declared as inline void but it is compiled as a normal function. Here is disassembly. The number before address means LG8Fx instruction cycle.

      00000690 <__vector_3>:
1      732:   88 8d           ldd     r24, Y+24       ; 0x18
1      734:   99 8d           ldd     r25, Y+25       ; 0x19
1      736:   01 97           sbiw    r24, 0x01       ; 1
1/2    738:   f1 f7           brne    .-4             ; 0x736 <__vector_3+0xa6>
1      73a:   00 00           nop
1      73c:   e1 2e           mov     r14, r17
1      73e:   f1 2c           mov     r15, r1
1      740:   f5 94           asr     r15
1      742:   e7 94           ror     r14
1      744:   1e 2d           mov     r17, r14
1      746:   88 e0           ldi     r24, 0x08       ; 8
4      748:   0e 94 98 00     call    0x130   ; 0x130 <_Z10DebugPulsehh.constprop.3>
1      74c:   ee 85           ldd     r30, Y+14       ; 0x0e
1      74e:   ff 85           ldd     r31, Y+15       ; 0x0f
1      750:   80 81           ld      r24, Z
1      752:   9d 85           ldd     r25, Y+13       ; 0x0d
1      754:   89 23           and     r24, r25
1/2    756:   09 f0           breq    .+2             ; 0x75a <__vector_3+0xca>
1      758:   10 68           ori     r17, 0x80       ; 128
1      75a:   01 50           subi    r16, 0x01       ; 1
1/2    75c:   51 f7           brne    .-44            ; 0x732 <__vector_3+0xa2>

DebugPulse() is modified like this (single pulse is hard to view with oscilloscope).

    *pport = val ^ digitalPinToBitMask(pin);
//  *pport = val;

And DebugPulse() object takes 26 + 4 (CALL overhead) cycles.

      00000130 <_Z10DebugPulsehh.constprop.3>:
1      130:   90 e0           ldi     r25, 0x00       ; 0
1      132:   fc 01           movw    r30, r24
1      134:   ee 55           subi    r30, 0x5E       ; 94
1      136:   ff 4f           sbci    r31, 0xFF       ; 255
2      138:   e4 91           lpm     r30, Z
1      13a:   f0 e0           ldi     r31, 0x00       ; 0
1      13c:   ee 0f           add     r30, r30
1      13e:   ff 1f           adc     r31, r31
1      140:   ea 56           subi    r30, 0x6A       ; 106
1      142:   ff 4f           sbci    r31, 0xFF       ; 255
2      144:   a5 91           lpm     r26, Z+
2      146:   b4 91           lpm     r27, Z
1      148:   2c 91           ld      r18, X
1      14a:   83 54           subi    r24, 0x43       ; 67
1      14c:   9f 4f           sbci    r25, 0xFF       ; 255
1      14e:   fc 01           movw    r30, r24
2      150:   84 91           lpm     r24, Z
1      152:   82 27           eor     r24, r18
1      154:   8c 93           st      X, r24
2      156:   08 95           ret

This is called 10 times (1 start bit, 8bit data and 1 stop bit) so 300 clocks are extra. I think enough delay is observed between debug pulse and stop bit regardless new/old code.

jg1uaa commented 3 years ago

Now timing-modified version of SoftwareSerial.cpp is now on my repo, https://github.com/jg1uaa/lgt8fx/tree/softwareserial/lgt8f/libraries/SoftwareSerial For detailes, please read commit message. I hope this code resolves @seisfeld 's problem.

BTW, this code is for gcc-7.3.0 only (Arduino-1.8.10 or later). Do I have to support gcc-5.4.0 (Arduino-1.8.6 to 1.8.9) and gcc-4.9.2 (other Arduino-1.8.x series)?

seisfeld commented 3 years ago

Unfortunately it does not. Anyways I think what we know by now is, that the issue is not with the timing. As @dbuezas showed in https://github.com/dbuezas/lgt8fx/issues/28#issuecomment-661284475 the signal is almost perfect. If your recent change even improves it, by accounting for the gcc 7.3.0, even better. Maybe @dbuezas can check that version with a scope. Before and if this gets merged I suggest the warning mentioned in https://github.com/dbuezas/lgt8fx/issues/28#issuecomment-661271278 gets fixed. This is still there.

As for my problem, I am almost sure this is related to disabling interrupts by the LED library. A different library has this fact even documented: FastLED:Interrupt-problems So what I'll do next is to try to reduce the updates to the LEDs to improve the situation. If that does not work I can sill use the old library - even if it is very broken as per https://github.com/dbuezas/lgt8fx/issues/28#issuecomment-662947902 regarding the signal - it at least worked for us. 😉

dbuezas commented 3 years ago

Now timing-modified version of SoftwareSerial.cpp is now on my repo, https://github.com/jg1uaa/lgt8fx/tree/softwareserial/lgt8f/libraries/SoftwareSerial

@jg1uaa I did some extensive tests with your new timings with the oscilloscope, they look just perfect. It can't communicate anymore at 230400 with the Hardware Serial of a nano, but I measured their timings and they are wrong (4.5us per bits), your timings are indeed correct (4,34us). The real bit length is of course 4,34us (1000000us/230400bauds)

I already took your code to test it in this branch: https://github.com/dbuezas/lgt8fx/tree/new-software-serial-timings Can you make another pull request? Or if you don't mind I can merge that one. If you want put your name in the file as the author of the new timings :)

I also changed the way the pulses are generated and compensated for them in the timings by measuring it experimentally.

jg1uaa commented 3 years ago

Thanks, I sent pull request to new-software-serial-timings branch.

But I am worrying for old Ardiuno IDE users (gcc-5.4.0 and gcc-4.9.2). There is a few but exists due to something special reason. My method of counting instruction cycle seems to be okay, so adding older version compiler support is not impossible (but take time to work and test, test is very hard...).

jg1uaa commented 3 years ago

sorry, this is completely off-topic and following comment is about light_ws1812. I read https://github.com/cpldcpu/light_ws2812/blob/master/light_ws2812_AVR/Light_WS2812/light_ws2812.c and I found this code:

#define w_nop1  "nop      \n\t"
#define w_nop2  "rjmp .+0 \n\t"
#define w_nop4  w_nop2 w_nop2
#define w_nop8  w_nop4 w_nop4
#define w_nop16 w_nop8 w_nop8

maybe w_nopX takes X clocks delay. Original AVR takes 2 clocks for RJMP instruction but LFT8Fx takes only 1 clock. (see https://github.com/RalphBacon/LGT8F328P-Arduino-Clone-Chip-ATMega328P/blob/master/Comparison%20of%20instruction%20Efficiency.pdf for details) So, I think this library produces sometimes unwanted result due to timing issue... (or might be work corretly as high-speed, not low-speed mode?)

And I read WS1811 data sheet (http://akizukidenshi.com/download/ds/worldsemi/WS2811.pdf) and this device requires very critical signal timing indeed. I don't have any WS281x device so I cannot test these device/library, but I am worried about the library works correctly on LGT8Fx.

it's time to return to SoftwareSerial discussion...

seisfeld commented 3 years ago

@jg1uaa Your observation is correct. The library needs some modification. It works for a hand full of LEDs at 16MHz when you change the cycles to the following values:

// Fixed cycles used by the inner loop

define w_fixedlow 2

define w_fixedhigh 5

define w_fixedtotal 8

Unfortunately this is no general fix. Wondering if you could help with your knowledge? I agree though, this is off-topic here. I can drop you an email if this is okay?

dbuezas commented 3 years ago

@jg1uaa great! I'll merge when Im back home next week. Btw, did you see the commit that fixes the warnings? Are you ok with that? (Only declaring it static in the h file

On the smart led topic: I'm sure other users of the board will be thankful if you discuss it in another issue instead of privately via emails :)

jg1uaa commented 3 years ago

@dbuezas sorry, fixed. @seisfeld please give me a time to consider. (I just forked light_ws2812 to my repo, but nothing have done. need time to read code and modify.)

seisfeld commented 3 years ago

@seisfeld please give me a time to consider. (I just forked light_ws2812 to my repo, but nothing have done. need time to read code and modify.)

No worries mate.

On the smart led topic: I'm sure other users of the board will be thankful if you discuss it in another issue instead of privately via emails :)

Sure sure, was not aware @jg1uaa did fork the lib. We can fiddle with it in an issue on his repo.

jg1uaa commented 3 years ago

@seisfeld only this file need to be modified, and continue discussion about light_ws2812 at https://github.com/jg1uaa/light_ws2812/commit/1c10413c11ad6291abbf0e637726c076943fea5d

jg1uaa commented 3 years ago

@seisfeld how works new SoftwareSerial with improved version of light_ws2812?

seisfeld commented 3 years ago

@jg1uaa new SoftwareSerial works and so does the improved version of light_ws2812. The "problem" if you want to call it like that is not related to either of these. It's more of a general issue in how ws2812 protocol is implemented in any of the libraries which do bit banging. Certain time critical things run with interrupts disabled which in turn makes other libraries (in this case SoftwareSerial) loose data because certain timers are affected etc. (but whom am I explaining that here, you do know ;-)).

IMHO there is not much that can be done except trying not to update the LEDs too frequent (which is what I do now and have quite good results). On other MCUs there is more headroom to re-enable interrupts between updating etc. (FastLED does this for example). But on Arduino or LGT I don't think its a feasible way, let alone I would have no clue how to optimize any library in that direction.

Overall I think we made very good progress in getting SoftwareSerial WAY more solid and accurate as well as getting light_ws2812 more stable. So thanks again to everyone involved. Really appreciate all the efforts taken here!

jg1uaa commented 3 years ago

@seisfeld Ok I'm glad to hear that your project goes well. Indeed SoftwareSerial or other software-timer-based code is sometimes sensitive with other codes. Thanks to make a chance to try AVR assembler.

@dbuezas well, the problem seems to be solved. is it the time to close this issue?