EdgeTX / edgetx

EdgeTX is the cutting edge open source firmware for your R/C radio
https://edgetx.org
GNU General Public License v2.0
1.59k stars 337 forks source link

Corrupt data using SPort soft serial for external modules #3285

Closed mha1 closed 1 year ago

mha1 commented 1 year ago

Is there an existing issue for this problem?

What part of EdgeTX is the focus of this bug?

Transmitter firmware

Current Behavior

Working with an external module using SPort serial input for telemetry I noticed unexpected glitches.

The glitches are caused by soft serial sampling. To prove this claim I have a simple test setup which uses an existing external module protocol using SPort telemetry (LemonRX DSMP - physical module not required) and feeding arbitrary data to the SPort Pin. The data is read back using telemetry mirror and compared to the fed in data.

Note: 2.8.1 works fine, no data corruption

Test setup:

TX16s set External Module to mode LemonRX DSMP set AUX1 to Telem Mirror connect FTDI GND to module bay GND connect FTDI TX to module bay SPort pin connect FTDI RX to AUX1 TX

Feed a 22 byte data frame at 115200 every 100ms, e.g. using HTerm (ASend 0,1) to the module bay SPort pin: 02 01 1B 22 1B 23 04 05 06 46 3A 5B 00 00 6E 01 14 15 16 17 18 03

Log of Telemetry Mirror output:

02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506469D2B006E01141516171803       <-- Data corrupt, frame length not ok, byte missing
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516178CE0     <-- Data corrupt, frame length ok
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B2304829094DA01006E01141516171803        <-- Data corrupt, frame length not ok, byte missing
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803

Expected Behavior

No data corruption using module bay Sport pin.

Steps To Reproduce

see above

Version

2.9 latest nightly, been in there since the major driver and ports overhaul

Transmitter

Radiomaster TX16S / TX16SMK2

Anything else?

No response

mha1 commented 1 year ago

First of all I was wrong tagging this as a 2.8.1 problem. It's a main problem tested with the latest nightly and I think got introduced with the module ports renovation. 2.8.1 is working fine.

Back to the problem I suspected the sampling might be off but actually looks pretty good. The start bit is detected after 500ns (first pulse generated at entry into the falling edge ISR static void _softserial_exti()) and the following 8 samples (pulses generated in the timer ISR at the point where the GPIO is read - void stm32_softserial_rx_timer_isr()) are positioned right at half the bit time. So the soft serial driver is working fine. But even with this perfect sampling example sporadic data corruption occurs. Most likely some higher priority interrupt hitting in between bit samples throwing the timing off.

@pfeerick @raphaelcoeffic any ideas?

image

PS: With a little patience I caught one

image

The first UART line is the feed in at the SPort pin, the second line is telemetry mirror output. Note the first 4 bytes. In: 02 55 AA 22, Out: 02 55 AA 91. The 22 was not sampled correctly. Check out the 9,75us delay instead of the usual 500ns delay detecting the falling edge of the start bit. We missed an entire bit and everything else after that goes wrong because we are one bit out of sequence. It's gotta be a high priority interrupt that delayed firing the the falling edge detection ISR.

image

If you want to see for yourself please use the attached DSLogic data file. Can be opened in DSView which is available free of charge. DSLogic-la-230301-211300.zip https://www.dreamsourcelab.com/download/

richardclli commented 1 year ago

I think it is the change of IRQ priority, old code uses 0 for both EXTI and TIM irq priority, but new code did not.

mha1 commented 1 year ago

Yes, I changed the falling edge detection IRQ prio from 5 to 0 for target Horus a few days ago after talking with @raphaelcoeffic. This solved the problem for Horus type radios. Tested on TX16s and X10express ok.

Not sure about the other targets Taranis and NV14. Need feedback from @raphaelcoeffic

richardclli commented 1 year ago

Yes, I changed the falling edge detection IRQ prio from 5 to 0 for target Horus a few days ago after talking with @raphaelcoeffic. This solved the problem for Horus type radios. Tested on TX16s and X10express ok.

Not sure about the other targets Taranis and NV14. Need feedback from @raphaelcoeffic

Already replied you in your PR, please go ahead to fix taranis target as well.

mha1 commented 1 year ago

I'd like to do some testing on this. Do you have a Taranis type radio and a FTDI?

richardclli commented 1 year ago

I do not have a Taranis, need somebody have it to test anyway, however it didn't blocks you from changing the taranis as well in your PR. The old code uses 0 priority as well.

mha1 commented 1 year ago

@pfeerick I'm sure you have a Taranis type radio and a FTIDI, haven't you?

pfeerick commented 1 year ago

I plead the fifth.

mha1 commented 1 year ago

Overruled, there is no 5th in Australia

mha1 commented 1 year ago

Let me know the Cmake for your radio and I'll send you 2.9 firmware to test soft serial and a corresponding Windows command line tool to stimulate soft serial via the SPort pin.

pfeerick commented 1 year ago

Welp... tough crowd! 😮

The X9D+2019 will be more reliable atm, so -DPCB=X9D+ -DPCBREV=2019 . If you want to risk the X9D+ as well, just -DPCB=X9D+ ... I'll pencil you in for Sunday 🤣

mha1 commented 1 year ago

Thanks for your precious time. Attached is a .zip with test firmware for X9D+2019 and X9D+ with changed IRQ priority for soft serial, the Windows command line tool and notes.txt. Please read notes.txt on how to set this up. SoftSTest.zip

As a reference the same build for TX16s which tested ok on my TX16s with the setup described in notes.txt fw-tx16s_softS-1.zip

pfeerick commented 1 year ago

For the X9D+2019, when switching between 2.8.1 and your build, this seemed to have no change in behaviour - all sensors, but rolling values for some. For the X9D+, 2.8.1 => your build, this went from not discovering any sensors, to discovering sensors, but as with the X9D+2019, rolling values for some. So, interestingly something seems to have been wrong in 2.8.1 for the X9D+

So in my view this is still an improvement in it's own right, but not 100% resolved? It certainly isn't a step backwards, so there is no need for you to invest in a Etch A Sketch just yet 🤭 Assuming, of course, the main sport simulator code was behaving properly? I switched between two computers and a couple different USB/serial adapters with no change in behaviour... and then tried the TX16S build and it is behaving exactly the same as the other radios... so at least they are all consistent! 😇 (edit: tester error, need to retry)

X9D+2019 https://user-images.githubusercontent.com/5500713/224583372-7875d742-8416-4eba-b166-d05a8e58aeba.mp4

X9D+ https://user-images.githubusercontent.com/5500713/224583834-a727f280-97e9-4105-b62f-5ba42932ce59.mp4

X9D+ (including discovery) https://drive.google.com/file/d/1fXN3DTSKVQiXEXMEi10cixNTw0iRVr3-/view?usp=share_link

TX16S https://drive.google.com/file/d/1DRuc0JB5cAbF5AsQhPfcaUsN9Vdm4CVj/view?usp=share_link

mha1 commented 1 year ago

Thank you very much for testing.

Can you please verify the fw versions for your comparisons 2.8.1 -> my self build. There is no way you'd be able to set the external module to subtype MLink (PPM MLINK) in 2.8.1 for any radio. This option of selecting a subtype only exists in my self build. And please make sure after my self build is flashed you go into settings/external module to select PPM MLINK. If that's the case you should see only the sensors discovered with main.exe running I listed in notes.txt

Flashing 2.8.1 and selecting PPM in settings/ external module (no subtype selection possible) might show some sensors after discovery with main.exe running but sensors should be different to the 10 as listed in notes.txt

mha1 commented 1 year ago

I should have explained this better: my self build implements two major differences to 2.8.1 and 2.9. In 2.8.1/2.9 you can only select PPM as external module type without further specifying the telemetry protocol that is fed to the SPort pin. 2.8.1 is special here as by default it interprets any serial data coming in through the SPort pin as FrSky SPort telemetry. The switch to the new serial/module API in 2.9 disabled this feature. So if you'd try a recent 2.9 nightly you wouldn't be able to discover any telemetry sensors with a PPM module. I think it was never deliberately chosen to decode SPort telemetry with PPM. The port was just open. Fact is 2.9 after the serial/module API has disabled decoding SPort telemetry which I think is correct.

My change uses 2.9 and starts fresh with allowing deliberate telemetry protocol selection for module type PPM. I started with implementing a new protocol for a Multiplex PPM JR type module that is able to deliver serial telemetry too. Main.exe simulates this protocol, not SPort telemetry. Feeding main.exe's data to a 2.8.1 radio will produce random results as 2.8.1 tries to decode this as SPort telemetry (this is what your videos show). Having my self build flashed and having set the external module to PPM NOTLM should show no discovered sensors, setting it to PPM MLINK should shown exactly the 10 sensors I listed in notes.txt. Make sure you delete all sensors and rediscover sensors after switching firmware versions.

While implementing based on a recent 2.9 versions I found this serial problem which has it's roots in the IRQ priority change that came with the big serial/module API renovation. This is also corrected in my self build and is the test focus.

The major changes in my self build vs 2.9: UI: it is now possible to select a sub type for external modules (NOTLM and MLINK) which allows to select a specific telemetry protocol for a PPM type module. Telemetry: implemented a new protocol to decode Multiplex JR module telemetry (subtype MLINK)

Expected results: 2.8.1:

2.9 current nightly (any after the serial/module API renovation)

2.9. Michael's self build:

So if you really wanted to do a comparison you should compare a 2.9 nightly build after Raphael's big change to my self build.

pfeerick commented 1 year ago

Ok, this monkey didn't read the instructions properly... I still had Lemon DSMP in my head when I was trying this, so had the module type set to that... regardless... on this X9D+ something changed between 2.8.1 and your build for the better (even with the wrong setting 😆) ... which is actually no surprise given it was playing up before but another user couldn't repo it... so will try again and will put my glasses on this time 😇

mha1 commented 1 year ago

No problem. Knowing I am good at confusing people, I should have explained the context much better. Here's my new attempt to confuse you but with good intentions: to save your time 😇

  1. Why did I choose such a strange setup: a. Because I wanted to make sure soft serial is actually used to verify the solution to this issue. Using SPort telemetry doesn't use soft serial as it serves inverted serial data and therefore can use the HW UART together with the hardware inverters on the PCB (damn you FrSky). The thing I implemented in my branch feeds a new normal polarity telemetry protocol to the SPort pin and uses (has to) soft serial to process the data. Main.exe simulates the new protocol and hence only provides meaningful data for my new protocol decoder. Feeding main.exe to any other protocol generates garbage at best. b. Main reason: because I detected and analyzed the soft serial problem with the exact same setup I then used for verification too on my TX16s.

  2. Why does comparing to 2.8.1 not make a lot of sense: a. 2.8.1 has a special quirk that if you select PPM as external module, the SPort pin is active by default and will decode everything you feed it as SPort protocol. So feeding 2.8.1 produces garbage because main.exe doesn't deliver SPort data b. The problem with soft serial is a 2.9 problem. 2.9 soft serial was working fine until the big serial/module API do-over. New soft serial driver plus changes to IRQ priorities made soft serial unusable due to the start bit detection interrupt having a to low priority, see above analysis c. Btw, the quirk described in 1a. also vanished after the re-do. Which I think is good, because if an external module type has the capability to process more than one protocol it should be user selectable which one to choose. Do you know of any external PPM module that delivers SPort telemetry?

  3. Does make comparing a recent nightly build to this setup make sense: a. Only in so far as to verify there is no more SPort data decoded in 2.9 for PPM type external modules, see 2.c. But this setup doesn't provide the means for that as main.exe doesn't stimulate SPort telemetry b. Not as far as this issue is concerned due to section 1. If you want I can send versions of this setup with the current main branch IRQ priority. I omitted this because I went through the exercise for the Horus targets (on my TX16s).

  4. What's the real test case here? a. The test case is to check if the change of the start bit detection interrupt stops soft serial from providing corrupt data. b. A check with X9D+softS.bin and X9D+2019softS.bin is sufficient enough to prove the IRQ change is working for Taranis targets

  5. Whats the test procedure and the expected result: a. See the following video that was done with my TX16s, using fw-tx16s-sofS.bin and main.exe. If the result is the same using the same procedure as in the video on both X9D+ radios I'm happy and know the IRQ priority change works for Taranis targets too (I can confirm this for TX16s, see video). If the IRQ priority wouldn't work you'll see ghost sensor, sensors going on/off and so on.

https://www.youtube.com/watch?v=Yx7RB6kY2KA

pfeerick commented 1 year ago

Here's my new attempt to confuse you but with good intentions

You'll have to try harder - that made sense 🤭 TX16, X9D+2019 and X9D+ with the correct protocol setting worked perfectly this time...

https://drive.google.com/file/d/1DmCd00J95WnSCzcN8PxjrbEeEyJ3IrcZ/view?usp=sharing, https://drive.google.com/file/d/1DomngIV4dqANCyqNpFkiBLNvCyORBmwJ/view?usp=sharing, https://drive.google.com/file/d/1DpgTSfdR9y2mtgSqhYaa_ANM_i9Hg-hz/view?usp=sharing

mha1 commented 1 year ago

Hi Peter, thanks again for your time and patience. I have updated PR https://github.com/EdgeTX/edgetx/issues/3285 accordingly and think it's ready to go.

You actually tested quite a bit more than just the fix for this issue. The test setup and firmware files you used for testing fully reflect PR https://github.com/EdgeTX/edgetx/pull/3352 (Support for external Multiplex MLink JR type module) and PR https://github.com/EdgeTX/edgetx/pull/3328 (fix: UI - not all detected sensors displayed at detection time). I'd appreciate you checking out both PRs and hopefully green light them too.