EdgeTX / edgetx

EdgeTX is the cutting edge open source firmware for your R/C radio
https://edgetx.org
GNU General Public License v2.0
1.63k stars 345 forks source link

FrSky Horus X10S freezes when Siyi FM30 is installed #1251

Open wucke13 opened 2 years ago

wucke13 commented 2 years ago

Describe the bug The "Failsafe not set warning" appears. It can be removed by pressing return/exit. However, from then on everything is frozen, e. g. the only possible UI interaction is switching off the remote.

To Reproduce

Alternatively, change the mode of the External RF from any mode to Multi

Expected behavior The UI stays responsive. I get it, the Siyi module is hacky and surely they committed some crimes on hijacking the Multi Protocol. However, I expect no protocol breakage of an external module to crash my transmitter.

General information

raphaelcoeffic commented 2 years ago

This going to be tricky to test without said module. Do you happen to have some testing gear? Like a logic analyser?

wucke13 commented 2 years ago

No I do not. However, I'm willing to invest in anything that works well with linux. Can you recommend a probe? Also: on OpenTX, the Siyi tells OpenTX that it speaks Siyi protocol, thus blocking all other protocols from being chosen in the GUI. In EdgeTX, this does not work at all, e. g. only the standard protocols are available.

raphaelcoeffic commented 2 years ago

@wucke13 it is probably using the “old” protocol, which only works when changing protocols one by one. For the probes, the Saleae clones for 10$ do works ok. The software from Saleae (1.29) works on all platforms.

wucke13 commented 2 years ago

Probe is bought, I'll get back here when it arrives :)

wucke13 commented 2 years ago

@raphaelcoeffic I do have the LA now. What are you interested in?

raphaelcoeffic commented 2 years ago

I do have the LA now. What are you interested in?

Great! Now please make a trace of the PPM and S.PORT pin in the external bay. Usually, this can be done by opening the module box, then connecting the module without it's enclosure, and wire-tapping these pins.

From the top, the pinout in the module bay is as follows:

You should connect the probes such that channels 1 & 2 are connected to PPM and S.PORT, while GND is connected to GND.

Then, turn the module on while recording. That should give us a fair idea of what is going on. Once we have that trace, I should even be able to simulate that here and finally fix that issue.

wucke13 commented 2 years ago

Hi, one question before I start: I got one of the cheap 24 MHz 8 Ch probes. The probe claims to be only 5.2 Volt tolerant. Isn't the voltage on the PPM pin way higher? At least on old remotes it used to be driven at battery voltage.

raphaelcoeffic commented 2 years ago

@wucke13 no worry, I’ve been doing just that for years ;-)

wucke13 commented 2 years ago

It is surprisingly difficult to get by the contacts even when the module is inserted in the bay. Using dupont jumper wires is also not easily feasible, as the males can not be inserted wide enough in the Siyi module's receptor.

In the meantime: this issue is still present in 2.6.0.

rotorman commented 2 years ago

Very often in hard to reach cases, just soldering some wires for measurement purposes might be easiest. Can you post some pics of what you are dealing with, then we can possibly guide you better how to proceed.

wucke13 commented 2 years ago

Ok, I got a trace with the issue! This is done during the trace:

  1. The external module is switched off
  2. I switch the model of the external module to Multi
  3. (it freezes)

The trace.zip is with a 1 MHz sampling rate. D2 is on the PPM/SBUS Pin of the module bay, D5 on the smartport.

Not to whoever wants to tap this devices module bay in the future:

  1. Ground and Smartport conveniently come out of the back of the device, only tapping PPM/SBUS is tricky
  2. Tapping PPM/SBUS is not tricky when the radios back is removed
rotorman commented 2 years ago

I assume this is a Sigrok trace. If I open your trace in Sigrok PulseView, the D2 line decodes with 100000 baud 8E2:

grafik

I can see only spurious peaks on the S.PORT line D5, possibly you need higher sampling rate than 1 MHz to capture signal there or the signal is damped too heavily by the hardware. Would you have a possibility to do analog trace (e.g. with an oscilloscope) of the S.PORT line?

Is the FM30 mode switch set to UART in your case? Are you running latest firmware on FM30 (you can find a link to Google drive in the following URL: https://www.rcgroups.com/forums/showthread.php?3764179-A-39-9-Radio-Module-SIYI-FM30-2-4G-30KM-OpenTX-Transmitter-with-Datalink-Bluetooth )

wucke13 commented 2 years ago

I can see only spurious peaks on the S.PORT line D5

Same story when I sample with 24 MHz, and I can not sample faster.

Would you have a possibility to do analog trace (e.g. with an oscilloscope) of the S.PORT line?

The next investment? :laughing: I do not own an osci, and it is not easy to get hold of one. But its also not impossible. I'll see what I can do.

Is the FM30 mode switch set to UART in your case?

Exactly, when I just set it to SBUS (and select SBUS on the Radio) the module works just fine.

Are you running latest firmware on FM30 (you can find a link to Google drive in the following URL

I will check that.

Edit:

Two me it looks like something from the messages sent from SIYI causes the radio to reset its internal tx part of the serial instantly, and this instant reset both freezes the UI and causes the radio to not emit proper UART data. For me, the bug here is not that the SIYI doesn't work. The bug is, that wrong telemetry data from the module can crash the radio. I will therefore not update the SIYI module, at least now I have a reproducible failure :smile:.

raphaelcoeffic commented 2 years ago

The bug is, that wrong telemetry data from the module can crash the radio.

100% agreed, but that means I need some trace of that traffic, so I can get an idea of where to look. Otherwise that means we should start fuzzing the protocol parsers....

wucke13 commented 2 years ago

Otherwise that means we should start fuzzing the protocol parsers....

Of course its always a bold move to request things from open source collaborators, but IMHO, you (or better: one) should. In the meantime I organized an old oscilloscope, which should arrive in the coming weeks. I can try to take a look at the signal from an analog perspective, once it arrived. However: D2 in the picture was from the SBUS pin. So D2 is from the Module, to the radio. D2 has a clearly visible signal. So, I would conclude that you already have a trace of the data which offends EdgeTX :smile: . I'm stupid :/.

Another interesting observation (no reproducible so far reproducible):

  1. created a model which is configured to use a multimodule
  2. enter the external mode selector, switch from Multi to DSM2, but do not confirm the selection
  3. inserted the SIYI module in the bay (in Multi module mode)
  4. skip the "failsafe not set" warning with RTN
  5. now confirm the selected external rf mode selection (DSM2)
  6. observe the radio rebooting into emergency mode

I believe that the code interfacing the module bay makes assumptions about how the module transmits data. If these assumptions are violated, a cascading error takes down the hole RTOS. I'm not sure if this is related to the initial bugreport, though. This is not reproducible with a normal multimodule.

Edit: I did some more measurements, which are quite interesting. TLDR my Logic Analyzer is only able to pickup anything significant on the SmartPort, if the SmartPort is not connected to the radio. If I connect it to the radio, the signal becomes useless. I measured the GND to SmartPort on the Radio side to be 7 kOhm, and I could not measure any capacitance between GND and SmartPort. Analog readings indeed will be very interesting. This implicates, that the crashing of EdgeTX really may be what happens when garbage (as in no clearly distinguishable UART signale) is received on the SmartPort UART. I guess the output protection resistance of the multimodule is just a little lower than that of the Siyi, thus the Siyi's signal becomes two weak. Or maybe the Siyi just really sends offending stuff? Any way, you'll also find what the Siyi sends now in the trace where I did not connect the SmartPort of the Siyi module to the radio.

new measurements.zip

raphaelcoeffic commented 2 years ago

Of course its always a bold move to request things from open source collaborators, but IMHO, you (or better: one) should.

You mean fuzzing the telemetry protocol parsers? Possibly. However, while thinking about you what you wrote about freezing UI, I have the feeling that the issue is not so much the parsers but probably some parts of the UI getting stuck on the status sent by the module. This is unfortunately not a good fuzzing target. See here for a good idea of what is a good fuzzing target: https://releases.llvm.org/11.0.1/docs/LibFuzzer.html

That being said, I think that we should, to some degree, use fuzzing tools on these parsers. It's just that I know how long it takes to produce the proper instrumentation, start data set, etc (I work in security software).

Bottom line is: we need to fall back to manual code analysis (the old fashioned way).

raphaelcoeffic commented 2 years ago

@wucke13 by the way, are you still on 2.5, or did you update to 2.6? (or maybe even using lastest build?)

wucke13 commented 2 years ago

A bit far fetched, but: how would the EdgeTX team feel to gradually include some Rust code? My reasoning would be this:

@wucke13 by the way, are you still on 2.5, or did you update to 2.6? (or maybe even using lastest build?)

I upgraded to 2.6 :smile:

gagarinlg commented 2 years ago

More tests would be a very good idea. The main issue there is, writing new stuff is fun, writing tests is, for most people, the opposite of fun.

I have no experience in Rust, but currently the code looks a lot like C code with some classes attached, so I am not sure, if the integration is really easy.

wucke13 commented 2 years ago

Integrating with C style code is just easier than CPP (no fun with this and no demangling, so less things that can go wrong), this doesn't worry me that much. Writing tests in Rust is also quite fun, IMHO. But orthogonal to these worries, there are many many other valid reasons to not do this.

raphaelcoeffic commented 2 years ago

A bit far fetched, but: how would the EdgeTX team feel to gradually include some Rust code?

Well, it's not like I haven't thought about it 🤓 Like many other things... I think I'm looking for a pretext to learn it, in fact. However, at the moment I'm more interested in reducing the patchwork than increasing it.

Ok, just to be sure: the exact same issue can be reproduced with 2.6, right?

wucke13 commented 2 years ago

Well, it's not like I haven't thought about it nerd_face Like many other things... I think I'm looking for a pretext to learn it, in fact. However, at the moment I'm more interested in reducing the patchwork than increasing it.

While I obviously think this is a great idea and will solve all some problems, I can totally understand this reasoning (and probably, that is the right decision from an engineering perspective). However, if you ever want to consider this, I invite you to involve me, I'm happy to provide guidance opinion on the matter. But now its time to think about different things...

Ok, just to be sure: the exact same issue can be reproduced with 2.6, right?

Yes, everything I did today (so everything mentioned in my comments today) I did on 2.6.0 (338d4d2c)

c608345 commented 2 years ago

It's easy to request rust support. It's hard to get it done. Rust pull request is appreciated.

wucke13 commented 2 years ago

@c608345 Fair :smile: What would be a good, simple, not to strongly coupled thing that needs implementation, possibly in Rust? Which channel in the discord would be best to discuss these matters?

gagarinlg commented 2 years ago

The Dev channel would be the right place to talk.

pfeerick commented 2 years ago

That is, the #dev channel on Discord ;) Discord

gagarinlg commented 2 years ago

Finding the right discord server was the first test and you spoiled it 😜