EdgeTX / edgetx

EdgeTX is the cutting edge open source firmware for your R/C radio
https://edgetx.org
GNU General Public License v2.0
1.59k stars 338 forks source link

FrSky X10S Express: dangerous spike on channel associated with RV stick using ELRS 900 MHz modules on EdgeTX V2.11 (main branch) #5613

Open robustini opened 1 week ago

robustini commented 1 week ago

Is there an existing issue for this problem?

What part of EdgeTX is the focus of this bug?

Transmitter firmware

Current Behavior

Unfortunately this is not at all nice, having channel spikes going to full scale on RV stick means for the pilot in MODE1 to have the throttle going to full randomly. Fortunately we have ascertained that this only happens using firmware compiled from the main branch, while version 2.10.x is not affected for now. Tests were performed with two FrSky Horus X10S Express and EMAX Aeris, FrSky R9M and Radiomaster Nomad modules at 900 MHz with ELRS V3.5.1. No problems were encountered with any 2.4 GHz ELRS module tested here (BetaFPV and EMAX). Definitely related to an RF problem that apparently exist with 2.11, so some commit emphasized this anomaly. In my case even at 10 mW power of the R9M module the problem is evident, with the Nomad it appears with much higher power. Some videos showing the problem:

https://github.com/user-attachments/assets/c5fed020-f9bf-4ba3-8dff-94e7c14aab09

https://github.com/user-attachments/assets/312a2fec-c903-4713-8a9c-318a4a6ec84e

Expected Behavior

Obviously the analog hall sensor on RV maybe has something at the connectivity level on the main board or something else that makes it more sensitive to RF noise but the fact remains that with EdgeTX V2.10.x this thing does not happen. I have tried swapping the two FrSky M10 gimbals and the problem still remains on RV.

Steps To Reproduce

The posted videos show the procedure, in my case CH2 starts to show spikes only when the rx and module start to colloquy, so when RF comes into play. The problem is highlighted by using in ELRS 16 channels with telemetry ratio at 1:2.

Version

Nightly (Please give date/commit below)

SHA 34856136c9eb54ca7fcb0b9d547c485bb9972841

Transmitter

FrSky X10 Express / X10S Express (ACCESS)

3djc commented 1 week ago

I don't think you have proven in any way that a problem exist in 2.11. You have proven through your research that you have hardware based RF issue, that's the bottom line. They do manifest in 2.11 (more ? Given there are plenty of elrs settings where this doesn't happen, it cannot be excluded that some exist in earlier version that you haven't seen, it all a question of timing), maybe so, likely due to timing between radio processing and RF perturbation, it still doesn't make it a software issue warranting software fixing.

The only true fixing need to be a to find how to shield your radio against 900MHz RF spikes, imho

robustini commented 1 week ago

@3djc certainly on the shielding problem I can agree with you, this TX apparently in that section has been poorly maintained and there is a hardware problem. But if the software version emphasizes it a minimum of investigation should be done in my opinion, i mean definitely on the hardware but also on the software. If then it is not possible we will change TX or stay at 2.10, problem not solved but curbed. My issue is to prevent someone from one day tomorrow going the throttle of a plane to full throttle unintentionally on the ground, and perhaps with hands on the propeller.

robustini commented 1 week ago

I repeated the same tests on Radiomaster TX16S and FrSky X12S and no problems found. So absolutely agree that the X10S bickers with the RF of the ELRS modules at 900 MHz. But curiosity remains: what in 2.11 makes this thing pop up? Why is it absolutely stable with 2.10? Is it possible to fix it via software? If we can't via software how to solve it at the hardware level? I am not an ostrich who sticks my head in the sand when faced with a problem, if possible I look for a solution (if there is a possible solution).

pfeerick commented 1 week ago

If this is a RF issue, which it entirely appears to be, the only way to resolve it - other than using a magic version of firmware that just happens to with that exact hardware layout, environmental factors (temperature, humidity, etc) and probably even compiler version to generally not exhibit any symptoms - is to shield the RF section from the rest of the radio electronics... ie. insulated copper/aluminium shield, or wrap some shielding around the gimbal wiring. IIRC, I have seen other users do exactly this with the X12... going as far as to basically foil a large portion of the inner shell.

robustini commented 1 week ago

Yes Peter, the intention of this report is obviously to warn and find a solution if possible, not to say "2.11 is no good" but "2.11 with this hardware combo is potentially dangerous", this is the gist of the issue. If we know the problem we can avoid it. And if anyone is able to find a solution, please come forward!

philmoz commented 1 week ago

If it is possible to isolate at which commit in 2.11 the change happens it might be possible to identify why it behaves differently.

But this will take e lot of compiling different versions and thorough testing.

robustini commented 1 week ago

If it is possible to isolate at which commit in 2.11 the change happens it might be possible to identify why it behaves differently.

But this will take e lot of compiling different versions and thorough testing.

That's what I was trying to do, at the moment I went backwards to August 1 and the problem is still there. Since 2.10 was released at the end of August it's likely that that commit maybe with 2.11 is still being ignored. I continue the hunt, I'm pretty sure I can find it, the problem occurs immediately.

pfeerick commented 1 week ago

Er... 2.10.0 was released May 12, but would have forked off of main some time before that. If you are building the code yourself, you should be to narrow down to the commit it changes relatively quickly using git bisect (while also making sure to run git submodule update --init --recursive each step to keep the submodules properly in sync).

While it's not the sort of thing you normally want, having so obvious an ... issue ... is sometimes helpful!

neuhausj commented 1 week ago

2.10.x has been forked the 28. March 2024 image

robustini commented 1 week ago

2.10.x has been forked the 28. March 2024

Then the hunt gets much more complicated; 2.10 has 474 commits behind main.

ajjjjjjjj commented 1 week ago

2.10.x has been forked the 28. March 2024

Then the hunt gets much more complicated; 2.10 has 474 commits behind main.

Using bisect it should take about 9 tests.

pfeerick commented 1 week ago

2.10.x has been forked the 28. March 2024

Make that 28 Feb ;) ... 315104c2ba9ee61edf183bed283d45f9a6c856ae should be the first unique commit for 2.10, and 57c7aca2f686a340b6ea6af7f5fcedfbae4425b3 the first common commit for 2.10 and 2.11

i.e. using the latest commit to main as the bad reference, and assuming (but would obviously need to be tested also that the common commit to both 2.10 and 2.11 is still good)

git bisect start
git bisect bad dd482f4246f09f140e54325f81d66f6edf49038f
git bisect good 57c7aca2f686a340b6ea6af7f5fcedfbae4425b3
Bisecting: 236 revisions left to test after this (roughly 8 steps)

git bisect in use => https://dev.to/alvesjessica/using-git-bisect-to-find-the-faulty-commit-25gf

robustini commented 1 week ago

Understood the logic of operation, convenient though....

git bisect good 57c7aca2f686a340b6ea6af7f5fcedfbae4425b3 Bisecting: 236 revisions left to test after this (roughly 8 steps) warning: unable to rmdir 'radio/src/thirdparty/lvgl': Directory not empty warning: unable to rmdir 'radio/src/thirdparty/stb': Directory not empty [b644aadbaa8067640b6fddc0f37e132b0e566c68] fix(cpn): binary import of multipos switches (#5169)

So i think git submodule update --init --recursive is obviously needed at each step.

robustini commented 1 week ago

I will try to tell you about it, I hope you believe me but I have repeated the test countless times on each firmware, setting the module to 500 mW. This is the list of firmware tested with the related SHA in the name:

1_b644aadbaa8067640b6fddc0f37e132b0e566c68.bin 2_4a0308a46fe02edb12df3272e64160dbc27fb7c5.bin 3_982660e8c8b15b48d5cd8fbbca88497e88bf7600.bin 4_6e706c4b1cc9d7568181bda22e08bdd52ba0f773.bin 5_471f87fc2a4eec9b127d9ff125645d420c33e785.bin 6_a4d93c693160076583fd80cf1126ba7124a54259.bin 7_fcc75bd6b74293f5cdb1802dbed94e335e2b0344.bin 8_49734e433693a9d7853fa875bb602c2bdae20851.bin 9_34856136c9eb54ca7fcb0b9d547c485bb9972841.bin

Do you want to have a laugh? The problem does that with everyone. Then I try again the firmware compiled from branch 2.10 and 100% the problem is not there. So now? Is this something that is so old that therefore it will not be considered in future releases? Possible. However it is boring because I will not be able to constantly test the main with this tx.

robustini commented 1 week ago

I could continue the hunt by marking the first one on the list as bad and try to go way back, to 2.9, let's go.

gagarinlg commented 1 week ago

Did you try compiling 2.10 before or did you use the builds provided by us? Maybe it is an issue with your compiler version

robustini commented 1 week ago

Did you try compiling 2.10 before or did you use the builds provided by us? Maybe it is an issue with your compiler version

Compilation problems? I don't think so, I always compile 2.10.5 myself and it works perfectly. I compile these radio firmwares I believe from early versions of OpenTX for Taranis X9D. Then of course we can always be wrong, but I rule it out.

3djc commented 1 week ago

I don't think 'problem' is the right word. As it is all about timing, different compilers can produce various levels of optimisation and therefore affect timings. This is also why it so difficult to try to fight this types issues at code level

robustini commented 1 week ago

I don't think 'problem' is the right word. As it is all about timing, different compilers can produce various levels of optimisation and therefore affect timings. This is also why it so difficult to try to fight this types issues at code level

The test on the other X10S EXpress was done by another EdgeTX developer, I don't think we are all compiling wrong. I agree about “problem,” not perhaps the correct word, more simply anomaly caused by possible multiple factors. I have stopped the hunt, awaiting more feedback if there is any when 2.11 is released.

gagarinlg commented 1 week ago

I never said you were compiling wrong, but we alread had issues with different compiler versions, like vastly different sizes of the binary and other stuff, so it is good to be sure that you did all tests using the same compiler version.