betaflight / betaflight

Open Source Flight Controller Firmware
GNU General Public License v3.0
8k stars 2.86k forks source link

Update CRSF spec #13614

Closed haslinghuis closed 1 week ago

haslinghuis commented 2 weeks ago
github-actions[bot] commented 2 weeks ago

Do you want to test this code? You can flash it directly from Betaflight Configurator:

WARNING: It may be unstable. Use only for testing!

hydra commented 2 weeks ago

Can this be scheduled for 4.5.1 please, this needs to happen sooner rather than later and given the current release schedule of BF 4.6 might be another year or so.

haslinghuis commented 2 weeks ago

Bugfixes will be backported. Need testing first.

hydra commented 1 week ago

Can I enquire who's testing this and what the status is?

IMHO this should just be merged into master to at least get the devs and anyone building software using BF as a reference to at least use the correct baud rate.

nerdCopter commented 1 week ago

what is ELRS' position? i have not kept up after the initial discussions. @JyeSmith @CapnBry

haslinghuis commented 1 week ago

Seems official baud rate - just want to have this tested scientifically :)

measuring is knowing

CapnBry commented 1 week ago

what is ELRS' position? i have not kept up after the initial discussions.

Does it make any difference? Betaflight worked with TBS products since the inception of CRSF with the "wrong" baud rate, it should still work if BF changes to weirdest-baud-rate-one-could-conceive (likely a PIC32 clock multiplier?).

I believe it is only really an issue if some target couldn't generate this baud and ended up being even lower on the wire. It is only 0.96% difference otherwise which should be tolerable? It should definitely be tested against ExpressLRS STM32, ESP32, and ESP8285 targets but I do not anticipate any issues.

@hydra why does this need to happen ASAP exactly? What uhhhh isn't working that is so broken that it needs immediate attention? It has operated like this for 8 years šŸ˜†

haslinghuis commented 1 week ago

It should definitely be tested against ExpressLRS STM32, ESP32, and ESP8285 targets but I do not anticipate any issues.

Yes please focus on the change - like to have this tested if there are any improvements.

CapnBry commented 1 week ago

like to have this tested if there are any improvements

Is there an anticipated improvement? I thought this was more academic than beneficial.

haslinghuis commented 1 week ago

IDK - just following official spec here. Purpose of this PR is to enable testing and get the facts.

Appreciate testing results.

JyeSmith commented 1 week ago

what is ELRS' position? i have not kept up after the initial discussions. @JyeSmith @CapnBry

First ELRS is hearing about a change when this PR opened. So there has been no discussion or coordination in a controlled way.

I agree there is no rush here, and maybe the PR should be marked as a draft.

Steve requested the spec https://github.com/betaflight/betaflight/issues/12398#issuecomment-1440700134, where is this?

TBS expressed that they will provide a spec, so we can wait for this https://github.com/tbs-fpv/freedomtx/issues/26#issuecomment-1422702908. Otherwise community projects are just making random changes.

The crsf-wg was also started to try and do these changes in a controlled manner https://github.com/crsf-wg/crsf/issues. Something BF said they are up for https://github.com/tbs-fpv/freedomtx/issues/26#issuecomment-1423227746.

Im not opposed to a change, lets just get some documentation around it so that everyone knows the standard. SmartAudio is great like this, while TRAMP is a not. Lets start doing it right.

haslinghuis commented 1 week ago

Really love to see any test results opposed to master before further consideration updating standards.

CapnBry commented 1 week ago

Betaflight Testing Setup

--- a/src/main/rx/crsf.c
+++ b/src/main/rx/crsf.c
@@ -73,6 +73,7 @@ static timeUs_t crsfFrameStartAtUs = 0;
 static uint8_t telemetryBuf[CRSF_FRAME_SIZE_MAX];
 static uint8_t telemetryBufLen = 0;
 static float channelScale = CRSF_RC_CHANNEL_SCALE_LEGACY;
+static int32_t crsfCrcErrCnt = 0;

 #ifdef USE_RX_LINK_UPLINK_POWER
 #define CRSF_UPLINK_POWER_LEVEL_MW_ITEMS_COUNT 9
@@ -460,6 +461,7 @@ STATIC_UNIT_TESTED void crsfDataReceive(uint16_t c, void *data)
                     break;
                 }
             } else {
+               DEBUG_SET(DEBUG_CRSF_LINK_STATISTICS_UPLINK, 4, ++crsfCrcErrCnt);
 #if defined(USE_CRSF_V3)
                 if (crsfFrameErrorCnt < CRSF_FRAME_ERROR_COUNT_THRESHOLD)
                     crsfFrameErrorCnt++;
diff --git a/src/main/rx/crsf_protocol.h b/src/main/rx/crsf_protocol.h
index 0b56233f5..59382c844 100644
--- a/src/main/rx/crsf_protocol.h
+++ b/src/main/rx/crsf_protocol.h
@@ -27,7 +27,7 @@
 #include <stdint.h>
 #include <stdbool.h>

-#define CRSF_BAUDRATE           420000
+#define CRSF_BAUDRATE           416666

 enum { CRSF_SYNC_BYTE = 0xC8 };

DEBUG set to DEBUG_CRSF_LINK_STATISTICS_UPLINK. Value can be monitored via Betaflight Configurator's Sensors tab set to graph DEBUG, in debug slot 4 (will display "not used" but I assure you it is). 1000ms update rate is fine. Expected value: always 0.

ExpressLRS Test Setup

diff --git a/src/lib/Telemetry/telemetry.cpp b/src/lib/Telemetry/telemetry.cpp
index b055fc68..9be47f57 100644
--- a/src/lib/Telemetry/telemetry.cpp
+++ b/src/lib/Telemetry/telemetry.cpp
@@ -163,6 +163,8 @@ void Telemetry::ResetState()
     }
 }

+int8_t crsfCrcErrors = 0;
+
 bool Telemetry::RXhandleUARTin(uint8_t data)
 {
     switch(telemetry_state) {
@@ -215,6 +217,8 @@ bool Telemetry::RXhandleUARTin(uint8_t data)
                     receivedPackages++;
                     return true;
                 }
+                else if (crsfCrcErrors < 127)
+                    ++crsfCrcErrors;
                 #if defined(UNIT_TEST)
                 if (data != crc)
                 {
diff --git a/src/src/rx_main.cpp b/src/src/rx_main.cpp
index 7059df00..2cf7a6f7 100644
--- a/src/src/rx_main.cpp
+++ b/src/src/rx_main.cpp
@@ -431,6 +431,8 @@ void ICACHE_RAM_ATTR LinkStatsToOta(OTA_LinkStats_s * const ls)
         ls->SNR = SnrMean.previousMean();
     }
 #endif
+    extern int8_t crsfCrcErrors;
+    ls->SNR = crsfCrcErrors;
 }

 bool ICACHE_RAM_ATTR HandleSendTelemetryResponse()

Much hack, so dumb. Monitors CRC errors on the ExpressLRS side, is sent back as telemetry in the RSNR field on EdgeTX (31 max value reportable). Expected value: always 0.

Testing

All tests done with TX16S Internal module, TMVELOXF411 Betaflight (3.2k/3.2k + analog OSD on), iFlight 2.4G RX (ESP8285)

Partial Conclusion

STM32/ESP32 still needs to be tested but I am Limon-ing, however it appears there is a large window where the baud is fine. ExpressLRS operates in that range for either the old or new baud.

SunjunKim commented 1 week ago

bit_error_calc.xlsx

I made a small sheet that calculates the theoretical bit error possibility.

With 416666 baud rate RX side, various TX bauds are tested: The first wrong bit reading is expected to appear after

As the UART 8N1 format requires reading 10 bits, any value near 10-ish will be problematic, which will likely cause a failed reading on slight clock drift. Value <10 will be a failure, definitely.

This result aligns with @CapnBry 's experiment, and it confirms the compatibility between 420k and 416.666k is totally fine.

nerdCopter commented 1 week ago

I did not see this one linked, so here is "TBS official comment" šŸ¤·ā€ā™‚ļø : https://github.com/tbs-fpv/freedomtx/issues/26#issuecomment-1731202619

CapnBry commented 1 week ago

Updating with ExpressLRS ESP32 pico D4 results (RadioMaster ER8 receiver). Slightly better tolerance than ESP8285 results, but possibly due to only testing higher bauds (no baud <416666 was tested).

CapnBry commented 1 week ago

And wrapping up the testing, STM32 (R9MM receiver). STM32 are more sensitive to the baud rate change and have more errors when ExpressLRS runs at the original CRSF baud (420000). However, it also appears STM32 has issues even when set to the correct baud which taints the results-- STM32 receivers don't work perfectly with or without this change.

NOTE! Testing procedure is different here, as this is a 900MHz receiver and can only run at 200Hz, therefore the percentage of errors 5x higher normalized to the Team2.4 receiver than can output 1000Hz.

This one was a bit of a nightmare to test because changing the baud on STM32 can only be done by flashing it, but once you change the baud you can't flash it back any more without STLINKing which I forgot and it is all ya'll's fault šŸ˜…

Overall Conclusions: Technical

It does appear that changing the baud rate is fine for ESP8285 and ESP32 ExpressLRS receivers, but can have a small detrimental effect on ExpressLRS STM32 receivers. ExpressLRS STM32 users can not change the baud rate without reattaching an STLINK, as the bootloader used for passthrough flashing can't be swapped via itself, which creates a large burden on the user base if ExpressLRS were to change.

In short this change has a insubstantial detrimental effect to the ExpressLRS ecosystem as a whole, further mitigated by the fact that the overwhelming majority of users are on hardware that will continue to function at 100% no matter which of the two baud rates are used.

Overall Conclusions: Personal

Why are we considering this? What benefit can come from it vs the risks? The community has operated for 8 years on the assumption that 420k baud was the proper CRSF baud for receivers. TBS has never raised an objection, or even mentioned it despite numerous enhancements to the Betaflight CRSF code in that time. We've only become aware this was not the accurate baud rate via a comment from a reputable source stating it, who also has not suggested anyone change.

Is there a secret document that's under an NDA which theoretically states that the correct baud rate is 416666. The community almost entirely has operated under the assumption that the correct baud rate is 420000. Numerous pieces of software have been built hard-coding this baud rate into their applications, just as Betaflight has.

Therefore it is my opinion that the world adopts 420000 baud The Standard CRSF Baud, as that is what almost everyone is using. I would readily change my mind though if a TBS user tested the BF patch I posted above, and posted number of the number of CRC errors when running 250Hz Tracer with Betaflight set to 416666 vs 420000 baud.

ledvinap commented 1 week ago

0.8% difference in baudrate is within clock tolerance, so it PR is "only" increasing safety margin.

ctzsnooze commented 1 week ago

@CapnBry Thanks so much for the careful testing. We have four options:

SunjunKim commented 1 week ago

@CapnBry Thanks so much for the careful testing. We have four options:

  • stay at 416666 (no change from present)
  • change that value to 420000
  • keep the default of 416666 but provide a build option for 420000
  • keep the default of 420000 but provide a build option for 416666 Which do you personally think we should pursue?

My personal preference is: keep the default of 420000 but provide a build option for 416666

blckmn commented 1 week ago

Leave it at 420000 as the default and opt for 416666 as build option. I have not come across any issues with 420000.

ctzsnooze commented 1 week ago

Status Quo back to 3.5 is 420000 - I agree, don't change that, but make 416666 an option

ledvinap commented 1 week ago

Either don't change anything, or fix it. Another build option is overkill.

CapnBry commented 1 week ago
  • stay at 420000 (no change from present)

My opinion is that is that it should stay 420000. If there had been problems with the "incorrect" baud rate over the past 8 years then I feel like this would have been changed already. I can't have been the only person ever who looked to see how many packets weren't passing CRC, or that the most popular FC software was using the correct baud rate for the protocol I use, am I? šŸ¤Ø

The original 416.6 came from OpenTX dev team and our Crossfire team lead when we were looking for a higher baudrate than the standard 115.2.

Thank you for bringing your knowledge to this discussion! I'm not sure if this is being misremembered, but OpenTX / EdgeTX have never used 416666 baud and has always used 400000 baud. This for sure is incompatible, as running 416666 or 420000 baud on the TX module simply does not work at all.

To my knowledge, I am the only person who has ever created public documentation for CRSF, but it was based entirely on my own opinions from reverse engineering and reading existing open source code 5 years ago. I've recently updated it with information from a secret document I found, which also contained information contradictory to what I know to be true. It would be a great benefit if the CRSF protocol specification was publicly available right from TBS's website so everyone's not referring to the super secret confidential document they can't share. That would prevent this sort of confusion and allow us to not rely on asking Trappy to comment on what is right or wrong, and knowing those comments aren't guaranteed to be totally accurate

blckmn commented 1 week ago

Leaving it as 420k, if someone needs 416... they've now got a custom define they can set.

Thanks all for the discussion, appreciate the input and testing.

hydra commented 4 days ago

@hydra why does this need to happen ASAP exactly? What uhhhh isn't working that is so broken that it needs immediate attention? It has operated like this for 8 years

@CapnBry some reasons:

1) simply because 8 years is way to long for something so fundamentally incorrect and confusing to be left un-fixed. 2) so we can all move on to more important stuff instead of 'dithering' around fixing a known-incorrect baud rate, which seems to have come about due to unpublished CRSF specifications and incorrect reverse engineering.

Either don't change anything, or fix it. Another build option is overkill.

@ledvinap agreed!

Leaving it as 420k, if someone needs 416... they've now got a custom define they can set.

@blckmn this only goes to confuse everyone further and perpetuate the mess. Less is more. We don't need it.

respectfully, I request that maintainers please reconsider.

hydra commented 4 days ago

Also, @CapnBry and @SunjunKim thanks for your testing, your results were pretty much what I expected in that it makes no perceptible difference when updating the baud rate to the correct spec confirming my feeling that we should all just be using the right baud rate in all our software and tooling and move on.

@CapnBry with regards to the ELRS bootloader on STM32 boards, correct me if I'm wrong but the bootloader at 420000 still allows passthrough flashing 416666 and vice versa right?

CapnBry commented 4 days ago

@CapnBry with regards to the ELRS bootloader on STM32 boards, correct me if I'm wrong but the bootloader at 420000 still allows passthrough flashing 416666 and vice versa right?

Seems to be just fine on R9MM anyway (tested 3x, but not sure about the vice versa part of your question)


Uploading .pio\build\Frsky_RX_R9MM_R9MINI_via_BetaflightPassthrough\firmware.bin
  ** Searching flight controllers **
      > FC found from 'COM6'

Detected the following serial ports on this system:
  COM6

=================== FIRMWARE UPLOAD ===================

  Bin file '.pio\build\Frsky_RX_R9MM_R9MINI_via_BetaflightPassthrough\firmware.bin'

  Port COM6 @ 416666

  Using CRSF (full duplex)!

======== PASSTHROUGH INIT ========
  Trying to initialize COM6 @ 416666

Attempting to detect FC UART configuration...
    ** Serial RX config detected: 'serial 1 64 115200 57600 0 115200'
Enabling serial passthrough...
  CMD: 'serialpassthrough 1 416666'
======== PASSTHROUGH DONE ========

Attempting to reboot into bootloader...

[1] retry...

Verified RX target 'FRSKY_RX_R9MM_R9MINI'
    Bootloader version found : 'v0.5.4'

    Bootloader type found : 'R9MM'

    Got into bootloader after: 1 attempts

Wait sync...
 sync OK

uploading 51028 bytes...

3%
...
98%
Success!!!!
hydra commented 4 days ago

@CapnBry with regards to the ELRS bootloader on STM32 boards, correct me if I'm wrong but the bootloader at 420000 still allows passthrough flashing 416666 and vice versa right?

Seems to be just fine on R9MM anyway (tested 3x, but not sure about the vice versa part of your question)

I mean that if the if the BL is changed to use 416666, and the user is using a version of BF that uses 420000 then flashing an updated version of ELRS still works. I'm assuming it does due to the reasons you and @SunjunKim outlined before, but would be good to confirm. That way there's no problem, again unless I'm mistaken, for both BF and ELRS to correct the baud rate to match the spec of 416666 and users with old bootloaders at 420000 are still ok and going forward every new install will be using the right baud rate.

CapnBry commented 4 days ago

Oh, I'm not going to go through all the trouble to rebuild the bootloader and hook up the STLINK and all that to try it.

ledvinap commented 4 days ago

The decision to merge this was stupid.

hydra commented 4 days ago

Hi everyone again,

just following up on this again as it should be noted that @CapnBry 's testing may be masking an issue.

I came across the issue with the various baud rates when I was implementing CRSF in Rust. Initially the baud rate of 400,000 was used, and the MCUs (F446 and H743)'s uart peripherals were reporting framing errors, that is that UART's FE bit was set.

I then tried 416,666 after getting an updated copy of the spec from trappy, and was still getting framing errors, which was because I was testing with an ELRS receiver which was using 420,000. After correcting the ELRS RX to use the right baud rate all the framing errors were gone.

Depending on implementations of UARTs and CRSF code users may get more errors if bytes are thrown away because of framing errors, if framing errors are ignored and the RX byte is still used sometimes the packets are still OK, but this doesn't mean the errors are not there, and it also isn't free as code has to clear the FE bit in the UART.

image

SunjunKim commented 3 days ago

The problem I recognize is the fact that thereā€™s two CRSF protocols in use ā€“ community-built CRSF (mostly from the reverse engineering) and TBS official CRSF, which have been secret to everyone.

I want to ask.. how many devices are working on 420,000 and how many are on 416,666? I believe status quo is 420,000, even though the official CRSF had been 416,666. We, as a community, missed the chance to correct it in the beginning v3.5.7, and TBS hasnā€™t claimed their spec upon the wrong implementation, for full eight years.

Now we need to deal with a path dependency. As the most of the devices in the field is working on community-standard 420,000, whatā€™s the benefit of change it to the TBS-spec 416,666 to be default to everyone, only for the TBS gears? I feel like itā€™s a tail wag the dog situation. The current PR - 420 default, but giving 416 option for who needs that makes more sense to me.

tbs-trappy commented 3 days ago

The problem I recognize is the fact that thereā€™s two CRSF protocols in use ā€“ community-built CRSF (mostly from the reverse engineering) and TBS official CRSF, which have been secret to everyone.

There is only one CRSF protocol.

I want to ask.. how many devices are working on 420,000 and how many are on 416,666?

In terms of device numbers it's hard to say, but 95% of all projects use 416.6k. The only ones that don't that I know of are betaflight and ELRS.

[...] and TBS hasnā€™t claimed their spec upon the wrong implementation, for full eight years.

False. The concern has been raised multiple times. Anyone who has ever asked us has been told it's wrong. This isn't even the first thread by hydra on the issue. It's just that whenever we said it was wrong, the consensus seems to have been that "it works fine" and to leave it as is. Which is fine with me - just don't claim it's our fault for not raising the issue :)

As the most of the devices in the field is working on community-standard 420,000, whatā€™s the benefit of change it to the TBS-spec 416,666 to be default to everyone, only for the TBS gears?

It's still a TBS protocol, so changing it to the TBS spec would be implementation of the spec. Leaving it at as it is would be creating a different spec. Also, "most devices on the field" is just a very biased and factually incorrect interpretation of the word "most".

Fwiw, I'm not here to advocate for either of the changes, or even influence the outcome. It's your project(s), they are your decisions, and since TBS products work fine with the implementations either way, either way "works" for TBS.

SunjunKim commented 3 days ago

The problem I recognize is the fact that thereā€™s two CRSF protocols in use ā€“ community-built CRSF (mostly from the reverse engineering) and TBS official CRSF, which have been secret to everyone.

There is only one CRSF protocol.

I want to ask.. how many devices are working on 420,000 and how many are on 416,666?

In terms of device numbers it's hard to say, but 95% of all projects use 416.6k. The only ones that don't that I know of are betaflight and ELRS.

[...] and TBS hasnā€™t claimed their spec upon the wrong implementation, for full eight years.

False. The concern has been raised multiple times. Anyone who has ever asked us has been told it's wrong. This isn't even the first thread by hydra on the issue. It's just that whenever we said it was wrong, the consensus seems to have been that "it works fine" and to leave it as is. Which is fine with me - just don't claim it's our fault for not raising the issue :)

As the most of the devices in the field is working on community-standard 420,000, whatā€™s the benefit of change it to the TBS-spec 416,666 to be default to everyone, only for the TBS gears?

It's still a TBS protocol, so changing it to the TBS spec would be implementation of the spec. Leaving it at as it is would be creating a different spec. Also, "most devices on the field" is just a very biased and factually incorrect interpretation of the word "most".

Fwiw, I'm not here to advocate for either of the changes, or even influence the outcome. It's your project(s), they are your decisions, and since TBS products work fine with the implementations either way, either way "works" for TBS.

Hi, Trappy, happy to see you here. I believe hydra mentioned that Iā€™d like to get the protocol document wrt the crsf-wg up to date with the official CRSF spec. If youā€™re comfortable with that, please let me know. Thank you in advance!