betaflight / betaflight-tx-lua-scripts

Collection of scripts to configure Betaflight from your TX (currently only supported in OpenTx)
GNU General Public License v3.0
590 stars 142 forks source link

BF 3.4 + 3.5: FrSky receivers (R-XSR, XSR, R9M, ...) (non-inverted mod) on FPORT, lua v1.0.2 - has problems with saving values #151

Closed RipperDrone closed 6 years ago

RipperDrone commented 6 years ago

All kinds of glitches:

Taranis FW is 2.2.2 RC5, lua scripts running somewhat more stable on BF3.4 than on 3.5... :-/

Issue has been fixed for R-XSR by FrSky firmware update plus commit / PR to BF code, but ist's not yet fixed for other receivers like XSR, R9M...

mikeller commented 6 years ago

Update: There is a difference: Disabling byte-stuffing will not fix 126 on XSR.

And I am talking to FrSky, they will be looking into it. According to them, the intended design is for the data sent by the flight controller to the RX to be byte-stuffed, it looks like there is a bug in the RX code.

mikeller commented 6 years ago

@klutvott123: You seem to be set up to do in / out comparisons. Can I get you to provide some in / out samples for byte-stuffed versions of 125 and 126 (i.e. 125 93 and 125 94)? According to FrSky, the specification has to be interpreted to do the byte-stuffing in the same way as it is done in SmartPort, i.e. the flight controller has to do the bytes-stuffing.

klutvott123 commented 6 years ago

@mikeller Ok, so i changed the smartPortSendByte function back so that it bytestuffs as usual. The thing is that when data from the FC has 125 or 126 in it, it doesn't show up at the opentx end at all so there's nothing to compare to. It turns out that enabling telemetry logging when compiling opentx makes it log everything. Couldn't find the frames containing 125 or 126 there either.

Here's an example from the log:

2018-08-26,12:27:29.870: 7E 00 32 16 0F 2E 2D 19 32 02   <-- First frame after sending MSP_PID
2018-08-26,12:27:29.880: 7E 00 32 08 35 32 4B 28 00 EA   <-- Third frame
2018-08-26,12:27:29.900: 7E 00 32 09 00 4D 00 00 00 77   <-- Fourth frame

Second frame contains 125 or 126 and is bytestuffed in the FC. This pattern repeats itself through the log.

Another example from the log:

2018-08-26,12:27:32.150: 7E 00 32 02 00 4D 00 00 00 7D 5E

This frame contains the CRC for the MSP frame itself and its 0x4D. The CRC for the f.port frame at the end is 0x7E and has been bytestuffed, BUT it has to have been bytestuffed by the RX since the CRC would be different for the frame coming from the FC since instead of 0x00 at the beginning it would have been 0x08 and 0x81 and the CRC would have been calculated from that. So the RX takes the data from the FC, extracts it and repacks it.

One last example:

2018-08-26,12:26:48.060: 7E 00 32 01 00 4F 00 00 00 7D 5D

This also contains CRC for MSP frame. The RX has calculated a new f.port CRC of 125 and bytestuffed it.

Even if the frames from the FC containing 126 were to be passed on to opentx they would have been double stuffed by the RX. At least for the R-XSR since your testing shows that it doesn't do it on the XSR.

Also thank you for contacting frsky about this!

mikeller commented 6 years ago

Thanks for the testing @klutvott123. From the look of it, the RX discards all frames that contain an 125 - this is definitely a serious problem. The byte-stuffing of the 126 in itself is not a problem, since 126 should never be received on the RX side. Also, this can be explained with the fact that the byte-stuffing is only used on the serial links on both sides, but not on the radio link (i.e. an 126 is always sent as an 126 over the radio).

klutvott123 commented 6 years ago

@mikeller Yeah you're right about the 126 not being a problem. Shouldn't be received when bytestuffing is done in the FC. I was just thinking a little to fast there. I'm just wondering if it's because the frames contain 125 or if the frame becomes longer than the RX want's it to be. I hope frsky can figure this out. Should be easy to reproduce the problem at least.

mikeller commented 6 years ago

@klutvott123, @RipperDrone, @bhuism: I got a beta version of the r-XSR firmware with a fix for this bug from FrSky for testing:

R-XSR_FPORT_LBT180827-Beta.zip R-XSR_FPORT_FCC180827-Beta.zip

I did a quick test with the FCC version, and it seems to fix this issue. Please test and provide feedback.

bhuism commented 6 years ago

I will test later today, thanks @mikeller

arg, I'm on vacation at the moment, and didn't bring a cable, I can't test it unfortunately, if nobody can test, I can desolder, flash and solder the rx tonight.

Did the desolder-flash-solder dance, I will test it tomorrow, I'm on a katute f7, tekko32 4in1, r-xsr, bf 3.5.0, fport, taranis X9D with OpenTX 2.2.2

klutvott123 commented 6 years ago

@mikeller That was fast! I have just flashed the LBT version and it seems to be working like it should now. Will test some more later. Big thank you to frsky for doing something about this!

RipperDrone commented 6 years ago

@mikeller Did a thorough test on Ummagawd / Omnibus F4SD / R-XSR / BF 4.0 #1082 / FPORT / Taranis X9D+ OTX2.2

FrSky FPORT firmware 180615_Beta:

FrSky FPORT firmware 180827_Beta FCC (which you provided from FrSky):

Still not resolved (but minor glitch): Whenever I write back altered values over Lua Script, the Taranis screen says 'Saving', then 'Retrying'. Values are still stored correctly.

Summary: BIG WOW! We finally got this nasty bug resolved - R-XSR FPORT is back in the game! Loving it, great team effort and super smart finding by @klutvott123 👍

Billj747 commented 6 years ago

@RipperDrone - FrSky FPORT firmware 180827_Beta solves the LUA script / "Sensor Lost" problem?

Where can I find that software version? All I see under the frsky 'beta' firmware page is 180615.

mikeller commented 6 years ago

@Billj747: Here: https://github.com/betaflight/betaflight-tx-lua-scripts/issues/151#issuecomment-416183224

mikeller commented 6 years ago

@RipperDrone: I suspect that the timing used in the lua script is a bit too tight - we'll probably have to extend the time until we show 'retrying'...

RipperDrone commented 6 years ago

Nice, this is cream on top 👍 Happy to test when needed. Thank you so much for interfacing with FrSky, @mikeller! 🙂

bhuism commented 6 years ago

@mikeller, @klutvott123, Just flown 4 packs with xsr-r on FPORT firmware LBT180827_Beta, BF 3.5.0 on kakute f7, OpenTX 2.2.2 on an X9D, had 0 'sensor lost' warnings! very happy with this fix and collaboration, thanks all!

mikeller commented 6 years ago

@bhuism: Glad to hear that this seems to be fixing these sporadic and hard to pinpoint 'sensor lost' warnings as well - my suspicion is that they occurred whenever a sensor value happened to contain a 125 or 126.

klutvott123 commented 6 years ago

Extending the save timeout makes no difference. I extended it to 10 seconds and it just "saves" for 10 seconds. Then "retrying" every time. I think the data is lost somewhere after being sent. When loading pages there are often frames lost. Sometimes one frame and sometimes all frames(this might also be that the request itself is lost).

mikeller commented 6 years ago

Ok, then we probably need to add multiple retries before we show 'retrying', or just accept that it won't work reliably first time round in most cases - the way telemetry data is framed for transport makes the link very unreliable for our purposes.

klutvott123 commented 6 years ago

@mikeller Yeah, i understand. I think it's best to just leave it as it is then, and just let it show the "retrying" message when it is in fact retrying. The values are still stored everytime so it's not a big issue really. Just had my first tuning session with the new firmware. No problems so far 😸

RipperDrone commented 6 years ago

It's so annoying, plus confusing for beginners. I'd go with 3x retrying w/o throwing the warning, only once it has to retry 4+ times, throw the message.

Alternatively, it could be a certain timeout. Guess implementing a counter is easier...

klutvott123 commented 6 years ago

I just made a discovery. Most of the time the values will actually be saved on the first try, but it goes through all the retries anyway for some reason. I tried increasing the number of retries and it just goes through all of them it seems.

If anyone has the time, try this. Set this line: https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/protocols.lua#L12 to 0. Edit: don't set the line to 0. Set the value to 0.

This allows it to save only 1 time. Will investigate more later.

RipperDrone commented 6 years ago

@mikeller , @klutvott123 Newsflash: Tried FW FCC180827 Beta on a Mini F4 (EMAX Magnum F4 v2), wired over B10/TX3 pin of MCU. Tried both BF3.5 and 4.0 Nightly #1091:

To my surprise, all values are shown correctly at first - but only for like 10 seconds, then the first 'sensor lost' warnings start to appear, until 3-4 sensors are lost. It looks like updates come less and less frequently, including VFAS, A4, Hdg, Accxyz... it sometimes comes back, sometimes stays bad. Even getting a telemetry lost message after waiting some more :-(

Reflashed the R-XSR with 180615 Beta, and all 'sensor lost' messages are gone (just the motor/PWM page doesn't show any values on Taranis Lua anymore, as usual...)

Seems that some timings on the F4 are different from other FCs, so the solution of 180827 Beta is not agnostic to FC obviously :-/

klutvott123 commented 6 years ago

@RipperDrone that sounds weird. Did you try rebooting both transmitter and FC before flashing to 180615? I had something similar happen once where i first got "sensor lost", then "telemetry lost". RC controls didn't work either. Rebooted transmitter and FC and everything was fine.

RipperDrone commented 6 years ago

@klutvott123 Yes, did that multiple times. Can be reproduced in 100% of the cases. Also tried to switch on FC and RC in different order, no change. Changed distance RC - FC as well.

FPORT radio connection always stays intact, it's just telemetry/lua sensor information dying...

klutvott123 commented 6 years ago

@RipperDrone Sound really strange as i would assume all betaflight targets would do the communication in exactly the same way since they're all using the same code. F.port is setup exactly the same as on your other F4 boards right? Tried any other tx pins? Sure the 180827 firmware haven't been corrupted somehow? Re-download it? I really don't know why that is happening, but i would try everything i can think of. Double check and triple check etc.

@SomeguyinPDX not really the right place, but at the same time the right place since this thread has a f.port firmware that fixes a problem that you will encounter sooner or later. I would flash the R-XSR with the 180615 beta firmware first, then get it working, and then flash the 180827 firmware that you can find further up in this thread. For setting up f.port i would start with this: https://oscarliang.com/setup-frsky-fport/. Play around with the serialrx_inverted, serialrx_halfduplex.

Are you saving the values in the script after changing them?

RipperDrone commented 6 years ago

@klutvott123

function run_ui(event) local now = getTime() -- if lastRunTS old than 500ms if lastRunTS + 50 < now then invalidatePages() end lastRunTS = now if (currentState == pageStatus.saving) then if (saveTS + saveTimeout < now) then if saveRetries < saveMaxRetries then saveSettings() else -- max retries reached currentState = pageStatus.display invalidatePages() end end end

if currentState == pageStatus.saving then saveRetries = saveRetries + 1

saveTS + saveTimeout < now

Can't fully understand the timer details on my end :-/

RipperDrone commented 6 years ago
klutvott123 commented 6 years ago

saveTS is the time where it started saving. If saveTS + saveTimeout is less than the current time it means that at least saveTimeout has passed since it started saving. Look at it this way: if now(current time) is larger than saveTS(time where saving started) + saveTimeout, then AT LEAST saveTimeout must have passed.

Yes, 150 corresponds to 1.5s, but the timeout used for s.port/f.port is 300(3 seconds).

The way i see it right now is that the whole "saving" and "retrying" is mostly for show. I don't see how it can know if the values have been saved without getting some kind of response from the FC. I'll look further into this tomorrow.

Just for fun/testing, try this: After this line: https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/ui.lua#L283 Paste this:

if currentState == pageStatus.saving and mspProcessTxQ() then
    currentState = pageStatus.display
    invalidatePages()
end

Try saving now 😄

RipperDrone commented 6 years ago

@klutvott123 @mikeller

Tried it - saves fine, no more messages. Even no 'Saving...' message displayed (maybe just a fraction of a second? Cannot catch it, though). Then empty fields are shown, followed by reading the entries back from the FC which takes 2-3secs. Finally, the updated values are displayed.

So it's light and shadow: Retrying... message is suppressed, but GUI is very un-intuitive to the user now: Not hinting at what's going on while empty fields are being shown...

It looks like the old implementation was showing the Saving... message for quite a long period of time, maybe covering the read-back period as well?

RipperDrone commented 6 years ago

@klutvott123 - as far as your statement 'just for show message displays, cannot know when saving was successful': Can't the system read back the altered values from the FC and compare against the RC/Lua inputs, thus confirming they have successfully been stored, page by page?

RipperDrone commented 6 years ago

if saveRetries <= 1 then lcd.drawText(SaveBox.x+SaveBox.x_offset,SaveBox.y+SaveBox.h_offset,"Saving...",DBLSIZE + BLINK + (globalTextOptions)) else lcd.drawText(SaveBox.x+SaveBox.x_offset,SaveBox.y+SaveBox.h_offset,"Retrying",DBLSIZE + (globalTextOptions)) end

RipperDrone commented 6 years ago

local function saveSettings(new) if Page.values then if Page.preSave then payload = Page.preSave(Page) else payload = {} for i=1,(Page.outputBytes or #Page.values) do payload[i] = Page.values[i] end end protocol.mspWrite(Page.write, payload) saveTS = getTime() if currentState == pageStatus.saving then saveRetries = saveRetries + 1 else currentState = pageStatus.saving saveRetries = 0 saveMaxRetries = protocol.saveMaxRetries or 2 -- default 2 saveTimeout = protocol.saveTimeout or 150 -- default 1.5s end end end

klutvott123 commented 6 years ago

@RipperDrone The code i posted was WRONG. It was supposed to be:

if currentState == pageStatus.saving and  not mspProcessTxQ() then
    currentState = pageStatus.display
    invalidatePages()
end

Can you try this?

mspProcessTxQ() returns "false" when all packets in the transmit queue has been sent. This means that it will only show the saving message as long as there's still data to be sent. If it takes to long, it will retry as usual.

RipperDrone commented 6 years ago

@klutvott123 Nice - Shows no more 'Retrying' now! Tried to change vlues on each page, message 'Saving...' comes up for ~1-2secs, then showing blank fields, then showing read back values. Reading back someimes takes a while, maximum around 3-4 seconds, but that's fine. We get rid of the 'Retrying' message finally :-)

klutvott123 commented 6 years ago

@RipperDrone I think after sending the values to be saved, the script then waits for a response here: https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/ui.lua#L363

If it receives the response it wants before the save timeout has elapsed, it invalidates the pages and they will reload with the new values. If no response is received before it timeouts, it will retry. It works just fine for s.port but not for f.port. I suspect that maybe no response is sent when using f.port, causing it to go through all the retries. If no response is sent, it has to be something on the betaflight side of things. The code i posted above just invalidates the pages when all the data is sent and doesn't wait for a response.

@mikeller any thoughts on this?

Also i think the requestTimeout of 800ms is a little too short for f.port. Sometimes it timeouts just when the data is about to arrive. Try setting it to 100(1 second). The downside is that if it does timeout, you have to wait even longer.

klutvott123 commented 6 years ago

I have soldered my uart adapter back on to my f.port pad. When pressing save i can see all the frames that were sent from the lua script arriving at the FC. Right after the last frame is received, the FC sends back the MSP_SET_PID command like this:

126 8 1 48 33 0 0 139 0 0 26 126 <---last pid frame containing CRC
8 129 50 29 0 202 0 0 0 92 <---write command sent back from FC

This makes sense since the lua script expects the write command to be sent back here: https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/ui.lua#L95

When the write command has been received back from the FC, the script sends the eeprom write command back to the FC. This command arrives at the f.port pad like this:

126 8 1 48 50 0 250 250 0 0 158 126

The script then expects to get that command sent back from the FC here: https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/ui.lua#L104 The thing is that the eeprom write command that the FC is supposed to send back is nowhere to be seen. It is never sent back. I can confirm this by invalidating the pages after receiving the write command back. Then it works just like with s.port. If anyone wants to try it, do this: After this line https://github.com/betaflight/betaflight-tx-lua-scripts/blob/523fa04df1075639cd0fb4c9531171475fddae20/src/SCRIPTS/BF/ui.lua#L101 Add this:

invalidatePages()

This is just for testing, but i think we're closing in on a proper fix for this.

RipperDrone commented 6 years ago

Whoa, this sounds a bit too complicated to me - so you imply that the sequence of triggering an eeprom write command and acknowledging it back from FC to RC is coded differently in BF firmware for FPORT than it is coded for SBUS? In this case, the bug can be found in BF sbus.c vs. fport.c?

klutvott123 commented 6 years ago

It's not really that complicated. First you send all the frames containing the values you want to save. The FC responds with the write command for that msp frame to let you know that it received all the data correctly. When you get that response, you send the eeprom write command, telling the FC to save the values. The FC sends the eeprom write command back again so that you know that it actually saved the values. You then send a reboot command to the FC if the current page calls for that(PWM page needs that).

The eeprom write command isn't sent back from the FC, and that's where this whole thing fails.

I suspect fport.c might be the place, but i think @mikeller might have a pretty good idea about where to look 😄

mikeller commented 6 years ago

lol, this is ironic:

https://github.com/betaflight/betaflight/blob/adc965327f288d23a840077aab3e18d9ba3296f5/src/main/fc/config.c#L533-L537

mikeller commented 6 years ago

@klutvott123: Do you want to test betaflight/betaflight#6688?

klutvott123 commented 6 years ago

@mikeller just tested it and it works. Thank you! After the script sends the eeprom write command, this is sent from the FC:

8 129 50 19 0 250 0 0 0 54

It took about 120ms for it to be sent back from the FC. But i think this can be adjusted for by increasing the timeout delay like you suggested earlier. I still find it strange that s.port has been working the whole time. Wouldn't frames be skipped for s.port too?

mikeller commented 6 years ago

@klutvott123: Thanks for testing, glad to hear we were able to stick a fork into this! The difference between SmartPort and FPort is that SmartPort is a telemetry protocol, and as such is not suspended during EEPROM operations. FPort on the other hand is an RX protocol (with integrated telemetry), and is suspended.

klutvott123 commented 6 years ago

@mikeller Thanks for explaining. That makes sense. I didn't think of that. I increased the save timeout to 5 seconds, and now it saves on the first try most of the time. The shorter timeout delay causes the PWM page to sometimes go through all the retries. The values are saved but it doesn't do the reboot like it's supposed to. Looks like 5 seconds is just enough to make it work. I'll do some experimenting with timeouts/retries. It does make sense that f.port has to retry a bit more than s.port since we know that frames sometimes go missing even if they are sent from the FC.

mikeller commented 6 years ago

@klutvott123: I have put some more thought into this, and the only reason I can see for this to work better with SmartPort than with FPort is the lower data rate of SmartPort, which probably prevents UART buffer overruns while the CPU is blocked with the EEPROM write.

But the whole suspension is mumbo jumbo, since we do not use preemptive multitasking, and as a consequence there is no way that the code checking the suspension during it will ever be run, only the code checking for the frames to skip after the EEPROM write / read function has returned.

I have entirely removed the suspension for serial RX, which might further improve the FPort saving situation, please test.

I'll open another pull request for the suspension to be removed for PWM / PPM as well, and only leave the skips after the fact. But this should go into 4.0, since it needs some testing.

RipperDrone commented 6 years ago

@mikeller @klutvott123 Thank you so much, guys. It's sooo freakin' satisfying we successfully found the (2!) code lines responsible for that bug :-) It was worth the effort, thank you @mikeller for interfacing with FrSky so effectively and fixing the code, and thank you @klutvott123 for sticking into it so much - great pleasure!

@mikeller - should I close this issue item now?

bhuism commented 6 years ago

Same here, thanks @mikkeller @klutvott123 @RipperDrone, and others, great to see this particular one fixed.

@mikeller, fyi, the PR build failed with an unused method calculateChannelMovingAverage(...)

klutvott123 commented 6 years ago

@mikeller I just tested your last changes, and it looks like there's no noticeable improvement on the saving compared to the previous changes. Might as well just remove it anyway if it's not doing anything useful like you say.

I have done more testing with the changes you made. I press save and log the data coming to and from the f.port pad. The FC sends back the responses everytime, and even then the script might retry because it didn't receive the response. I even increased the timeout delay to 20 seconds, and still it retries because of no response even though my log shows it was sent by the FC. I can't see any more problems with how betaflight and the lua script handles this now. But i suspect something more might be going on in the f.port firmware. It IS still beta.

@bhuism @RipperDrone 😁 also thanks to @atomiix for providing the clues pointing us in the right direction!

mikeller commented 6 years ago

@klutvott123: Thanks for testing. I suspect there will always be some disruption in this process, since the flight controller essentially locks and does not sent timely telemetry frames back for the time it takes to write or subsequently read the EEPROM. Changing this would be more effort than it's worth. The problem is probably less prominent on SmartPort, where the lockout duration might be less than the interval between two frames. I will leave betaflight/betaflight#6688 as it is now, and open a pull request to change the suspension into just skipping frames after the flight controller is done writing / reading the EEPROM for PPM / PWM for 4.0 when this is merged.

@RipperDrone: Yes please.

klutvott123 commented 6 years ago

@RipperDrone Please don't close yet. I have more info on this.

@mikeller I actually got the 3.5.0 build i was on to work without the changes you made.

To test your changes i had to make a 4.0.0 build to get it working because of some changes that have been made in feature.c/feature.h. It worked, but with no led strip. I see you made a pull request to remove led_strip support for F3 targets during 4.0.0 development. I disabled some features and reenabled the led_strip and the saving stopped working. Disable the led strip makes it work again.

I then implemented the changes you made in the latest 3.5.x code and tested with that. Led strip is enabled by default and it doesn't work. Goes through all retries. I disabled the led strip in the 3.5.x build and it started working again.

Then i went back to the 3.5.0 build i was using before, and disabled the led strip, and it works. This is without the changes in config.c/rx.h/rx.c.

klutvott123 commented 6 years ago

Update: I reenabled the led strip on my 3.5.0 build(still without https://github.com/betaflight/betaflight/pull/6688). I then disabled a lot of other features i don't need, and now it works with led strip enabled, so it doesn't have anything to do with the led strip feature itself. This brings down the FLASH/RAM usage dramatically. Might it have something to do with that?

klutvott123 commented 6 years ago

Ok, i think i understand this now. Just for testing i disabled writing the config to eeprom in rx.c. When "saving" from the lua script, it works as good as everytime on the first try(except that nothing is actually saved of course).

@mikeller what you said earlier about uart buffer overruns happening during eeprom writes explains this. I assume that betaflight writes the entire config to eeprom. This would explain why disabling features helps with this, since it would have to write a lot more with a lot of features enabled.

I think i'm satisfied now. 🕺