bitdump / BLHeli

BLHeli for brushless ESC firmware
GNU General Public License v3.0

Inconsistent delay for DSHOT telemetry responses on Tekko F3. #468

Closed: hydra closed this issue 4 years ago

hydra commented 4 years ago

While investigating some DSHOT errors on the SPRacingH7EXTREME, as noted here (https://github.com/betaflight/betaflight/issues/9886#issuecomment-655085419), I found the following.

In this first image you can see consecutive DSHOT commands (the first part of the signal, from the FC), each followed by a telemetry response (the second part of the signal, from the ESC). Note that for most of them there is a 30 µs gap between the end of the command and the start of the response.

[Scope capture: consecutive DSHOT commands and telemetry responses]

Here's the detail of a 'good' response. [Scope capture]

Here's the detail of a 'bad' response, i.e. one that is sent too early. [Scope capture]

The FC has to switch its pin from OUTPUT to INPUT. If the ESC sends the response too early, the FC misses the start of the ESC's signal, and you get errors.
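The failure mode can be put in rough numbers. Below is a minimal Python model of how many leading bits of the ESC's response the FC would miss if the response starts before the FC pin has finished switching to input.

It rests on a few assumptions:

- All times are measured from the end of the FC's DSHOT frame.
- A bit counts as lost if it *starts* before the pin is ready, which is a simplification.
- The response bit period and the FC turnaround time are hypothetical parameters, not measured figures for any particular MCU.

```python
import math


def missed_leading_bits(response_start_us: float,
                        input_ready_us: float,
                        bit_period_us: float,
                        frame_bits: int) -> int:
    """Estimate how many leading response bits the FC cannot sample.

    response_start_us: when the ESC starts its response (after FC frame end).
    input_ready_us:    when the FC pin is ready as an input (hypothetical).
    bit_period_us:     duration of one response bit (hypothetical).
    frame_bits:        length of the response frame in bits (illustrative).
    """
    if bit_period_us <= 0:
        raise ValueError("bit_period_us must be positive")
    early_by_us = input_ready_us - response_start_us
    if early_by_us <= 0:
        return 0  # the pin is already an input when the first bit arrives
    # Bit k starts at response_start + k * bit_period; count the bits whose
    # start time falls before input_ready, capped at the frame length.
    return min(math.ceil(early_by_us / bit_period_us), frame_bits)
```

Assume a hypothetical 2.5 µs bit period and an FC that is ready 20 µs after its frame ends. A response starting at 30 µs (like the 'good' one) loses nothing, but one starting at 12 µs loses its first 4 bits. That is enough to corrupt the frame.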

I guess the BLHeli devs need to investigate this so that all the telemetry responses occur after the same delay of about 30 µs (for a DSHOT 300 signal). They should check with other BF/DSHOT/ESC devs for the exact time each MCU (F1 to H7) needs to switch the first GPIO pin used for DSHOT from OUTPUT to INPUT, and make sure the responses are within spec. Meanwhile, FC DSHOT devs should look to reduce the time the transition takes.
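Checking responses against such a spec comes down to a simple acceptance test on scope measurements. The sketch below flags every measured command-to-response gap that is shorter than a minimum turnaround time. `check_response_gaps` and its `min_turnaround_us` parameter are hypothetical names, and the real per-MCU turnaround values would have to come from the FC developers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GapReport:
    total: int
    too_short: list[float] = field(default_factory=list)

    @property
    def short_fraction(self) -> float:
        # Fraction of responses that arrived before the FC could listen.
        return len(self.too_short) / self.total if self.total else 0.0


def check_response_gaps(gaps_us: list[float], min_turnaround_us: float) -> GapReport:
    """Flag ESC responses that start before the FC can have switched to input.

    gaps_us: measured delays from the end of the FC frame to the start of the
             ESC response (e.g. read off scope cursors).
    min_turnaround_us: hypothetical OUTPUT->INPUT switch time for the FC's MCU.
    """
    short = [g for g in gaps_us if g < min_turnaround_us]
    return GapReport(total=len(gaps_us), too_short=short)
```

For example, `check_response_gaps([30.1, 29.9, 30.0, 12.4, 30.2], 20.0)` reports one short response (`12.4`), which is a short fraction of 0.2. A spec would then presumably state the nominal delay together with a lower bound that every supported FC MCU can meet.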

hydra commented 4 years ago

Probably related: https://github.com/bitdump/BLHeli/issues/464

sskaug commented 4 years ago

Thank you for the input. We will look into why there is sometimes the short delay between the input Dshot and the output GCR from the ESC.

sskaug commented 4 years ago

I've now tried to reproduce this, without success, on the few units I have here, across multiple power-ups, signal detections, etc.

Is this elevated error rate (~0.5%) only on one channel of one of your ESCs? Is it consistent between power-ups, etc.?

hydra commented 4 years ago

Yes, it's consistent between power-ups. And yes, it happens on all the channels; I hadn't noticed this during my initial investigation because I was focused on the channel with the most errors. Here are scope traces from all 4 channels on the ESC.

[Scope traces: CH1, CH3, CH4, CH2]

The traces above are easy to capture because it happens so frequently. I'll try to measure how long there is between successive short-delay responses, but that is harder. Maybe there's some timer overflow or something?

I have not changed the ESC firmware; it's as supplied from the factory.


hydra commented 4 years ago

In one capture, I put cursor A on the first DSHOT command that had a short telemetry response delay and cursor B on the next one. The positions are A: -20.79 ms and B: +16.95 ms, so there are 37.74 ms between DSHOT commands with short telemetry response delays.
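The cursor arithmetic can be turned into a rough frame count, which helps when comparing the interval with an error rate.

The frame count depends on the DSHOT update rate, which is not stated in this thread. The 8 kHz used below is purely an assumption, and the helper names are hypothetical.

```python
def cursor_interval_ms(cursor_a_ms: float, cursor_b_ms: float) -> float:
    # Time between two scope cursors, in milliseconds.
    return cursor_b_ms - cursor_a_ms


def frames_in_interval(span_ms: float, update_rate_hz: float) -> float:
    # Number of DSHOT frames sent in `span_ms` at an assumed update rate.
    return span_ms / 1000.0 * update_rate_hz


span = cursor_interval_ms(-20.79, 16.95)          # about 37.74 ms
frames = frames_in_interval(span, 8000.0)          # assumed 8 kHz update rate
short_rate = 1.0 / frames                          # one short response per span
```

At an assumed 8 kHz, 37.74 ms is roughly 302 frames. One short response per interval would then be about 0.33% of frames, the same order of magnitude as the ~0.5% error rate asked about earlier. That comparison only holds if the assumed update rate matches the actual setup.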

Here's the ESC setup, if it helps:

[Screenshot: ESC setup]

Here are the trace and waveform files from my Rigol DS1074Z scope, which you can open and view with Rigol's software.

dshot errors 20200711-1604.zip

ChrisRosser commented 4 years ago

@hydra

I've been having some strange desync issues with my Tekko32 F3 and found this post very interesting. I found that behaviour improved by setting timing to auto, PWM to a lower setting (96kHz to 48kHz in my case) and demag to high. I would be interested to see whether that also helps you; if so, it may provide additional insight into the root cause.

hydra commented 4 years ago

@ChrisRosser it happens even when the motors are idle, so I doubt that would change anything: the issue is on the I/O side of things, not the motor control side. But @sskaug would know better than I.

@sskaug if you have any other tests you'd like me to perform let me know.

Nico9n commented 4 years ago

https://youtu.be/M95RbrCyN9E I can reproduce this here; it takes some time to find the error. @sskaug @hydra

hydra commented 4 years ago

@Nico9n thanks for the report, clearly the problem is widespread.

Keep us posted with your findings @sskaug, I think there are many affected users!

sskaug commented 4 years ago

Thanks for all the input and help on this. I believe I have now found the culprit: the F3 MCU sometimes sets the capture compare flag when the timer wraps while it is being reprogrammed. I have posted test code with a fix (clear the CCR1 flag after reprogramming the timer) for the TEKKO32_F3_4in1 here: https://github.com/bitdump/BLHeli/tree/master/BLHeli_32%20ARM/Bidir%20Dshot%20F3%20testcode
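BLHeli_32 source is not public, so what follows is only a toy Python model of the mechanism described, not the actual firmware.

The model is a timer channel with a latched ("sticky") compare flag:

- If a spurious match latches the flag while the timer is being reprogrammed, firmware waiting on that flag would act immediately.
- The response then goes out early instead of after the intended delay.
- Clearing the flag after reprogramming restores the intended delay.

The class, method names and the 30-tick delay are all hypothetical.

```python
class ToyCompareTimer:
    """Toy timer channel with a sticky compare-match flag (hypothetical model)."""

    def __init__(self) -> None:
        self.counter = 0
        self.compare = 0
        self.flag = False  # latched match flag: stays set until software clears it

    def reprogram(self, compare_ticks: int, spurious_match: bool) -> None:
        # Restart the counter and load the new compare value. `spurious_match`
        # models the race: the counter wraps while the timer is being
        # reprogrammed and the hardware latches the flag.
        self.counter = 0
        self.compare = compare_ticks
        if spurious_match:
            self.flag = True

    def clear_flag(self) -> None:
        self.flag = False

    def ticks_until_response(self) -> int:
        # Firmware waiting on the flag sees it already set, so the response
        # goes out with no delay; otherwise it waits for the real match.
        if self.flag:
            return 0
        return self.compare - self.counter


def response_delay_ticks(spurious_match: bool, clear_after_reprogram: bool,
                         compare_ticks: int = 30) -> int:
    timer = ToyCompareTimer()
    timer.reprogram(compare_ticks, spurious_match)
    if clear_after_reprogram:
        timer.clear_flag()  # the kind of fix described: clear the flag afterwards
    return timer.ticks_until_response()
```

In this model, `response_delay_ticks(True, False)` returns 0 (the early response) and `response_delay_ticks(True, True)` returns 30. Clearing the flag *after* reprogramming matters because the spurious latch happens during the reprogramming itself.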

ChrisRosser commented 4 years ago

@sskaug Is this OK to flash for the TEKKO32 F3 4in1 B or should I wait?

sskaug commented 4 years ago

Never flash the wrong code. Test code for the _B is also posted.

ChrisRosser commented 4 years ago

@sskaug Thank you for the test code! I have flown half a dozen packs and qualitatively the motors sounded smoother and there was no twitchiness on full throttle punchouts. I will move back to 96kHz PWM and demag to low and test again. If no death rolls on full throttle punchouts I think we may have a fix!

hydra commented 4 years ago

@sskaug I flashed the test code (the regular one, not the 'B' one) to all 4 ESCs, and the issue appears to be fixed.

(Retracted: Can you share details of the fix? Do you just clear the flag when reprogramming the timer, or something else?) EDIT: You already did; I missed that.

Great work fixing it, too! When do you estimate you'll release the next version of BLHeli with this fix?

hydra commented 4 years ago

@sskaug Also, out of curiosity, why weren't you initially able to reproduce the issue and what changed so that you could see the issue?

hydra commented 4 years ago

@sskaug It seems other MCUs are affected too. The issue was also reported on the other thread here:

https://github.com/betaflight/betaflight/issues/9886#issuecomment-657829904

See my reply here: https://github.com/betaflight/betaflight/issues/9886#issuecomment-658677926

Can you provide test code with the same fix for the TMotor F35A 3-6S ESC?

I also have a spare TMotor F55A Pro II which also uses an F3 MCU so test code for that would be good too.

ChrisRosser commented 4 years ago

@sskaug Just tried to fly with 96kHz PWM and demag to low. Quad still deathrolls out of the sky on full throttle punchouts and has a loss of authority at zero throttle (lots of zero throttle bobbles and wobbles). So this is probably a different issue.

sskaug commented 4 years ago

@hydra Thank you for testing; glad that you find the issue to be fixed. In my setups the issue did not cause any bidir Dshot errors, and initially I did not spend enough time looking for the short intervals on the scope. But when I did (many) repeated triggers on the scope like @Nico9n did, I occasionally found a short one.

The same fix will be implemented for all MCUs. This issue may also exist on the F051 and other MCUs, although since the F051 is slower, the gap will not be as short.

The fix is clean and not MCU dependent, so it will be implemented for all codes. I hope to have a new test code out (32.7.2) before I go on vacation :) this weekend.

sskaug commented 4 years ago

@ChrisRosser Yes, from the various descriptions around, I think there is more than one issue. There may be other, more significant issues than the inconsistent delay. All other things being perfect, the inconsistent delay should only cause some loss of RPM information.
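The claim that a short delay should "only cause some loss of RPM information" assumes the consumer of the telemetry tolerates missing frames. Below is a minimal sketch of one such policy: hold the last valid value and count the drops.

This is an illustration only. It does not describe how Betaflight's RPM filter actually handles invalid telemetry, and `hold_last_valid` is a hypothetical name. Failed decodes are represented as `None`.

```python
from __future__ import annotations

from typing import Iterable, Optional


def hold_last_valid(samples: Iterable[Optional[int]]) -> tuple[list[Optional[int]], int]:
    """Replace failed telemetry samples (None) with the last valid value.

    Returns the filled series and the number of dropped samples. Before the
    first valid sample arrives there is nothing to hold, so None is kept.
    """
    filled: list[Optional[int]] = []
    last: Optional[int] = None
    dropped = 0
    for sample in samples:
        if sample is None:
            dropped += 1
            filled.append(last)  # reuse the most recent good reading
        else:
            last = sample
            filled.append(sample)
    return filled, dropped
```

For example, `hold_last_valid([100, 102, None, 105, None, None, 110])` gives `([100, 102, 102, 105, 105, 105, 110], 3)`. With a policy like this, an occasional lost frame only makes one RPM reading slightly stale.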

hydra commented 4 years ago

@sskaug Does your scope not have a 'delay' trigger? On my Rigol I can set it to look for transitions that happen with a min/max duration. It's super easy to find if you have this feature.
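On a scope without a pulse-width or delay trigger, the same search can be done offline on exported sample data. This is a minimal sketch under two assumptions:

- The capture has already been thresholded to 0/1 samples taken at a fixed sample period.
- The idle level of the line is passed in as `level`.

`find_level_runs` and its parameters are hypothetical and not part of Rigol's software.

```python
from itertools import groupby


def find_level_runs(samples, sample_period_us, level, min_us, max_us):
    """Return (start_us, duration_us) for each run of `level` whose duration
    lies within [min_us, max_us], like a min/max pulse-width trigger.

    Runs touching the start or end of the capture may be truncated.
    """
    runs = []
    index = 0
    for value, group in groupby(samples):
        length = sum(1 for _ in group)
        duration_us = length * sample_period_us
        if value == level and min_us <= duration_us <= max_us:
            runs.append((index * sample_period_us, duration_us))
        index += length
    return runs
```

Setting the window between the longest idle run inside a frame and the nominal ~30 µs command-to-response gap would list only the suspiciously short gaps. The exact bounds depend on the DSHOT rate and the sample period.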

Yes, I guess the timer clock will affect the gap on different MCUs, but good to know you're applying the fix everywhere.

Something should probably be added to the BF wiki stating that the minimum revision of BLHeli_32 for BiDirectional DSHOT is 32.7.2, to help users reporting similar issues.
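A minimum-revision note like this could also be enforced by tooling that reads the ESC firmware version. Below is a minimal sketch of the comparison, assuming dotted numeric version strings such as "32.7" or "32.7.2".

The helper names are hypothetical. Neither Betaflight nor BLHeliSuite is claimed to provide them.

```python
MIN_BIDIR_DSHOT_VERSION = "32.7.2"  # minimum revision suggested in this thread


def parse_version(version: str) -> tuple[int, ...]:
    # "32.7.2" -> (32, 7, 2); assumes purely numeric, dot-separated parts.
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version: str, minimum: str = MIN_BIDIR_DSHOT_VERSION) -> bool:
    a, b = parse_version(version), parse_version(minimum)
    # Pad with zeros so that "32.7" compares as 32.7.0.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

With this, `meets_minimum("32.7")` is `False` and `meets_minimum("32.7.2")` is `True`. Zero-padding keeps a two-part version from comparing inconsistently against a three-part one.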

bluehallu commented 4 years ago

I can confirm that with a Tekko32 F3 45A B at 96kHz and bidirectional DSHOT, the quad is a mess: low-throttle bobbles, generally funky noises and death rolls on fast throttle increases.

sskaug commented 4 years ago

@hydra There are clearly other - probably more severe, maybe hardware related - issues around. If you check out issue 465 https://github.com/bitdump/BLHeli/issues/465, there is also a source of packet errors from packets arriving too late for BF. And if the system is sensitive to some small rate of packet loss, then it is really not robust enough. Rev32.7 has been around for a long time and is working well in lots of systems with bidirectional Dshot.

hydra commented 4 years ago

@sskaug yeah, #465 is interesting and your conclusion there is probably correct. I'll leave that for the BF devs to fix (@joelucid, @etracer, @mikeller, @jflyper et al.). I agree with your thoughts on robustness too.

What I wonder is how many people try enabling BiDirectional DSHOT, find it doesn't work for them and don't report it. Generally, for every person who reports an issue, many more actually have it. Still, it should be improved regardless.

sskaug commented 4 years ago

Test code Rev32.7.2 with a fix for this is now published. Please put it to test!

hydra commented 4 years ago

Test code mentioned above is here: https://github.com/bitdump/BLHeli/tree/master/BLHeli_32%20ARM/Rev32.7.2%20SBUS%20and%20S.PORT%20testcode

hydra commented 4 years ago

@sskaug thanks for looking into and fixing this.