Open tobbeanton opened 8 months ago
Interesting, thank you for the detailed report. Did you by any chance try with 24kHz PWM setting too? Also have you tried increasing dead-time to 15?
cc @damosvil - do you have any input on this, or other things you want to see tested?
We recently tried 24kHz PWM and found the same issue, but now on phase C.
And here are the settings we used for this capture
Also have you tried increasing dead-time to 15?
Just in case you missed it.
Since this appears so rarely, I'm guessing it is a timing issue: probably some interrupt that triggers at just the right time, causing the two PWM register updates to be slightly separated in time.
Just in case you missed it.
I did not try dead-time 15. It would be strange if this was the cause, but better safe than sorry. Will try tomorrow.
It seems to be a problem related to setting the PCA registers, but checking the BLHeli_S and Bluejay source code it seems they are both setting those registers in the same place:
It seems that on some occasions Xp starts working one PWM cycle before Xc (the updates of Xp and Xc are not synchronized to the PWM cycle), which agrees with the code. What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same.
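To make the timing argument concrete, here is a small Python model (not firmware) of two auto-reload compare registers that latch on counter overflow. The `Channel` class, the `run` helper, the 100-tick period and the 10/90 compare values are all made up for illustration, and the real PCA has more detail than this.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One PWM channel with an auto-reload (shadow) compare value."""
    active: int  # value the comparator uses during the current cycle
    shadow: int  # value written by firmware, latched at the next overflow

    def write(self, value: int) -> None:
        self.shadow = value

    def overflow(self) -> None:
        self.active = self.shadow


def run(period, write_power_at, write_damp_at, cycles=3):
    """Return the (power, damp) compare values in effect during each cycle.

    write_power_at / write_damp_at are absolute tick counts at which the
    (hypothetical) firmware writes the new value to each shadow register.
    """
    power = Channel(active=10, shadow=10)
    damp = Channel(active=10, shadow=10)
    per_cycle = []
    for tick in range(period * cycles):
        if tick % period == 0:
            # Counter overflow: both channels latch their shadow values.
            power.overflow()
            damp.overflow()
            per_cycle.append((power.active, damp.active))
        if tick == write_power_at:
            power.write(90)
        if tick == write_damp_at:
            damp.write(90)
    return per_cycle


# Both writes land inside the same cycle: the channels change together.
print(run(100, write_power_at=50, write_damp_at=53))
# [(10, 10), (90, 90), (90, 90)]

# The writes straddle the overflow at tick 100: one channel runs a full
# cycle with the new value while the other still uses the old one.
print(run(100, write_power_at=98, write_damp_at=101))
# [(10, 10), (90, 10), (90, 90)]
```

In this model the mismatch only shows up when the overflow falls between the two writes, which would fit with the problem being rare.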
Have you found any pattern to reproduce this issue? How frequent is it on your hw? Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized fw to do this, let us know. Could you check if you can also reproduce this issue with Bluejay 0.16?
What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same.
Let us try longer and perhaps we can replicate it in BLHeli_S too.
Could you check if you can also reproduce this issue with Bluejay 0.16?
Yes, we could reproduce it, and this time it happened in the middle of braking, not in the transition from braking to accelerating.
Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized fw to do this, let us know.
We will give it a try
I have been talking with Alka (the creator of AM32) and he suggests that this might be related to not using a gate driver like the fd6288, which implements shoot-through prevention. He also said that ARM MCUs generate the complementary PWM in hardware, so it seems to be an issue with tinywhoop hardware in general that uses EFM8BBx MCUs.
What we can do for the next version is to not update the PCA registers if the PCA counter is about to expire. This way Xp and Xc will be updated in sync with the PCA cycle. This would fix the issue, and I think it would not hit performance noticeably. Another option, though only a mitigation, would be to first update the low parts of the power and damp registers and then update the high parts together. The issue would then probably happen about 50% less often, without hitting performance.
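As a rough illustration of the first idea, the check could look something like the Python sketch below. This is only a model of the logic, not Bluejay code: `safe_to_write`, `try_update`, the `regs` dictionary and the guard value are hypothetical names and numbers, and the guard would have to cover the time needed for both register writes.

```python
def safe_to_write(counter: int, period: int, guard: int) -> bool:
    """True if more than `guard` ticks remain before the counter overflows."""
    return period - counter > guard


def try_update(counter, period, guard, regs, power, damp):
    """Write both reload values only when the overflow is far enough away.

    Returns False (and leaves regs untouched) when the counter is too close
    to overflow, so the caller can retry in the next cycle.
    """
    if not safe_to_write(counter, period, guard):
        return False
    regs["power"] = power
    regs["damp"] = damp
    return True


regs = {"power": 10, "damp": 10}
try_update(counter=1020, period=1024, guard=8, regs=regs, power=90, damp=90)
# Too close to overflow: regs is unchanged and the call returns False.
try_update(counter=100, period=1024, guard=8, regs=regs, power=90, damp=90)
# Plenty of margin: both values are written and the call returns True.
```

In real firmware the check and the two writes would presumably also need to run without being interrupted in between, otherwise an interrupt could still push the second write past the overflow.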
To me, checking whether the PCA counter is about to expire sounds like the right way to do it. Since the auto-reload registers are used, there is already a "performance" hit, because it can take almost a full cycle before the PCA registers are updated. And I don't think there is any other safe way to do it.
It sounds a bit challenging to implement but we are happy to test it if you know how to do it @damosvil
Another thing I was thinking about is why we are not able to replicate it in BLHeli_S 16.7. It could just be a coincidence that we have not managed to catch it, but we have tested for ~20min, and with Bluejay it usually happens within 1min. Could this be related to interrupts rather than the auto-reload registers?
Ok, I will try a modification and I will let you know
I have checked the BLHeli_S code again and I think that they do something to avoid the issue in the pca_int ISR: https://github.com/bitdump/BLHeli/blob/ef8c1a0b644c228f07a82f3d25e6d581492eaacf/BLHeli_S%20SiLabs/BLHeli_S.asm#L1567
But I think that ISRs add additional latency, so it would be better to not update the PWM registers if the PCA counter is about to expire, and to reorder the PCA register writes.
I have been checking the EFM8BB2 reference manual and it seems it may not be so easy to control when the Xc and Xp registers are loaded. I will check the BLHeli_S solution again.
I think a valid solution would be, when a new DShot frame arrives, to store the power and damp values and enable the PCA interrupt (generated when the PCA counter is 0). In the interrupt we would set Xp and then Xc, so that when the rising edges happen both auto-reload values are loaded in the same cycle, and then disable the interrupt again. I will try to code this solution next week.
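Roughly, the flow described above could be modelled like this in Python. It is only a sketch of the control flow, not firmware: the class and method names (`DeferredPwmUpdate`, `on_dshot_frame`, `on_pca_overflow`) and the attribute names are hypothetical, and the real ISR would write the PCA auto-reload registers instead of Python attributes.

```python
class DeferredPwmUpdate:
    """Model of deferring both reload writes to the PCA overflow interrupt."""

    def __init__(self):
        self.pending = None        # (power, damp) waiting for the next overflow
        self.irq_enabled = False   # stands in for the PCA interrupt enable bit
        self.power_reload = 0      # stands in for the power auto-reload register
        self.damp_reload = 0       # stands in for the damp auto-reload register

    def on_dshot_frame(self, power, damp):
        # Only store the values and arm the interrupt; no register writes here.
        self.pending = (power, damp)
        self.irq_enabled = True

    def on_pca_overflow(self):
        # Called at counter overflow. Does nothing unless a frame is pending.
        if not self.irq_enabled or self.pending is None:
            return
        # Both writes happen right after the same overflow, so both values
        # are picked up together at the following one.
        self.power_reload, self.damp_reload = self.pending
        self.pending = None
        self.irq_enabled = False   # one-shot: disarm until the next frame


pwm = DeferredPwmUpdate()
pwm.on_dshot_frame(power=100, damp=120)
# Nothing has been written yet; the values are only stored.
pwm.on_pca_overflow()
# Now power_reload == 100 and damp_reload == 120, and the interrupt is off.
```

If two frames arrive before the overflow, this sketch simply keeps the latest one. Whether that is the desired behaviour is a design choice the thread does not settle.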
Sounds good, I think this is a common way to handle it.
Just checking how things are going? Anything we can do to help (but doing the actual fix might be above our skill level)?
Hey, just a heads-up: we have not forgotten you. Unfortunately we are currently a bit swamped with private life/work, so things will take some time.
Thanks for letting us know! It might not be the easiest fix either! Meanwhile we might try the:
Another option, though only a mitigation, would be to first update the low parts of the power and damp registers and then update the high parts together. The issue would then probably happen about 50% less often, without hitting performance.
This we could probably manage ourselves.
@tobbeanton thank you, please let us know how it goes - if it works, we would appreciate a PR.
What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same.
Let us try longer and perhaps we can replicate it in BLHeli_S too.
Could you check if you can also reproduce this issue with Bluejay 0.16?
Yes, we could reproduce it, and this time it happened in the middle of braking, not in the transition from braking to accelerating.
Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized fw to do this, let us know.
We will give it a try
This capture was from 0.16, correct?
Can you confirm what version(s) the previous 2 captures were? https://github.com/bird-sanctuary/bluejay/issues/187#issue-2181610116 https://github.com/bird-sanctuary/bluejay/issues/187#issuecomment-1991762707
Is the timing of the bug exactly the same for each occurrence on the same version, or is there some variation? How many samples?
What variation did you see between 0.19.2 and 0.21RC (0.20.1?)? Are you able to provide some instances from the missing version, please?
Is there a fix in 0.21RC? Else the bug has more or less been fully identified...?
No, I was more curious as to what the difference in timing was between 0.21 and 0.19.2 (if any) at the same PWM frequency.
I think the bug has been there for a long time, since the PCA switching code was changed.
Hello! Was the bug fixed in the latest version?
@alinneacsu No, otherwise we would have closed the issue and mentioned it in the release notes. Are you experiencing the same issues?
@stylesuxx
I think the issue may be similar: my setup includes an FC based on the H743, running ArduCopter (bidir DShot enabled), and a 4in1 ESC running the latest Bluejay version. Rarely (so far about 1 in 50 flights), one of the motors simply turns off in flight, but based on the logs it does not look like demag/desync. I tried both the 24kHz and 48kHz versions, with no difference.
I haven't identified any way to replicate the issue in a controlled environment.
I have many logs showing this situation. I'm attaching a simple screenshot for now; the constant RPM at the end of the log indicates the moment when the motor stopped:
I'm also logging the following EDT fields: .SS -> EDT Stress Level (120 constantly), .SA -> EDT Status (193, rarely goes to 1)
Please attach full logs, so people can look through them.
@alinneacsu can you provide some time stamps of interest for those logs please?
Also, what else have you done to troubleshoot this issue? Does it always happen with the same motor? Have you tried to change timings?
The initial issue seems to be reproducible pretty consistently, at least on this one setup. So I am not sure if we are looking at the same issue here.
Describe the issue
We are testing Bluejay for our upcoming Crazyflie 2.1 - brushless. As part of this we do autonomous flight testing over and over again; we jokingly call it the infinite flight test. What we have noticed is that sometimes it just resets in mid-air, and we immediately suspected the ESC. We built a small test rig where we can measure the MOSFET signals as well as the battery voltage, and cycle the PWM between 10% and 100% every 300ms. This way we managed to capture the voltage dip and find an H-bridge shoot-through condition. This usually happens within a minute using this test setup. As can be seen in the image, the Bc and Bp MOSFET signals are both on for a short period of time, causing the shoot-through. This happens in the transition from braking to accelerating, where it looks like Bp is one PWM cycle late (or Bc early). I'm pretty sure this happens for the other phases as well, but I don't have a capture of it.
We tried BLHeli_S 16.7, on which we could not detect the shoot-through condition.
The full capture is attached and can be viewed using Saleae Logic. Shot-through-mosfet-channels.zip
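For anyone who wants to search a long capture for the overlap instead of spotting it by eye, a minimal Python sketch could look like this. It assumes the two gate signals have already been exported as equally spaced 0/1 samples where 1 means the MOSFET is on; the function name, the argument names and that polarity are assumptions, not something taken from the capture file.

```python
def find_overlaps(bp, bc, sample_period_s):
    """Return (start_s, duration_s) for each run where both signals are on.

    bp and bc are sequences of 0/1 samples taken at the same instants.
    """
    overlaps = []
    start = None
    n = min(len(bp), len(bc))
    for i in range(n):
        both_on = bool(bp[i]) and bool(bc[i])
        if both_on and start is None:
            start = i                      # overlap begins at this sample
        elif not both_on and start is not None:
            overlaps.append((start * sample_period_s,
                             (i - start) * sample_period_s))
            start = None
    if start is not None:
        # The capture ended while both signals were still on.
        overlaps.append((start * sample_period_s,
                         (n - start) * sample_period_s))
    return overlaps


bp = [0, 1, 1, 1, 0, 0]
bc = [0, 0, 1, 1, 1, 0]
print(find_overlaps(bp, bc, 1))  # [(2, 2)]
```

With a real capture, `sample_period_s` would be the logic analyzer's sample interval, and any overlap longer than the configured dead-time would be a candidate shoot-through event.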
Bluejay version
0.19.2 & 0.20.1-RC2
ESC variant
O_H_10
PWM frequency
48
DShot bitrate
300
Bidirectional DShot
Off
FC firmware
Crazyflie 2024.2
Motor size
08028
Configurator debug log
No response