Drive mode selection: Refusing to stay in POSHOLD - keeps switching back to ALT_HOLD mode

pkmiles commented 8 months ago

Version: Beta 7

I have a DVL A50 installed. Select ALT_HOLD from the dropdown but it keeps switching back to ALT_HOLD mode.

rafaellehmkuhl commented 8 months ago

This is probably on the Ardupilot side, right @Williangalvani?

Peter, did you receive any message saying that the mode selection failed? If not, that should be something that we should add on Cockpit indeed. Even if we can't fix the problem on the vehicle side, we should be clear about it on the GCS.

Williangalvani commented 8 months ago

it does sound like the autopilot is not happy about it. but you should get a message, yes. Ardupilot will refuse to get into poshold if there is no valid position

rafaellehmkuhl commented 8 months ago

it does sound like the autopilot is not happy about it. but you should get a message, yes. Ardupilot will refuse to get into poshold if there is no valid position

Would it come as an ACK fail? If so, I think we are not treating it indeed.

Williangalvani commented 8 months ago

Would it come as an ACK fail? If so, I think we are not treating it indeed.

it should but I can't quite tell due to some mavlink/ardupilot shenanigans. check https://github.com/ArduPilot/ardupilot/pull/14452 😅

rafaellehmkuhl commented 8 months ago

Would it come as an ACK fail? If so, I think we are not treating it indeed.

it should but I can't quite tell due to some mavlink/ardupilot shenanigans. check ArduPilot/ardupilot#14452 😅

jesus christ 😆

pkmiles commented 8 months ago

...did you receive any message saying that the mode selection failed? If not, that should be something that we should add on Cockpit indeed. Even if we can't fix the problem on the vehicle side, we should be clear about it on the GCS.

Sorry for the slow response. No messages, just flicks back. We were out at the pool on Friday and didn't experience it then. I need to do a better job of tracking exactly what versions of things we're running in these tests.

rafaellehmkuhl commented 8 months ago

I've seen this kind of problem happen before. @ericjohnson97 had the exact same issue this weekend on his tests with an aerial vehicle. I suspect that it's related to a poor mavlink communication, but even in this case it should work reliably. We will investigate.

ericjohnson97 commented 8 months ago

I've seen this kind of problem happen before. @ericjohnson97 had the exact same issue this weekend on his tests with an aerial vehicle. I suspect that it's related to a poor mavlink communication, but even in this case it should work reliably. We will investigate.

For at least my setup I was sending the commands through a SIK radio and I think I was dealing with high packet loss. It might be time to implement #628. This would give us a sense if the command is getting rejected or if the packet was getting lost.

If it is a packet loss issue. It might be worth implementing some retry logic.

ericjohnson97 commented 8 months ago

it does sound like the autopilot is not happy about it. but you should get a message, yes. Ardupilot will refuse to get into poshold if there is no valid position

Would it come as an ACK fail? If so, I think we are not treating it indeed.

One feature that I use often in QGC is the status panel that tells you the health of the estimator as well as your sensors. the Estimator part is driven off of the status flags in https://mavlink.io/en/messages/common.html#ESTIMATOR_STATUS

This would tell you if your aircraft has the ability to go into a horizontal or vertical position aided mode.

rafaellehmkuhl commented 8 months ago

For at least my setup I was sending the commands through a SIK radio and I think I was dealing with high packet loss. It might be time to implement #628. This would give us a sense if the command is getting rejected or if the packet was getting lost.

If it is a packet loss issue. It might be worth implementing some retry logic.

I think this will be the correct solution to this issue, specially the retry logic (today we are just sending one command and hoping it gets to the vehicle).

I've added this issue to our beta->stable roadmap. If anyone wants to take it before we do, it will be much appreciated.

ericjohnson97 commented 8 months ago

I discovered that the interval updater to display the current mode is setting the currentMode variable back to the original mode right after the user changes it. And because changing the value of the currentMode variable triggers a the setFlightMode function the effect is there are 2 commands sent. 1 with the desired mode the user commanded and another command right after setting it back to the original mode which is triggered by the telemetry update. I believe that this issue doesn't show in sim because the mode change is registered fast enough that the next telemetry update already shows the mode switch. On real vehicles that have more latency, switching the mode doesn't happen fast enough to prevent the undesired second mode switch command.

https://github.com/bluerobotics/cockpit/blob/master/src/components/mini-widgets/ModeSelector.vue#L26

here is a screen show of some console logs I added to show this

I have toiled with how to best fix this and I can't come up with a solution I like. Ideally we would want to only call the setFlightMode function when the user changes the value of the dropdown, but I am not sure how best to build this. Thoughts?

rafaellehmkuhl commented 8 months ago

I discovered that the interval updater to display the current mode is setting the currentMode variable back to the original mode right after the user changes it. And because changing the value of the currentMode variable triggers a the setFlightMode function the effect is there are 2 commands sent. 1 with the desired mode the user commanded and another command right after setting it back to the original mode which is triggered by the telemetry update. I believe that this issue doesn't show in sim because the mode change is registered fast enough that the next telemetry update already shows the mode switch. On real vehicles that have more latency, switching the mode doesn't happen fast enough to prevent the undesired second mode switch command.

Daaaaaaamn! Good catch Eric!

I have toiled with how to best fix this and I can't come up with a solution I like. Ideally we would want to only call the setFlightMode function when the user changes the value of the dropdown, but I am not sure how best to build this. Thoughts?

I think we should rewrite the logic around this component. It first needs to use the ACK promise you're implementing on the other PR, and put the selector in a loading (disabled) state during that. If the ACK does not come after a certain amount of time (let's say 1 second), it should go back to the previous state without emiting a modeChange action (so in the case of latencies greater than 1 second, if the ACK ends up coming, we use it to alter the state). We should keep the value on the selector being updated with the value from the store (except during this waiting interval) so that if changes happen there, we eventually get to the right state.

ericjohnson97 commented 8 months ago

I think we should rewrite the logic around this component. It first needs to use the ACK promise you're implementing on the other PR, and put the selector in a loading (disabled) state during that. If the ACK does not come after a certain amount of time (let's say 1 second), it should go back to the previous state without emiting a modeChange action (so in the case of latencies greater than 1 second, if the ACK ends up coming, we use it to alter the state). We should keep the value on the selector being updated with the value from the store (except during this waiting interval) so that if changes happen there, we eventually get to the right state.

I was trying to think of a simple way to do it with out ack logic last night since listening to the ack is not fully implemented, but I agree that the ideal behavior is that the component should lock out more commands until it hears the result via ack or times out.

I can make these changes once #782 is finished

ericjohnson97 commented 8 months ago

I think we should rewrite the logic around this component. It first needs to use the ACK promise you're implementing on the other PR, and put the selector in a loading (disabled) state during that. If the ACK does not come after a certain amount of time (let's say 1 second), it should go back to the previous state without emiting a modeChange action (so in the case of latencies greater than 1 second, if the ACK ends up coming, we use it to alter the state). We should keep the value on the selector being updated with the value from the store (except during this waiting interval) so that if changes happen there, we eventually get to the right state.

I started playing around with a fix for this and its proving to be harder than I thought. I can't tell if it takes longer for the heartbeat to update with the new mode than it does for the autopilot to respond with and an ack or if the vue reactive variables are doing some weird queuing in the background, but I can't seem to be able to eliminate the problem by simply waiting for the acknowledge message. Maybe we should just have a couple second timeout where we don't allow updates after you send a change mode command?

rafaellehmkuhl commented 8 months ago

I started playing around with a fix for this and its proving to be harder than I thought. I can't tell if it takes longer for the heartbeat to update with the new mode than it does for the autopilot to respond with and an ack or if the vue reactive variables are doing some weird queuing in the background, but I can't seem to be able to eliminate the problem by simply waiting for the acknowledge message. Maybe we should just have a couple second timeout where we don't allow updates after you send a change mode command?

I think it's reasonable, at least as a hotfix, to improve the current experience. Can't think of a situation when someone needs to change the mode within less than, let's say, 5 seconds, and at the same time, if you have more than 5 seconds lag on the vehicle responses, it's probably impossible to pilot already.

bluerobotics / cockpit

Drive mode selection: Refusing to stay in POSHOLD - keeps switching back to ALT_HOLD mode #742