Open bazfp opened 3 years ago
A suitable change is to replace the simple assignment of float NaN to int with a nan check:
In convert_COMMAND_LONG_to_COMMAND_INT
at GCS_Common.cpp
out.x = isnan(in.param5) ? 0 : in.param5;
out.y = isnan(in.param6) ? 0 : in.param6;
Send a MAVLINK message of
COMMAND_LONG
of command type 520, with param5 and param6 set to NaN and ardupilot will segfault.
Thanks for finding this!
Program received signal SIGFPE, Arithmetic exception.
Not a segfault :-)
MAVLINK standard specifies that NaN is a suitable value for messages.
It does - but that doesn't make it a valid value in all fields.
out.x = isnan(in.param5) ? 0 : in.param5; out.y = isnan(in.param6) ? 0 : in.param6;
That might be a reasonable in some cases, however I believe in the vast majority of cases it wouldn't be appropriate. Let's be charitable and assume there's a bug in a GCS which hands us NaN each for a latitude/longitude. In this case we should reject the command (unless the command specifies some behaviour if NaN is supplied...), rather than flying the vehicle to 0,0 (or, depending on command, current location).
As a first step we should probably make the funciton you called out either bool
or MAV_RESULT
and return failure in the case NaN is supplied in any field we look at. That can be passed back to the GCS. AFAIK we don't accept NaN at all, in any mavlink message we accept for processing.
One thing to remember here - we don't have an infinite amount of flash (often down to <2kB remaining on some boards), so we do need to carefully consider the implications of bugs and whether the sanity checks are worth it. Protecting in this function is probably worth the bytes. Even then - on an actual flight controller we don't actually catch floating-point-exceptions so I believe these will naturally become zero anyway. We catch FPEs in SITL to catch things like this!
Do you feel like putting a PR together with my above suggestion?
Oh - on that "not a segfault" thing - we do actually have a mis-feature in ArduPilot where we term things "segfaults" where they're really not - PRs welcome on that, too :-)
Thank you for your detailed reply, it's good to talk to someone about MAVLink :)
As a first step we should probably make the funciton you called out either
bool
orMAV_RESULT
and return failure in the case NaN is supplied in any field we look at. That can be passed back to the GCS. AFAIK we don't accept NaN at all, in any mavlink message we accept for processing.
I found this while looking for a suitable C++ MAVLink library and testing the MAVSDK. While they are not willing to support Ardupilot, I was looking to see if Ardupilot could instead support MAVSDK. MAVSDK blanks out command_longs and other messages with NaNs on construction. The param5&6 property is unused on command 520 and as such are sent as NaN.
Do you feel like putting a PR together with my above suggestion?
I'm happy to contribute but I'm still unclear on the approach. Outright rejecting NaNs in param5 and 6 (which may go unused) seems inappropriate. I think it's the almost blanket conversion to INT message type which is the root problem.
As a first step we should probably make the funciton you called out either
bool
orMAV_RESULT
and return failure in the case NaN is supplied in any field we look at. That can be passed back to the GCS. AFAIK we don't accept NaN at all, in any mavlink message we accept for processing.I found this while looking for a suitable C++ MAVLink library and testing the MAVSDK. While they are not willing to support Ardupilot, I was looking to see if Ardupilot could instead support MAVSDK. MAVSDK blanks out command_longs and other messages with NaNs on construction. The param5&6 property is unused on command 520 and as such are sent as NaN.
Well, that's definitely what the standard says - 0 or NaN. That language only came in in August last year on master.
We probably goofed when we merged that PR - either missing the comment or not understanding its implications.
The PR against ArduPilot was here: https://github.com/ArduPilot/mavlink/pull/132
and the language is
NaN and INT32_MAX may be used in float/integer params (respectively) to indicate optional/default values (e.g. to use the component's current yaw or latitude rather than a specific value).
I quite like the concept of using NaNs where possible. I also know @tridge is not a fan of that, however, preferring (IIUC) using a common flag value (i.e. 0) regardless of data type used for value transport.
I'm happy to contribute but I'm still unclear on the approach. Outright rejecting NaNs in param5 and 6 (which may go unused) seems inappropriate. I think it's the almost blanket conversion to INT message type which is the root problem.
Canonicalising the type to LONG means we can consolidate the code - having handlers for both INT and LONG cases would generally increase our flash uses. When I wrote that stuff I considered the INT command to essentially be a superset of the functionality of the LONG case.
I'm going to mark this for discussion at DevCall to see what our approach is going to be on this matter. Outcomes from these discussions frequently surprise me - but it might very well be we just revert that documentation change! (@hamishwillee for visibility)
On the subject of MAVSDK and ArduPilot - I actually did a port which allowed it to work with ArduPilot a couple of years back. My understanding is the SDK has been overhauled a couple of times since then, so the patches wouldn't be relevant - even if I could find them, which I can't! Problems I can remember are lack of support for some mavlink messages, the traditional mission-handling problems (slot-0-is-home-in-ArduPilot), parameter transfer type issues. You might consider other alternatives - dronekit-python is not getting any work done on it but works for a lot of people. Using the raw mavlink bindings works for a lot of people. Out autotest suite is an example of using the raw bindings (and @khancyr pulled a working end-to-end example out but I can't find it right now). ArduPilot's onboard LUA scripting may also be an option if you're willing to forgo cross-platform compatibility.
Thank you again for your reply on this topic.
I understand the concern with keeping flash consumption low. I'm afraid I cannot help much with that!
I've roughly patched the NaN issue here and MAVSDK functions; but still doesn't really work with ArduPilot. Arming works but it seems to be waiting for specific parameters before it will acknowledge a healthy autopilot and fully armed state. MAVSDK seems too much of a of a px4 blackbox to me :)
I've been investigating suitable C++ MAVLink libraries to provide offboard/companion control and have unfortunately come up short in terms of cross-autopilot support.
MAVROS was looking good until it's very lengthy list of build dependencies and conveluted plugin build system made it suddenly unappealing.
It's likely I will go my own way with the raw headers as I've written a templated wrapper to provide deduced subscribe/publish/ack support. I'm sure the real hardwork will begin when I look into implementing microservices for all the autopilots - and suitably abstracting them. I wish this was a core goal for MAVSDK but at this point it's easier for me to start fresh than refactor their single autopilot target code!
Happy to look at this on Weds. A couple of notes:
I'm going to mark this for discussion at DevCall to see what our approach is going to be on this matter. Outcomes from these discussions frequently surprise me - but it might very well be we just revert that documentation change!
The best way to put a position to the dev call is to attend the dev call. It's a bunch of people doing what seems sensible based on our not-all-encompassing experience. I'll try find the reason for that direction, but if you attend we can certainly have a more "rounded" discussion.
Arming works but it seems to be waiting for specific parameters before it will acknowledge a healthy autopilot and fully armed state. MAVSDK seems too much of a of a px4 blackbox to me :)
I've heard similar from the MAVSDK team. This is mainly because there are things that MAVLink does not standardise (which it should) so everyone does their own thing for those bits. I strongly suggest you raise any questions and ask for support on the PX4 slack #mavsdk channel phrased as "I would like to make this work with ArduPilot". I know that there is strong interest resolving the incompatibilities, but it has to come with "pull" and testing from the ArduPilot community.
On Sun, 2 May 2021, Hamish Willee wrote:
I'm going to mark this for discussion at DevCall to see what our approach is going to be on this matter. Outcomes from these discussions frequently surprise me - but it might very well be we just revert that documentation change!
The best way to put a position to the dev call is to attend the dev call. It's a bunch of people doing what seems sensible based on our experience. I'll try find the reason for that direction, but if you attend we can certainly have a more "rounded" discussion.
I was refering to the ArduPilot DevCall :-)
I was referring to the ArduPilot DevCall :-)
Ha! Better not to revert text downstream without having the discussion upstream. We don't change these things for the fun of it - there will have been a problem that still needs to be solved.
On Sun, 2 May 2021, Hamish Willee wrote:
Ha! Better not to revert text downstream without having the discussion upstream. We don't change these things for the fun of it - there will have been a problem that still needs to be solved.
Ah, but the first move is to try to convince people that the new wording is good and we should adapt ;-)
Bug report
Send a MAVLINK message of
COMMAND_LONG
of command type 520, with param5 and param6 set to NaN and ardupilot will segfault.Ardupilot is naively converting the
COMMAND_LONG
to aCOMMAND_INT
and trying to assign a NaN to an integer.MAVLINK standard specifies that NaN is a suitable value for messages.
https://github.com/ArduPilot/ardupilot/blob/master/libraries/GCS_MAVLink/GCS_Common.cpp#L4286
Example message
Version Tested on 4.0.7 and 4.0.3
Platform [ X ] All [ ] AntennaTracker [ ] Copter [ ] Plane [ ] Rover [ ] Submarine
Airframe type SITL / QUAD
Hardware type SITL
Logs