ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

SITL build on a Raspberry Pi with X-Plane results in "Bus error" #6035

Open omcaree opened 7 years ago

omcaree commented 7 years ago

Issue details

Setup: SITL build running on a Raspberry Pi (Navio2 SD card image), X-Plane running on a networked machine.

The SITL build starts, Mission Planner connects successfully, then X-Plane connects, and a "Bus error" occurs immediately. See the output below:

pi@navio:~/ardupilot $ build/sitl/bin/arduplane --model xplane
Waiting for XPlane data on UDP port 49001 and sending to port 49000
Started model xplane at -35.363261,149.165230,584,353 at speed 1.0
Starting sketch 'ArduPlane'
Starting SITL input
Using Irlock at port : 9005
bind port 5760 for 0
Serial port 0 on TCP port 5760
Waiting for connection ....
bind port 5762 for 2
Serial port 2 on TCP port 5762
bind port 5763 for 3
Serial port 3 on TCP port 5763
Connected to 10.0.0.1:49000
Bus error

It's not particularly clear which "Bus" this is referring to, and a quick search of the source finds no such error message anywhere.

Other SITL models (not X-Plane) work fine, and the X-Plane model works fine on a normal desktop.

Version

Plane 3.7.1 and master

Platform

[ X ] All [ ] AntennaTracker [ ] Copter [ ] Plane [ ] Rover

OXINARF commented 7 years ago

@omcaree This list is for confirmed bugs or feature requests. Developer questions/issues should be posted in the forum (http://discuss.ardupilot.org) or asked about in Gitter (http://gitter.im/ArduPilot/ardupilot).

I'll ask @tridge to see if he has any idea.

omcaree commented 7 years ago

Apologies, I should have made it clear that the SITL executable exits immediately following this error, so I think that qualifies as a bug.

OXINARF commented 7 years ago

Unlikely. As you said, that message doesn't come from ArduPilot; it is most likely a message from the OS. I'm guessing we are trying to open a port or a device that you don't have. If that is the case, this is a misconfiguration, not exactly a bug. As I said, the list is for confirmed bugs (otherwise all users will open issues saying they've found a bug :wink:).
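For context, "Bus error" is the C library's standard description of the SIGBUS signal. When a process is terminated by SIGBUS, the shell prints that text, which is why it never appears in the ArduPilot source. A minimal sketch, assuming a Linux system with glibc (the helper name sigbus_description is hypothetical):

```cpp
#include <cassert>
#include <csignal>
#include <cstring>
#include <string>

// Returns the C library's human-readable description of SIGBUS.
// On glibc this is "Bus error", the same text the shell prints when a
// child process (such as the SITL binary) is killed by that signal.
std::string sigbus_description() {
    const char *text = strsignal(SIGBUS);
    assert(text != nullptr);
    return std::string(text);
}
```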

Usually we close this immediately but since you provided all the information, I'll give some time for @tridge to give a comment.

omcaree commented 7 years ago

I've done some digging and found the problem. The following warning appears when building for ARM, but not with an x86 build:

../../libraries/SITL/SIM_XPlane.cpp:141:44: warning: cast from ‘uint8_t* {aka unsigned char*}’ to ‘const float*’ increases required alignment of target type [-Wcast-align]
         const float *data = (const float *)p;
                                            ^

This cast is not alignment-safe, and aligned access is a requirement on ARM. The first access of p is at byte 5 of a uint8_t array (see line 104 of libraries/SITL/SIM_XPlane.cpp for the definition of p), so casting p to a float pointer leads to unaligned access as soon as the data array is read (ARM requires float arrays to be aligned to 4-byte boundaries). This unaligned access is what causes SIGBUS to kill the process.
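The alignment problem above comes down to address arithmetic: a float pointer must point at an address divisible by alignof(float), which is 4 on both x86 and ARM. A minimal sketch, assuming the receive buffer itself starts on a 4-byte boundary (the helper name is_aligned_for is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// True if `ptr` satisfies the alignment the target ABI requires for T.
// Dereferencing a misaligned const float* is undefined behaviour in C++;
// on the ARM target in this report it faults with SIGBUS, while x86
// tolerates it, which matches the desktop build working fine.
template <typename T>
bool is_aligned_for(const void *ptr) {
    assert(ptr != nullptr);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}
```

For example, with a 4-byte-aligned buffer buf, is_aligned_for<float>(buf + 5) is false, which is exactly the situation -Wcast-align warns about.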

If the cast were to an integer type, adding the __packed modifier should solve this issue, but since float arrays don't accept __packed, I've worked around it by using memcpy instead of a cast.
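The memcpy workaround described above can be sketched as a small helper. This is an illustration under stated assumptions, not the actual patch; the name read_float_unaligned is hypothetical. memcpy has no alignment requirement, so reading through it is well-defined at any offset:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads the index-th float from a byte buffer that may not be 4-byte
// aligned. The bytes are copied into a properly aligned local float,
// so no misaligned float load is ever issued.
float read_float_unaligned(const uint8_t *bytes, std::size_t index) {
    assert(bytes != nullptr);
    float value;
    std::memcpy(&value, bytes + index * sizeof(float), sizeof(float));
    return value;
}
```

With optimisation enabled, GCC and Clang usually turn a fixed-size 4-byte memcpy into a single load, or into an alignment-safe byte sequence on strict-alignment targets, so the cost is typically small.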

I'd be happy to submit a patch if this is helpful.

OXINARF commented 7 years ago

@omcaree Pull requests are welcome; it will be reviewed and merged if no problems are found.

lucasdemarchi commented 7 years ago

@omcaree Nice catch! There's something strange in this code:

const float *data = (const float *)p;
uint8_t code = p[0];

It treats the first byte as the code, but also as the start of a float array? Either that, or it's too late here and I'm out of coffee.
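One possible reading of that snippet: X-Plane's UDP "DATA" records are commonly described as a 4-byte little-endian index followed by eight 4-byte floats (an assumption; this thread does not state the layout). Under that assumption, data[0] overlaps the index field, p[0] is just the index's low byte, and the real values start at data[1]. A sketch that decodes one record without any float* cast (the names XPlaneRecord, kRecordSize and parse_record are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical decoded form of one X-Plane DATA record
// (assumed layout: int32 index, then eight float values).
struct XPlaneRecord {
    int32_t index;
    float values[8];
};

constexpr std::size_t kRecordSize = sizeof(int32_t) + 8 * sizeof(float);  // 36 bytes

// Decodes one record starting at `p`, which need not be aligned.
// Each field is copied out with memcpy; assumes a little-endian host,
// matching the assumed wire format.
XPlaneRecord parse_record(const uint8_t *p) {
    assert(p != nullptr);
    XPlaneRecord rec;
    std::memcpy(&rec.index, p, sizeof(rec.index));
    std::memcpy(rec.values, p + sizeof(rec.index), sizeof(rec.values));
    return rec;
}
```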

This should instruct the compiler to generate code for unaligned access without requiring the huge memcpy: https://github.com/lucasdemarchi/ardupilot/commit/c032b0940fb6ff87e9ed1764962ccb950491756e (build-tested only on my PC)
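The linked commit is not reproduced here. As a general illustration of the same idea, GCC and Clang let you wrap a float in a struct marked __attribute__((packed)). Because the wrapper then has an alignment of 1, the compiler must emit alignment-safe loads when reading through a pointer to it. A minimal sketch, assuming GCC or Clang (both attributes are compiler extensions; the names UnalignedFloat and read_packed_float are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A float wrapper with alignment 1. `packed` stops the compiler assuming
// the address is 4-byte aligned, so it generates unaligned-safe loads;
// `may_alias` permits reading it through a pointer into a uint8_t buffer.
struct __attribute__((packed, may_alias)) UnalignedFloat {
    float value;
};

static_assert(alignof(UnalignedFloat) == 1, "wrapper must be 1-byte aligned");
static_assert(sizeof(UnalignedFloat) == sizeof(float), "no padding expected");

// Reads the index-th float from a possibly misaligned byte buffer.
float read_packed_float(const uint8_t *bytes, std::size_t index) {
    assert(bytes != nullptr);
    const auto *data = reinterpret_cast<const UnalignedFloat *>(bytes);
    return data[index].value;
}
```

Casting to a 1-byte-aligned type does not increase the required alignment, so this form also avoids the -Wcast-align warning.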

@tridge ?