ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

Copter: SITL downs when i try to use own additional code #9175

Closed SwiftGust closed 6 years ago

SwiftGust commented 6 years ago

Issue details When I try to run SITL with Gazebo with my own control code included, ArduCopter SITL goes down. I can't find where this error is defined using Atom's project-wide search of the ardupilot codebase:

EOF on TCP socket
Attempting reconnect
[Errno 111] Connection refused sleeping
[Errno 111] Connection refused sleeping

I need help understanding why I get this message when I use an additional algorithm that I'm working on, which runs in the 400 Hz fast loop with an additional inner loop. The algorithm is an optimal-control method using matrix operations (2D arrays). I made it switchable and tested it in many ways; the maximum number of iterations that works is only 2, but the algorithm requires at least 10 to 20 iterations to converge to the optimal solution, so I want more.

Is this because the control algorithm is somewhat heavy and is interfering with the scheduler? I want to know what causes this problem, such as a lack of memory or increased computational load, and how to check which it is. If so, how should I work around it?

Thank you in advance.

Version APM:Copter V3.6-dev

Platform [ ] All [ ] AntennaTracker [V] Copter [ ] Plane [ ] Rover [ ] Submarine

Airframe type Copter - Hexa

Hardware type SITL

peterbarker commented 6 years ago

On Sat, 11 Aug 2018, Seunghwan Jo wrote:

When I try to run SITL with Gazebo with my own control code included, ArduCopter SITL goes down. I can't find where this error is defined using Atom's project-wide search of the ardupilot codebase:

EOF on TCP socket
Attempting reconnect
[Errno 111] Connection refused sleeping
[Errno 111] Connection refused sleeping

That's typical of ArduPilot crashing. Have you modified ArduPilot? If so, you want to go and read the Wiki on using gdb to debug ArduPilot - if I'm right it will show something like a segmentation fault in the AP binary.

Is this because the control algorithm is somewhat heavy and is interfering with the scheduler?

Unlikely - the EOF would more likely indicate the AP binary blowing up.

SwiftGust commented 6 years ago

@peterbarker Thank you for the reply.

I followed the GDB document on the wiki, with the additional args sim_vehicle.py -v ArduCopter -f gazebo-hexa -D -G. I'm not sure I got GDB working correctly: it stays quiet after the additional console output below, even when SITL goes down.

RiTW: Starting ArduCopter (gdb) : gdb -x /tmp/tmpXx9LUq --args /home/jmarple/ardupilot/build/sitl-debug/bin/arducopter -S -I0 --home -35.363261,149.165230,584,353 --model gazebo-hexa --speedup 1 --defaults /home/jmarple/ardupilot/Tools/autotest/default_params/gazebo-hexa.parm

I also tried attaching to the running process, with sim_vehicle.py -v ArduCopter -f gazebo-hexa -D and, in a new console, jmarple@jmarple-ubuntu:~/ardupilot/build$ sudo gdb sitl-debug/bin/arducopter 5818

Attaching GDB takes the SITL process down, and it comes back up when I quit GDB:

APM: EKF2 IMU0 Origin set to GPS
APM: EKF2 IMU1 Origin set to GPS
APM: EKF2 IMU0 is using GPS
APM: EKF2 IMU1 is using GPS
Flight battery 100 percent
no link
link 1 down
no link
no link
no link
no link
no link
no link
no link
no link
no link
no link
no link
no link
no link
no link
APM: EKF2 IMU0 has stopped aiding
APM: EKF2 IMU1 has stopped aiding
Got MAVLink msg: COMMAND_ACK {command : 519, result : 3}
link 1 OK
height -120
heartbeat OK
height 0

The GDB console gives me output like the below:

Attaching to program: /home/jmarple/ardupilot/build/sitl-debug/bin/arducopter, process 5818
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/ce/17e023542265fc11d9bc8f534bb4f070493d30.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
---Type <return> to continue, or q <return> to quit---return
0x00007fe8f44072f0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.

Besides this, I suspected computational load, so I flashed my firmware onto a NAVIO2. Unlike SITL, it runs without crashing even with a maximum of 100 iterations (I haven't tested it on an actual copter yet).

peterbarker commented 6 years ago

On Wed, 15 Aug 2018, Seunghwan Jo wrote:

  APM: EKF2 IMU0 Origin set to GPS
  APM: EKF2 IMU1 Origin set to GPS
  APM: EKF2 IMU0 is using GPS
  APM: EKF2 IMU1 is using GPS
  Flight battery 100 percent
  no link
  link 1 down
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  no link
  APM: EKF2 IMU0 has stopped aiding
  APM: EKF2 IMU1 has stopped aiding
  Got MAVLink msg: COMMAND_ACK {command : 519, result : 3}
  link 1 OK
  height -120
  heartbeat OK
  height 0

OK, that does look like something has gone out to lunch. You might want to look at moving your computation to a thread to avoid annoying the main flight loop.

Besides this, I suspected computational load, so I flashed my firmware onto a NAVIO2. Unlike SITL, it runs without crashing even with a maximum of 100 iterations (I haven't tested it on an actual copter yet).

You might not notice it on the ground, but you might in the air!

Check out the dataflash "PM" message to ensure your loop times aren't extreme.
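As a rough aid for reading those numbers: a 400 Hz loop has a nominal budget of 1,000,000 / 400 = 2500 microseconds per iteration. The sketch below summarises a list of measured loop durations in the same spirit as the PM fields discussed in this thread (loop count, long-loop count, max time). LoopStats, loop_budget_us, and summarise are hypothetical names, and treating any loop over the nominal budget as "long" is an assumption; the threshold ArduPilot actually applies is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary mirroring the kind of figures a PM record reports.
struct LoopStats {
    uint32_t num_loops = 0;    // loops observed
    uint32_t num_long = 0;     // loops that exceeded the budget
    uint32_t max_time_us = 0;  // slowest loop seen, in microseconds
};

// Nominal time budget per loop, in microseconds, for a given loop rate.
uint32_t loop_budget_us(uint32_t rate_hz) {
    assert(rate_hz > 0);
    return 1000000U / rate_hz;
}

// Count loops and overruns. Assumption: "long" means strictly over budget.
LoopStats summarise(const std::vector<uint32_t>& durations_us, uint32_t rate_hz) {
    const uint32_t budget = loop_budget_us(rate_hz);
    LoopStats stats;
    for (uint32_t d : durations_us) {
        ++stats.num_loops;
        if (d > budget) {
            ++stats.num_long;
        }
        if (d > stats.max_time_us) {
            stats.max_time_us = d;
        }
    }
    return stats;
}
```

Under that assumption, a max time of 4999 microseconds is roughly twice the 2500 microsecond budget at 400 Hz, and values around 2800 to 3300 are modest overruns.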

SwiftGust commented 6 years ago

I didn't notice there is a log for computation performance; thank you for the info.

I looked at the PM log from SITL with the new algorithm and at the flight log of a copter which flew well. In the SITL log, NLoops is fixed at 400, which looks fine, but NLon is always 1 or more, and max time says otherwise: it reaches 4999 (the units are probably microseconds? Then the number of loops should have been halved...)

In the flight log, max time varies within a range of about 2800 to 3300, and NLon rarely reaches 1 or more.

I don't know how SITL works to simulate embedded hardware like Pixhawk. Is it safe to assume this cannot work well on a Cortex-M4 such as Pixhawk? It also looks odd that it cannot even run well on a desktop...

peterbarker commented 6 years ago

On Wed, 15 Aug 2018, Seunghwan Jo wrote:

I don't know how SITL works to simulate embedded hardware like Pixhawk. It also looks odd that it cannot even run well on a desktop...

Yeah, only useful from real hardware :-)

SwiftGust commented 6 years ago

I think I was misleading: the flight log I mentioned was from another flight without this algorithm. I want to be able to test this in SITL, since it could be very dangerous. Is multi-threading then the only solution? I'd also like to know whether SITL runs natively on x86 CPUs or emulates ARM cores. Thanks.

peterbarker commented 6 years ago

On Thu, 16 Aug 2018, Seunghwan Jo wrote:

I think I was misleading: the flight log I mentioned was from another flight without this algorithm. I want to be able to test this in SITL, since it could be very dangerous. Is multi-threading then the only solution? I'd also like to know whether SITL runs natively on x86 CPUs.

There are options apart from running your code in a different thread; an "anytime" algorithm like we used for SmartRTL, for example.
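A minimal sketch of an "anytime" solver in the general sense of the term: each call does a bounded amount of work, keeps its state, and can be asked for the best answer so far at any point. This is not how SmartRTL is implemented (the thread does not describe that); AnytimeSolver is a hypothetical name, and Jacobi iteration on a 2x2 system is swapped in purely as a small, runnable stand-in for the real matrix iteration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Solves A x = b by Jacobi iteration, a few sweeps per call.
// State persists between calls, so work resumes where it left off.
class AnytimeSolver {
public:
    using Vec = std::array<double, 2>;
    using Mat = std::array<Vec, 2>;

    AnytimeSolver(const Mat& a, const Vec& b) : a_(a), b_(b) {
        assert(a[0][0] != 0.0 && a[1][1] != 0.0);  // Jacobi divides by the diagonal
    }

    // Run at most max_iters sweeps, then return so the caller's loop can
    // carry on. Returns true once the residual is below tol.
    bool step(unsigned max_iters, double tol = 1e-9) {
        for (unsigned i = 0; i < max_iters && !converged(tol); ++i) {
            Vec next;
            next[0] = (b_[0] - a_[0][1] * x_[1]) / a_[0][0];
            next[1] = (b_[1] - a_[1][0] * x_[0]) / a_[1][1];
            x_ = next;
            ++iterations_;
        }
        return converged(tol);
    }

    // Best estimate so far; usable even before convergence.
    const Vec& best() const { return x_; }
    unsigned iterations() const { return iterations_; }

private:
    bool converged(double tol) const {
        const double r0 = a_[0][0] * x_[0] + a_[0][1] * x_[1] - b_[0];
        const double r1 = a_[1][0] * x_[0] + a_[1][1] * x_[1] - b_[1];
        return std::fabs(r0) < tol && std::fabs(r1) < tol;
    }

    Mat a_;
    Vec b_;
    Vec x_{{0.0, 0.0}};
    unsigned iterations_ = 0;
};
```

Calling step(2) once per fast-loop pass would spread the 10 to 20 iterations mentioned earlier over several passes instead of doing them all in one, at the cost of acting on a not-yet-converged estimate in the meantime.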

You do not need to actually fly your code to test it on embedded hardware. Remove props before arming your vehicle, run it through its paces. Stare at logs before arming with props on...

Peter

rmackay9 commented 6 years ago

Probably this can be moved to Gitter or the developer discuss group, so I'll close this if that's OK.