PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

Fail to simulate with gazebo on Ubuntu 20.04: Segmentation fault (core dumped) gzserver $verbose $world_path $ros_args #22958

Open lipantao opened 3 months ago

lipantao commented 3 months ago

Describe the bug

I'm trying to run make px4_sitl gazebo, but gzserver crashes with a 'Segmentation fault (core dumped)' error: [screenshot] I tried to fix it with make distclean and git submodule update --init --recursive. After that the simulation started successfully twice, apparently by chance, but later runs failed again with the same error message. Gazebo itself runs perfectly when started standalone.

To Reproduce

I followed the instructions of the PX4 User Guide to install and configure, but got the error as described above. All px4 commands starting SITL with gazebo-classic fail with the same core dumped error.

Expected behavior

I expect PX4 to connect to gazebo normally and successfully run SITL.

Screenshot / Media

No response

Flight Log

lw@lw-System-Product-Name:~/PX4-Autopilot$ make px4_sitl gazebo
[0/4] Performing build step for 'sitl_gazebo-classic'
ninja: no work to do.
[3/4] cd /home/lw/PX4-Autopilot/build/px4_sitl_default/src/modules...ome/lw/PX4-Autopilot /home/lw/PX4-Autopilot/build/px4_sitl_default
SITL ARGS
sitl_bin: /home/lw/PX4-Autopilot/build/px4_sitl_default/bin/px4
debugger: none
model: iris
world: none
src_path: /home/lw/PX4-Autopilot
build_path: /home/lw/PX4-Autopilot/build/px4_sitl_default
GAZEBO_PLUGIN_PATH :/home/lw/PX4-Autopilot/build/px4_sitl_default/build_gazebo-classic
GAZEBO_MODEL_PATH :/home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/models
LD_LIBRARY_PATH /home/lw/catkin_ws/devel/lib:/opt/ros/noetic/lib:/opt/ros/noetic/lib/x86_64-linux-gnu:/home/lw/PX4-Autopilot/build/px4_sitl_default/build_gazebo-classic
empty world, setting empty.world as default
Using: /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/models/iris/iris.sdf
Warning [parser.cc:833] XML Attribute[version] in element[sdf] not defined in SDF, ignoring.
/home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_run.sh: line 147: 267253 Segmentation fault (core dumped) gzserver $verbose $world_path $ros_args
SITL COMMAND: "/home/lw/PX4-Autopilot/build/px4_sitl_default/bin/px4" "/home/lw/PX4-Autopilot/build/px4_sitl_default"/etc


______  __   __    ___
| ___ \ \ \ / /   /   |
| |_/ /  \ V /   / /| |
|  __/   /   \  / /_| |
| |     / /^\ \ \___  |
\_|     \/   \/     |_/

px4 starting.

INFO [px4] startup script: /bin/sh etc/init.d-posix/rcS 0
INFO [init] found model autostart file as SYS_AUTOSTART=10015
INFO [param] selected parameter default file parameters.bson
INFO [param] selected parameter backup file parameters_backup.bson
SYS_AUTOCONFIG: curr: 0 -> new: 1
SYS_AUTOSTART: curr: 0 -> new: 10015
CAL_ACC0_ID: curr: 0 -> new: 1310988
CAL_GYRO0_ID: curr: 0 -> new: 1310988
CAL_ACC1_ID: curr: 0 -> new: 1310996
CAL_GYRO1_ID: curr: 0 -> new: 1310996
CAL_ACC2_ID: curr: 0 -> new: 1311004
CAL_GYRO2_ID: curr: 0 -> new: 1311004
CAL_MAG0_ID: curr: 0 -> new: 197388
CAL_MAG0_PRIO: curr: -1 -> new: 50
CAL_MAG1_ID: curr: 0 -> new: 197644
CAL_MAG1_PRIO: curr: -1 -> new: 50
SENS_BOARD_X_OFF: curr: 0.0000 -> new: 0.0000
SENS_DPRES_OFF: curr: 0.0000 -> new: 0.0010
INFO [dataman] data manager file './dataman' size is 7872608 bytes
INFO [init] PX4_SIM_HOSTNAME: localhost
INFO [simulator_mavlink] Waiting for simulator to accept connection on TCP port 4560
Gazebo multi-robot simulator, version 11.14.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

[Msg] Waiting for master.
[Err] [ConnectionManager.cc:121] Failed to connect to master in 30 seconds.
[Err] [gazebo_shared.cc:78] Unable to initialize transport.
[Err] [gazebo_client.cc:56] Unable to setup Gazebo

Software Version

Gazebo 11.14.0, Ubuntu 20.04

Flight controller

None

Vehicle type

None

How are the different components wired up (including port information)

No response

Additional context

No response

lipantao commented 3 months ago

image

julianoes commented 3 months ago

I wonder if it is this one?

 Warning [parser.cc:833] XML Attribute[version] in element[sdf] not defined in SDF, ignoring.

Otherwise, can you try to open the core dump? Alternatively, run gzserver under gdb by replacing gzserver in https://github.com/PX4/PX4-Autopilot/blob/ccdf0603931e513fb96f3f5b8a4240e51b7c3122/Tools/simulation/gazebo-classic/sitl_run.sh#L104 with gdb -ex run --args gzserver
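As a rough sketch of that substitution, the helper below only prints the gdb command line that would take the place of a plain gzserver call. gdb_wrap is a hypothetical name and the gzserver arguments are placeholders, not values taken from sitl_run.sh. The extra -ex bt is an optional addition that makes gdb print a backtrace as soon as the process stops.

```shell
# Hypothetical helper: print the gdb command that would replace a plain
# invocation of the given program. Nothing is executed here.
gdb_wrap() {
    # -ex run starts the program immediately; -ex bt prints a backtrace
    # once the program stops (for example on SIGSEGV); --args passes the
    # remaining words to the program instead of to gdb.
    printf 'gdb -ex run -ex bt --args %s\n' "$*"
}

# Placeholder arguments, for illustration only.
gdb_wrap gzserver --verbose empty.world
```

In the script itself, the printed command would be written in place of the bare gzserver call, keeping the original variables as arguments.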

lipantao commented 3 months ago

Thanks for your reply! I don't think the XML warning is the cause: it also appears on the occasional runs where the simulation starts successfully, and it does not affect operation. The only difference between success and failure is the core dump. I followed your suggestion and started gzserver with gdb; the output is as follows: [screenshot] It seems that something goes wrong in the Boost.Asio library. I reinstalled the Boost library and Gazebo, but it still doesn't work. [screenshot]

roseyanpeng commented 3 months ago

I also ran into a similar problem. I have narrowed it down to the plugin declared as <plugin name='mavlink_interface' filename='libgazebo_mavlink_interface.so'>. After checking its code further, I found that the failure occurs at the line containing "mavlinkinterface = std::make_unique". Can anyone else help check this issue? Thank you very much!

julianoes commented 3 months ago

Make sure to try to get more verbose output:

export VERBOSE_SIM=1

lipantao commented 3 months ago

I set VERBOSE_SIM=1 and the output is: [screenshot] [screenshot]
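For reference, here is a minimal sketch of how a launch script might turn this variable into a gzserver flag. The function name is hypothetical and the logic is an assumption about the general pattern, not a copy of sitl_run.sh; --verbose itself is a real gzserver option.

```shell
# Hypothetical helper: map the VERBOSE_SIM environment variable to a flag.
verbose_flag() {
    # Assumed convention: any non-empty value enables verbose output.
    if [ -n "${VERBOSE_SIM:-}" ]; then
        echo "--verbose"
    fi
}

# Prefixing the assignment sets the variable for this one call.
VERBOSE_SIM=1 verbose_flag
```

The same prefix form should work for a single build-and-run, for example VERBOSE_SIM=1 make px4_sitl gazebo, without exporting the variable for the whole session.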

julianoes commented 3 months ago

Would be good if you could type backtrace when it segfaults to get the full backtrace.

lipantao commented 3 months ago

Thank you again for your patient and detailed reply! I ran make px4_sitl_default gazebo-classic_iris_gdb, then typed backtrace, and got this output:

[Msg] Waiting for master.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a19700 (LWP 3876917)]
[New Thread 0x7ffff7fc6700 (LWP 3876918)]
[New Thread 0x7ffff7218700 (LWP 3876919)]

Thread 3 "px4" received signal SIG32, Real-time event 32.
[Switching to Thread 0x7ffff7fc6700 (LWP 3876918)]
__lll_lock_wait_private (futex=0x7ffff7fc6d18) at lowlevellock.c:35
35 lowlevellock.c: No such file or directory.
(gdb) bt
#0 __lll_lock_wait_private (futex=0x7ffff7fc6d18) at lowlevellock.c:35
#1 0x00007ffff7f6b7b7 in start_thread (arg=<optimized out>) at pthread_create.c:453
#2 0x00007ffff7b3e353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) quit

julianoes commented 3 months ago

Hmm, I think that's the backtrace of PX4 but we need the one of gzserver.

lipantao commented 3 months ago

I think I obtained a backtrace for gzserver by using the command gdb -ex run -ex "bt" --args gzserver $verbose $world_path $ros_args & in sitl_run.sh: [screenshot] This is the text of the output in the screenshot:

--Type <RET> for more, q to quit, c to continue without paging--
Thread 53 "gzserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff517f7700 (LWP 22215)]
boost::asio::detail::reactive_descriptor_service::reactive_descriptor_service (context=..., this=0x7ffef8f2ddb8) at /usr/local/include/boost/asio/detail/impl/reactive_descriptor_service.ipp:39
39 reactor_.init_task();
#0 boost::asio::detail::reactive_descriptor_service::reactive_descriptor_service(boost::asio::execution_context&) (context=..., this=0x7ffef8f2ddb8) at /usr/local/include/boost/asio/detail/impl/reactive_descriptor_service.ipp:39
#1 boost::asio::detail::posix_serial_port_service::posix_serial_port_service(boost::asio::execution_context&) (context=..., this=0x7ffef8f2dd90) at /usr/local/include/boost/asio/detail/impl/posix_serial_port_service.ipp:36
#2 boost::asio::detail::service_registry::create<boost::asio::detail::posix_serial_port_service, boost::asio::io_context>(void*) (owner=owner@entry=0x7ffef89ff740) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:87
#3 0x00007fffc427ce65 in boost::asio::detail::service_registry::do_use_service(boost::asio::execution_context::service::key const&, boost::asio::execution_context::service* (*)(void*), void*) (owner=0x7ffef89ff740, factory=0x7fffc42874b0 <boost::asio::detail::service_registry::create<boost::asio::detail::posix_serial_port_service, boost::asio::io_context>(void*)>, key=<synthetic pointer>..., this=0x7ffef8f2c370) at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:132
#4 boost::asio::detail::service_registry::use_service<boost::asio::detail::posix_serial_port_service>(boost::asio::io_context&) (owner=..., this=0x7ffef8f2c370) at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:39
#5 boost::asio::use_service<boost::asio::detail::posix_serial_port_service>(boost::asio::io_context&) (ioc=...) at /usr/local/include/boost/asio/impl/io_context.hpp:41
#6 boost::asio::detail::io_object_impl<boost::asio::detail::posix_serial_port_service, boost::asio::any_io_executor>::io_object_impl<boost::asio::io_context>(int, int, boost::asio::io_context&) (context=..., this=0x7ffef89ff750) at /usr/local/include/boost/asio/detail/io_object_impl.hpp:58
#7 boost::asio::basic_serial_port<boost::asio::any_io_executor>::basic_serial_port<boost::asio::io_context>(boost::asio::io_context&, boost::asio::constraint<std::is_convertible<boost::asio::io_context&, boost::asio::execution_context&>::value, boost::asio::defaulted_constraint>::type) (context=..., this=0x7ffef89ff750) at /usr/local/include/boost/asio/basic_serial_port.hpp:120
#8 MavlinkInterface::MavlinkInterface() (this=0x7ffef89ef5d0) at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/mavlink_interface.cpp:4
#9 0x00007fffc423e252 in std::make_unique<MavlinkInterface>() () at /usr/include/eigen3/Eigen/src/Core/util/Memory.h:170
#10 gazebo::GazeboMavlinkInterface::GazeboMavlinkInterface() (this=0x7ffef896dba0) at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_mavlink_interface.cpp:27
#11 0x00007fffc423e390 in gazebo::RegisterPlugin() () at /home/lw/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_mavlink_interface.cpp:24
#12 0x00007ffff6d5a183 in () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#13 0x00007ffff6d55d55 in gazebo::physics::Model::LoadPlugin(std::shared_ptr<sdf::v9::Element>) () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#14 0x00007ffff6d56210 in gazebo::physics::Model::LoadPlugins(unsigned int) () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#15 0x00007ffff6da3b64 in gazebo::physics::World::ProcessFactoryMsgs() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#16 0x00007ffff6db0da8 in gazebo::physics::World::ProcessMessages() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#17 0x00007ffff6db1527 in gazebo::physics::World::Step() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#18 0x00007ffff6db47fd in gazebo::physics::World::RunLoop() () at /lib/x86_64-linux-gnu/libgazebo_physics.so.11
#19 0x00007ffff7625df4 in () at /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff6f23609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007ffff745f353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
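One detail that can be read off this backtrace is where the source files of the Boost frames live. The sketch below, with a hypothetical function name, extracts the distinct Boost include roots from backtrace text on standard input. For the frames above they resolve to /usr/local/include/boost rather than a path under /usr/include. Whether that is related to the crash is not established in this thread.

```shell
# Hypothetical helper: list the distinct ".../include/boost" prefixes that
# appear in gdb backtrace text read from standard input.
boost_header_roots() {
    # -o prints only the matching part of each line; sort -u deduplicates.
    grep -o '/[^ ]*/include/boost' | sort -u
}

# Two "at <file>:<line>" fragments copied from the backtrace above.
boost_header_roots <<'EOF'
at /usr/local/include/boost/asio/detail/impl/service_registry.hpp:87
at /usr/local/include/boost/asio/basic_serial_port.hpp:120
EOF
```

A saved gdb log could be piped through the same function to see at a glance whether more than one Boost installation shows up.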

julianoes commented 3 months ago

Interesting. It's crashing related to the serial port but you're using SITL, so it shouldn't use the serial port. Are you trying HITL? Does it happen with a fresh clone?

lipantao commented 3 months ago

I'm not trying HITL, and my computer isn't connected to any peripherals besides the keyboard and mouse. I've verified that my PX4 code is up to date. Additionally, I listed the serial ports currently in use with sudo lsof | grep /dev/tty, and none of them appear to be occupied except by some system processes. [screenshot]
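A narrower version of that check is sketched here, under the assumption that only physical or USB serial devices (ttyS, ttyUSB, ttyACM) are of interest and virtual terminals can be skipped. The function name and the sample rows are made up for illustration.

```shell
# Hypothetical helper: from lsof output on standard input, keep only rows
# whose last column (NAME) is a ttyS, ttyUSB or ttyACM device, and print
# the command, PID and device.
serial_port_users() {
    awk '$NF ~ /^\/dev\/tty(S|USB|ACM)[0-9]+$/ { print $1, $2, $NF }'
}

# Made-up rows in lsof's usual column layout:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
serial_port_users <<'EOF'
agetty   812 root  0u CHR   4,1  0t0  20 /dev/tty1
demoapp 4321 lw    5u CHR 188,0  0t0 512 /dev/ttyUSB0
EOF
```

On a real system the input would come from sudo lsof | serial_port_users; empty output would mean that no process holds one of those device types open.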

lipantao commented 3 months ago

And I've noticed a significant improvement in success rate when using some other models for simulation, such as make px4_sitl gazebo-classic_typhoon_h480. On average, out of ten attempts, it succeeds two to three times, whereas with the default model's command, it might take dozens of attempts to succeed once.
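To put numbers on an intermittent failure like this, one option is to repeat a command and count how often it dies from SIGSEGV, which bash and dash on Linux report as exit status 139 (128 + 11). This is a generic sketch with a hypothetical function name. It assumes the command exits on its own, so a long-running simulator would have to be wrapped, for example in timeout.

```shell
# Hypothetical helper: run a command N times and report how many runs
# ended with exit status 139 (terminated by SIGSEGV on Linux).
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        # Capture the exit status without aborting under "set -e".
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes/$runs"
}

# Stand-in command that always exits with status 139.
count_segfaults 3 sh -c 'exit 139'
```

Running the same count for two different models would give a like-for-like comparison instead of an impression from memory.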

julianoes commented 3 months ago

I'm sorry I'm out of ideas.

lipantao commented 3 months ago

That's OK. Thank you very much!

lipantao commented 3 months ago

I uninstalled Gazebo 11, which had been installed with sudo apt-get install (by ubuntu.sh), and then reinstalled Gazebo 11 from source following the official instructions (https://classic.gazebosim.org/tutorials?tut=install_from_source&cat=install). Now SITL runs smoothly and reliably!

julianoes commented 3 months ago

Ah nice. So you actually built it from source? Or installed?

lipantao commented 3 months ago

Yes, I built it from the latest source code and then installed it with sudo make install. It's also Gazebo 11.14.0, the latest version.
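When both an apt package and a source build have existed on the same machine, it can help to confirm which gzserver the shell actually resolves. This is a minimal sketch that assumes the usual defaults: distro packages under /usr/bin, and sudo make install under /usr/local unless a different prefix was configured. The function name is hypothetical.

```shell
# Hypothetical helper: guess how a binary was installed from its path.
install_origin() {
    case "$1" in
        /usr/local/*)      echo "local build" ;;     # default CMake install prefix
        /usr/bin/*|/bin/*) echo "distro package" ;;  # where apt places binaries
        *)                 echo "unknown" ;;
    esac
}

# command -v prints the path the shell would run; "none" if not installed.
install_origin "$(command -v gzserver || echo none)"
```

If the result is not the expected one, PATH order or a leftover binary from the other installation would be the first things to look at.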

julianoes commented 3 months ago

Wow, that's commitment! Still puzzled why that happened.

lipantao commented 3 months ago

I'm also puzzled about why the two installation methods behave differently.

mengchaoheng commented 3 months ago

@julianoes I have the same problem on macOS 14.3, but I could run all SITL simulations before I updated my OS from 14.2 to 14.3. Maybe the error comes from Gazebo, since I can run jMAVSim successfully. Details are in https://github.com/PX4/PX4-Autopilot/issues/22826

mengchaoheng commented 3 months ago

Maybe https://github.com/gazebosim/gazebo-classic/pull/3380 ?

mengchaoheng commented 3 months ago

I have a fix! https://github.com/PX4/PX4-Autopilot/issues/22826#issuecomment-2067587623