Open AlejoAsd opened 4 years ago
I'm also seeing a similar segfault when attempting to open a connection at all, without even quickly cycling the connection:
$ gz launch --versions
6.0.0
/usr/share/gz/gz-launch6/configs$ gz launch websocket.gzlaunch -v 4
[Dbg] [Manager.cc:1164] Loading plugin. Name[gz::launch::WebsocketServer] File[gz-launch-websocket-server]
[Dbg] [WebsocketServer.cc:414] Using port[9002]
[Dbg] [WebsocketServer.cc:429] Using maximum connection count of -1
[Wrn] [WebsocketServer.cc:559] Partial SSL configuration specified. Please specify: <ssl>
<cert_file>PATH_TO_CERT_FILE</cert_file>
<private_key_file>PATH_TO_KEY_FILE</private_key_file>
</ssl>.
Continuing without SSL.
[Dbg] [WebsocketServer.cc:246] LWS_CALLBACK_ESTABLISHED
[Dbg] [WebsocketServer.cc:301] LWS_CALLBACK_RECEIVE
[Dbg] [WebsocketServer.cc:729] Protos request received
[Dbg] [WebsocketServer.cc:301] LWS_CALLBACK_RECEIVE
[Dbg] [WebsocketServer.cc:784] Topic and message type list request received
[Dbg] [WebsocketServer.cc:301] LWS_CALLBACK_RECEIVE
[Dbg] [WebsocketServer.cc:814] World info request received
[Dbg] [WebsocketServer.cc:301] LWS_CALLBACK_RECEIVE
Stack trace (most recent call last) in thread 50526:
#16 Object "", at 0xffffffffffffffff, in
#15 Source "./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7fc91c3269ff]
#14 Source "./nptl/pthread_create.c", line 442, in start_thread [0x7fc91c294b42]
#13 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7fc91c6dc2b2, in std::error_code::default_error_condition() const
#12 Object "/usr/lib/x86_64-linux-gnu/gz-launch-6/plugins/libgz-launch-websocket-server.so", at 0x7fc91c9def4c, in gz::launch::WebsocketServer::Run()
#11 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8cbfb6, in lws_service
#10 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8ec979, in _lws_plat_file_open
#9 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8ec6ea, in _lws_plat_file_open
#8 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8c9808, in lws_service_fd_tsi
#7 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8d66bd, in lws_hdr_custom_copy
#6 Object "/lib/x86_64-linux-gnu/libwebsockets.so.16", at 0x7fc91b8d5b3c, in lws_hdr_custom_copy
#5 Object "/usr/lib/x86_64-linux-gnu/gz-launch-6/plugins/libgz-launch-websocket-server.so", at 0x7fc91c9ea44a, in rootCallback(lws*, lws_callback_reasons, void*, void*, unsigned long)
#4 Object "/usr/lib/x86_64-linux-gnu/gz-launch-6/plugins/libgz-launch-websocket-server.so", at 0x7fc91c9e763d, in gz::launch::WebsocketServer::OnMessage(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
#3 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7fc91c73cb34, in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
#2 Object "/lib/x86_64-linux-gnu/libgz-common5.so.5", at 0x7fc91c970955, in gz::common::Logger::Buffer::xsputn(char const*, long)
#1 Object "/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7fc91c74a72d, in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
#0 Source "./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S", line 317, in __memcpy_avx_unaligned_erms [0x7fc91c3a094d]
Segmentation fault (Address not mapped to object [(nil)])
Segmentation fault (core dumped)
This happens when simply using the visualization app hosted on the gazebosim site:
Core dump with crash file:
_usr_lib_x86_64-linux-gnu_gz_launch6_gz-launch.1000.zip
System info:
Found something interesting regarding this bug. I am able to reproduce it using docker with this Dockerfile:
ARG ROS_VERSION=humble
FROM ros:$ROS_VERSION
RUN apt-get update && apt-get install -y --no-install-recommends wget curl
ARG GAZEBO_VERSION=garden
RUN wget https://packages.osrfoundation.org/gazebo.gpg -O /usr/share/keyrings/pkgs-osrf-archive-keyring.gpg && \
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/pkgs-osrf-archive-keyring.gpg] http://packages.osrfoundation.org/gazebo/ubuntu-stable $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/gazebo-stable.list > /dev/null && \
apt-get update && \
apt-get install -y --no-install-recommends gz-$GAZEBO_VERSION ros-$ROS_DISTRO-ros-gz$GAZEBO_VERSION
RUN curl -O https://raw.githubusercontent.com/gazebosim/gz-launch/main/examples/websocket.gzlaunch
CMD bash -c "gz sim -s -v 4 shapes.sdf & gz launch -v 4 websocket.gzlaunch"
And running this command:
docker build -t gz_launch_bug . && docker run -it --network host gz_launch_bug
However, if I run with
docker build -t gz_launch_bug . && docker run -it -p9002:9002 gz_launch_bug
(i.e., I expose the port instead of using --network host), I can connect just fine from the Gazebo website visualizer.
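When comparing the two `docker run` variants, it can help to first confirm that something is accepting TCP connections on the websocket port from the host side. The following is only a minimal sketch: the helper name `port_is_open` is hypothetical, and port 9002 is taken from the `Using port[9002]` log line above. A successful TCP connect shows only that something is listening or forwarding on that port, not that the websocket server behind it is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9002,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution failures.
        return False
```

Calling `port_is_open()` on the host while the container is running should return `True` in both setups if the port is reachable.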
I'm curious if @ruffsl was also using Docker / network host when he encountered the bug.
This is not a root cause of course, but could point someone more knowledgeable in the right direction.
> I'm curious if @ruffsl was also using Docker / network host when he encountered the bug.
@usedhondacivic, I think I probably was using the host network interface, as I was mainly using dev containers for experimentation & semi isolation for this project:
Perhaps this has something to do with unusual differences in process namespace isolation in containers vs matching host names with host network interfaces throwing off ZeroMQ, similarly to what I've experienced with DDS and shared memory transport?
The Websocket server segfaults if a websocket connection is opened and closed in quick succession.
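One way to exercise the "opened and closed in quick succession" pattern is a small client that sends a WebSocket upgrade request and drops the socket immediately. This is a rough sketch, not the reporter's actual reproduction steps: the function names, the `/` path, the attempt count, and the absence of a subprotocol header are all assumptions, and it is not confirmed that this triggers the crash. Port 9002 comes from the log output in this thread.

```python
import base64
import os
import socket


def build_handshake(host: str, port: int, path: str = "/") -> bytes:
    """Build a minimal RFC 6455 client upgrade request."""
    # The key is 16 random bytes, base64-encoded, as the RFC requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",  # the two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def cycle_connection(host: str = "localhost", port: int = 9002,
                     attempts: int = 20, timeout: float = 1.0) -> int:
    """Open a connection, send the upgrade request, and close at once.

    Returns the number of completed open/close cycles. The attempt
    count and timeout are illustrative defaults (assumptions).
    """
    completed = 0
    for _ in range(attempts):
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_handshake(host, port))
            # Leave the block without reading the response or sending a
            # close frame, so the socket is dropped right after opening.
        completed += 1
    return completed
```

With `gz launch -v 4 websocket.gzlaunch` running, calling `cycle_connection()` from a Python shell and watching the server output would show whether the quick open/close pattern is enough to crash it in a given setup.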
Steps to reproduce
Stack trace