Kurento / bugtracker

[ARCHIVED] Contents migrated to monorepo: https://github.com/Kurento/kurento

libnice crashes in socket code: g_socket_send_message (socket=0x0) #247

Closed j1elo closed 5 years ago

j1elo commented 6 years ago

Mail thread: https://groups.google.com/d/topic/kurento/_rf1ANq5Cm8/discussion
Related: https://github.com/Kurento/bugtracker/issues/208

Analysis

It seems that the crash is happening in the third-party library libnice, during socket access:

#5  0x00007f87c1272044 in g_socket_send_message (socket=0x0, address=address@entry=0x0, vectors=0x7f875c3f9840, num_vectors=2, messages=messages@entry=0x0, num_messages=num_messages@entry=0, flags=0, cancellable=0x0, error=0x7f86dc6951b0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./gio/gsocket.c:4255
#6  0x00007f87bedb8cbf in socket_send_message (sock=sock@entry=0x7f87940596e0, message=message@entry=0x7f86dc6952a0, reliable=reliable@entry=0) at tcp-bsd.c:306
#7  0x00007f87bedb8f3b in socket_send_messages (sock=0x7f87940596e0, to=<optimized out>, messages=<optimized out>, n_messages=1) at tcp-bsd.c:360
#8  0x00007f87beda0ae9 in nice_agent_send_messages_nonblocking_internal (agent=0x7f875c104650 [NiceAgent], stream_id=<optimized out>, component_id=<optimized out>, messages=0x7f875c3bfdb0, n_messages=5, allow_partial=0, error=0x0) at agent.c:4748

The crash occurs because g_socket_send_message() is called with a NULL socket (socket=0x0), for reasons that are still unknown, so it fails at line 4255 of gio/gsocket.c.
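
To make this kind of check repeatable on other core dumps, here is a minimal Python sketch (not part of Kurento or libnice) that lists which arguments of a gdb backtrace frame are 0x0. The function name null_pointer_args and the regular expression are illustrative assumptions based on the frame format quoted above. A NULL value is legitimate for optional parameters such as cancellable, so the result is only a hint about where to look.

```python
import re

# Matches one gdb backtrace frame such as:
#   #5  0x00007f87c1272044 in g_socket_send_message (socket=0x0, ...) at file.c:4255
# (hypothetical helper pattern; it only covers frames that end in "at <location>")
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(\w+)\s+\((.*)\)\s+at\s+(\S+)$"
)


def null_pointer_args(frame_line):
    """Return (function, [names of arguments printed as 0x0]) or None."""
    match = FRAME_RE.match(frame_line.strip())
    if match is None:
        return None
    function, raw_args = match.group(2), match.group(3)
    null_args = []
    for arg in raw_args.split(", "):
        name, _, value = arg.partition("=")
        # gdb prints "name=name@entry=value" for entry values; keep the last part.
        value = value.rsplit("=", 1)[-1]
        if value == "0x0":
            null_args.append(name)
    return function, null_args
```

Applied to frame #5 above, this reports socket among the NULL arguments, which is the one that g_socket_send_message() cannot work without.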

It seems that libnice is known for having random crashes in socket code. I could find more information about similar issues here and here; in the second link, they mention that some of the crashes were fixed in 2016 with this commit, so I assume that the fix is included in libnice release 0.1.14, from Apr 3, 2017.

Kurento uses libnice 0.1.13, so it is affected by this bug.

Solution

The proposed solution is to update libnice. However, this is a big update; we already tried it in the past, and several integration tests broke. Currently, updating libnice to 0.1.14 would break some use cases due to the changes it brings, so updating this library requires careful planning and a lot of testing.

Update 2018-08-22

This is a bug in libnice, reported 9 months ago and currently tracked here: libnice issue #33, "segfault in g_socket_send_message".

We still don't have enough information about the cause of the issue, and have been unable to reproduce it in debugging sessions (as it seems to be a crash that only happens with some amount of load on production servers). As it is, the best option is to show interest in solving that bug in the upstream project's bug tracker. They have all the context and the know-how about libnice's code base and will probably be able to solve the issue.

zhangalex commented 6 years ago

Could you please tell me when the bug will be fixed? thank you.

bugwheels94 commented 6 years ago

It is getting very problematic and Kurento is not usable anymore in any production environment. We are losing people. Could you please give me a timeline for this issue? Thanks

micaelgallego commented 6 years ago

Hi Ankit,

As you know, we are a small team and can only spend time on a limited number of issues.

Right now we are focused on some features requested by some clients. When we finish working on that, maybe we can take care of the libnice update.

Regards.

Micael Gallego
ElasTest Project Technical Coordinator - http://elastest.io
Professor at the Escuela Técnica Superior de Ingeniería Informática, URJC


Kukunin commented 6 years ago

Affected by this issue as well. That's sad, since we also can't use Kurento in production because it fails every couple of hours =( Any possible workarounds? (cherry-picking, custom builds, etc.?)

micaelgallego commented 6 years ago

Hi Sergey,

We are going to put some effort into this issue in the coming days. We are going to update libnice to the latest version to see if it solves the problem.

Please stay tuned

Kukunin commented 6 years ago

Please, can you provide a little update on this issue? We can't go live without having this fixed

JerryGwd commented 6 years ago

I faced the same problem, as follows:

Segmentation fault (thread 139909583050496, pid 8231)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2ACBF
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2AF3B
[nice_agent_recv_nonblocking]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x12AE9
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36B2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FB3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533

j1elo commented 6 years ago

Hi,

We've been working on this issue, and found out that the latest development branch of libnice seems to work better. In our tests, both versions 0.1.13 and 0.1.14 crashed, but the development branch for 0.1.15 (or as libnice creators like to put it, "0.1.14.1") is currently working pretty well.

It would help a lot if we could confirm with more people if this is only due to our specific test environment, or if this latest version of libnice does actually solve problems that are being encountered by KMS users.

I've prepared Debian package files from upstream libnice commit 090d3dba, the latest as of this week. If you have a staging server where you could try out this version, it would help us all to know whether this version is a good candidate to be included in the upcoming release of Kurento 6.8.

Installation steps for a clean Ubuntu 16.04 (Xenial) server:

  1. Download the experimental packages.

     For Kurento Pre-Release:

     wget -O test.zip 'https://www.dropbox.com/sh/525depzmhj2vt47/AADcgc4o_QwjcZpaBlQWqgspa?dl=1'

     For Kurento Release:

     No build available.

  2. Install the media server:

     sudo apt-get update
     sudo apt-get install kurento-media-server

  3. Install the debug symbols. These will provide needed information in case of a crash. Run the apt-get steps to install all -dbg symbols, as explained in Media Server crashed.

  4. Install the experimental version of the provided package(s):

     unzip test.zip
     sudo dpkg -i ./*.*deb

  5. If any errors happen, force the installation of any missing dependencies and try again:

     sudo apt-get install -f
     sudo dpkg -i ./*.*deb

  6. Test your use case again and let us know if it fails again with the same issue.

Richard-Aasa commented 6 years ago

(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : STUN-CC RESP to '213.184.55.243:57201', socket=47, len=80, cand=0x7fd8640b4070 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : Found a matching pair 0x7fd864100820 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : nothing to do for pair 0x7fd864100820.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : marking pair 0x7fd864100820 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes!
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable
(kurento-media-server:13770): libnice-DEBUG: component_io_cb: 0x7fd88407ba40: error receiving message
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : Retransmissions failed, giving up on pair 0xff2000
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : pair 0xff2000 state FAILED (candidate_check_pair_fail)
(kurento-media-server:13770): libnice-DEBUG: Detach socket 0x7fd898050b10.
(kurento-media-server:13770): libnice-DEBUG: Detaching source 0xec4550 (socket 0x7fd898050b10, FD 138) from context 0x7fd89409bc80
(kurento-media-server:13770): libnice-DEBUG: Detaching source (nil) (socket 0x7fd898050b10, FD 138) from context (nil)

[1]+  Stopped                 /usr/bin/kurento-media-server
$ Segmentation fault (thread 140566658152192, pid 13770)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[socket_send_message]
/opt/libnice/socket/tcp-bsd.c:309
[socket_send_messages]
/opt/libnice/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/opt/libnice/agent/agent.c:4833
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533

Hope this helps. Seems something to do with partial messages or nulled sockets.
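
For anyone who wants to check their own logs for the same lines that precede the crash in the output above, here is a minimal Python sketch. The marker strings are copied from that libnice debug output; the function name find_markers and the short labels are assumptions made for illustration, not libnice terminology, and finding a marker does not prove that it caused the crash.

```python
# Substrings copied from the libnice debug output quoted in this comment.
# The labels on the right are informal descriptions, not official names.
MARKERS = {
    "STUN error: Incomplete message": "partial STUN message",
    "state FAILED": "candidate pair failed",
    "Detaching source (nil)": "detach of a nil source",
}


def find_markers(log_lines):
    """Return (line_number, label) for every log line containing a marker."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for needle, label in MARKERS.items():
            if needle in line:
                hits.append((number, label))
    return hits
```

Any iterable of lines works as input, for example an open file object for a saved copy of the media server output.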

j1elo commented 6 years ago

Hi, thanks for testing! I've updated the corresponding bug report in libnice's bugtracker: https://gitlab.freedesktop.org/libnice/libnice/issues/33#note_19329

But, we'll need more information about this crash, as that's the first thing they will need to fix this. Do you have any additional information, especially about how to reproduce the crash?

Richard-Aasa commented 6 years ago

Hey!

Appreciate you guys working on this! Sadly we have nothing conclusive and this is a pain to test. We just had multiple different devices/browsers/OS's and streamed the video until the video froze. There's some speculation that it occurs more often on iOS. In our initial testing we did not see this occur at all.

We have one client streaming to the server (pub) and N clients (sub) streaming from the server. I'll try to ask for more environment specifics tomorrow.

If we get any errors on clientside or more info, will update. If you have any ideas on how to narrow down on what could cause this, we can help verify.

Best of luck!


Richard-Aasa commented 6 years ago

We had some miscommunication, and it turns out the previous error was not produced on the patched libnice. Here's the stack trace that actually happened with the provided patch.

Kurento Media Server version: 6.7.1 
Found modules: 
       'core' version 6.7.1 
       'elements' version 6.7.1 
       'filters' version 6.7.1

ii  gir1.2-nice-0.1:amd64                  0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (GObject introspection) 
ii  gstreamer1.5-nice:amd64                0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (GStreamer 1.5 plugin) 
ii  libnice10:amd64                        0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (shared library)
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : STUN-CC RESP to '213.184.55.243:60754', socket=34, len=80, cand=0x7ff904005ef0 (c-id:1), use-cand=1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : Found a matching pair 0x7ff91c021590 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : nothing to do for pair 0x7ff91c021590. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Pruning pending checks. Highest nominated priority is 4341472238214462975 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : marking pair 0x7ff91c021590 (6:remote1) as nominated 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Pruning pending checks. Highest nominated priority is 4341472238214462975 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: inbound STUN packet for 1/1 (stream/component) from [213.184.55.243]:60803 (100 octets) : 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN demux: OK! 
(kurento-media-server:4362): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0 
(kurento-media-server:4362): libnice-stun-DEBUG:   username: 0x665551463a75782f72 
(kurento-media-server:4362): libnice-stun-DEBUG:   ufrag:    0x66555146 
(kurento-media-server:4362): libnice-stun-DEBUG: Found valid username, returning password: 'NAypFUPXKnVufpeu1zq7Wj' 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 
(kurento-media-server:4362): libnice-stun-DEBUG:   key     : 0x4e417970465550584b6e567566706575317a7137576a 
(kurento-media-server:4362): libnice-stun-DEBUG:   expected: 0xe60e79247caf1e6f0ce35f17ce2bf2c1558ea912 
(kurento-media-server:4362): libnice-stun-DEBUG:   received: 0xe60e79247caf1e6f0ce35f17ce2bf2c1558ea912 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN auth: OK! 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)! 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)... 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity: 
(kurento-media-server:4362): libnice-stun-DEBUG:   key     : 0x4e417970465550584b6e567566706575317a7137576a 
(kurento-media-server:4362): libnice-stun-DEBUG:   sent    : 0x3bfebf4eff653d3136bcc700ac33b980fd527fa3 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x29668cba 
(kurento-media-server:4362): libnice-stun-DEBUG:  All done (response size: 80) 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : STUN-CC RESP to '213.184.55.243:60803', socket=60, len=80, cand=0x7ff92c0734e0 (c-id:1), use-cand=1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : Found a matching pair 0x7ff92c046960 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : nothing to do for pair 0x7ff92c046960. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Pruning pending checks. Highest nominated priority is 4341472238197816831 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : conn.check list status: 2 nominated, 2 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : marking pair 0x7ff92c046960 (6:remote1) as nominated 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Pruning pending checks. Highest nominated priority is 4341472238197816831 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : conn.check list status: 2 nominated, 2 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 :STUN transaction retransmitted on pair 0x7ff928009ac0 (timer=6/7 2/3840ms). 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : discovery tick #4101 with list 0x7ff910001d50 (1) 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes! 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : stream 1: timer tick #201: 0 frozen, 4 in-progress, 0 waiting, 1 succeeded, 0 discovered, 1 nominated, 0 waiting-for-nom, 1 valid. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable 
(kurento-media-server:4362): libnice-DEBUG: component_io_cb: 0x7ff93c1b8e20: error receiving message 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : Retransmissions failed, giving up on pair 0x7ff908022b90 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : pair 0x7ff908022b90 state FAILED (candidate_check_pair_fail) 
(kurento-media-server:4362): libnice-DEBUG: Detach socket 0x7ff9280c80a0. 
(kurento-media-server:4362): libnice-DEBUG: Detaching source 0x7ff908010f80 (socket 0x7ff9280c80a0, FD 76) from context 0x7ff9380e4f20 
(kurento-media-server:4362): libnice-DEBUG: Detaching source (nil) (socket 0x7ff9280c80a0, FD 76) from context (nil) 

[1]+  Stopped                 /usr/bin/kurento-media-server 
$ Segmentation fault (thread 140707155990272, pid 4362) 
Stack trace: 
[g_socket_send_message] 
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044 
[nice_output_stream_new] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2B04F 
[nice_output_stream_new] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2B2CB 
[nice_agent_recv_nonblocking] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x12CF9 
[gst_nice_src_get_type] 
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2 
[gst_nice_sink_get_type] 
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3 
[gst_base_sink_do_preroll] 
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2 
[gst_base_sink_do_preroll] 
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533 
[gst_proxy_pad_chain_default] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533 
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533

j1elo commented 6 years ago

Hi, that stack trace doesn't contain any file names or line numbers. Please make sure you follow step 3 of the install instructions; also, in step 4, make sure you install the packages gstreamer1.5-nice-dbgsym and libnice10-dbgsym, which contain the debug symbols of libnice itself.
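
As a quick way to tell both kinds of trace apart before posting one, here is a minimal Python sketch that lists the locations of a crash trace that have no source file and line number. It assumes the two-line layout used in the traces posted in this thread (a "[function]" line followed by a location line); the function name unsymbolized_locations is hypothetical.

```python
import re

# A symbolized location ends in "file.ext:LINE" (e.g. "tcp-bsd.c:309");
# an unsymbolized one ends in a hex offset (e.g. "libnice.so.10:0x2B04F").
SYMBOLIZED_RE = re.compile(r"\.\w+:\d+$")


def unsymbolized_locations(trace_lines):
    """Return the location lines that lack a source file name and line number."""
    missing = []
    for line in trace_lines:
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and "[function_name]" lines
        if not SYMBOLIZED_RE.search(line):
            missing.append(line)
    return missing
```

An empty result suggests that debug symbols were found for every frame; locations inside libnice.so or libgstnice15.so in the result suggest that the -dbgsym packages are missing.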

Kukunin commented 6 years ago

Here is my crash report on the updated libnice:

[1]+ Stopped /usr/bin/kurento-media-server
root@ns3113927:/home#
[g_socket_send_message]
/build/glib2.0-b4FPyK/glib2.0-2.48.2/./gio/gsocket.c:4255
[socket_send_message]
/workspace/socket/tcp-bsd.c:309
[socket_send_messages]
/workspace/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/workspace/agent/agent.c:4833
[gst_nice_sink_render_buffers]
/workspace/gst/gstnicesink.c:301
[gst_nice_sink_render]
/workspace/gst/gstnicesink.c:362
[gst_base_sink_chain_unlocked]
/opt/kurento/libs/gst/base/gstbasesink.c:3544
[gst_base_sink_chain_main]
/opt/kurento/libs/gst/base/gstbasesink.c:3656
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
[gst_proxy_pad_chain_default]
/opt/kurento/gst/gstghostpad.c:127
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
[gst_funnel_sink_chain_object]
/opt/kurento/plugins/elements/gstfunnel.c:454
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556

Kukunin commented 6 years ago

I have the following packages: dpkg -l | grep libnice:

ii libnice10:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 ICE library (shared library)
ii libnice10-dbgsym:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 debug symbols for package libnice10

dpkg -l | grep gstreamer

ii gstreamer1.5-alsa:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugin for ALSA
ii gstreamer1.5-libav:amd64 1.8.2.1.xenial~20170725171352.96.493eee4 amd64 libav plugin for GStreamer
ii gstreamer1.5-libav-dbg:amd64 1.8.2.1.xenial~20170725171352.96.493eee4 amd64 libav plugin for GStreamer (debug symbols)
ii gstreamer1.5-nice:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 ICE library (GStreamer 1.5 plugin)
ii gstreamer1.5-nice-dbgsym:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 debug symbols for package gstreamer1.5-nice
ii gstreamer1.5-plugins-bad:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer plugins from the "bad" set
ii gstreamer1.5-plugins-bad-dbg:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer plugins from the "bad" set (debug symbols)
ii gstreamer1.5-plugins-base:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugins from the "base" set
ii gstreamer1.5-plugins-base-dbg:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugins from the "base" set
ii gstreamer1.5-plugins-good:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plugins from the "good" set
ii gstreamer1.5-plugins-good-dbg:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plugins from the "good" set
ii gstreamer1.5-plugins-ugly:amd64 1.8.1.1.xenial~20170725170621.89.2685b0f amd64 GStreamer plugins from the "ugly" set
ii gstreamer1.5-plugins-ugly-dbg:amd64 1.8.1.1.xenial~20170725170621.89.2685b0f amd64 GStreamer plugins from the "ugly" set (debug symbols)
ii gstreamer1.5-pulseaudio:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plugin for PulseAudio
ii gstreamer1.5-x:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plugins for X11 and Pango
ii libgstreamer-plugins-bad1.5-0:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer development files for libraries from the "bad" set
ii libgstreamer-plugins-base1.5-0:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer libraries from the "base" set
ii libgstreamer1.5-0:amd64 1.8.1.1.xenial~20170725152356.170.0d6031b amd64 Core GStreamer libraries and elements
ii libgstreamer1.5-0-dbg:amd64 1.8.1.1.xenial~20170725152356.170.0d6031b amd64 Core GStreamer libraries and elements

Richard-Aasa commented 6 years ago

Same message. The bug exists on both the dev and latest versions. We still can't reproduce it. We tried different latency, bandwidth, and dropped connections, and those don't seem to affect it either. If your patch seemed to fix it, is this something completely unrelated? @j1elo I know this isn't the place to ask for estimates, but can you give an estimate of whether this problem is solvable for you in the next 30 days? Currently it's completely unusable.

(kurento-media-server:13770): libnice-stun-DEBUG: STUN demux: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0
(kurento-media-server:13770): libnice-stun-DEBUG:   username: 0x56336e5a3a326c5465
(kurento-media-server:13770): libnice-stun-DEBUG:   ufrag:    0x56336e5a
(kurento-media-server:13770): libnice-stun-DEBUG: Found valid username, returning password: 'zusuWV9dq9WWtaE8ZZ9YhB'
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x7a7573755756396471395757746145385a5a39596842
(kurento-media-server:13770): libnice-stun-DEBUG:   expected: 0x49a816ea79cc96ef7dfb6693a20cd221a5ba64a1
(kurento-media-server:13770): libnice-stun-DEBUG:   received: 0x49a816ea79cc96ef7dfb6693a20cd221a5ba64a1
(kurento-media-server:13770): libnice-stun-DEBUG: STUN auth: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)...
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x7a7573755756396471395757746145385a5a39596842
(kurento-media-server:13770): libnice-stun-DEBUG:   sent    : 0x662a3e3f135745a9d2bcd29d0b26277d8593c9e7
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x16afbd47
(kurento-media-server:13770): libnice-stun-DEBUG:  All done (response size: 80)
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : STUN-CC RESP to '213.184.55.243:63450', socket=86, len=80, cand=0x7fd860016e00 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : Found a matching pair 0x7fd8641a4f00 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : nothing to do for pair 0x7fd8641a4f00.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Pruning pending checks. Highest nominated priority is 4341472238197816831
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : conn.check list status: 1 nominated, 1 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : marking pair 0x7fd8641a4f00 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Pruning pending checks. Highest nominated priority is 4341472238197816831
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : conn.check list status: 1 nominated, 1 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: inbound STUN packet for 1/1 (stream/component) from [213.184.55.243]:57201 (100 octets) :
(kurento-media-server:13770): libnice-stun-DEBUG: STUN demux: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0
(kurento-media-server:13770): libnice-stun-DEBUG:   username: 0x525047483a796a4a31
(kurento-media-server:13770): libnice-stun-DEBUG:   ufrag:    0x52504748
(kurento-media-server:13770): libnice-stun-DEBUG: Found valid username, returning password: 'qdSqKiT7GWoxP0s8CxyOJD'
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x716453714b69543747576f78503073384378794f4a44
(kurento-media-server:13770): libnice-stun-DEBUG:   expected: 0x5f03b741338b921b6ec381306e21370dd07f994d
(kurento-media-server:13770): libnice-stun-DEBUG:   received: 0x5f03b741338b921b6ec381306e21370dd07f994d
(kurento-media-server:13770): libnice-stun-DEBUG: STUN auth: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)...
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x716453714b69543747576f78503073384378794f4a44
(kurento-media-server:13770): libnice-stun-DEBUG:   sent    : 0x4e81cf80dab32530fb2f7bec11277c8089e87e1c
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x4346a4bb
(kurento-media-server:13770): libnice-stun-DEBUG:  All done (response size: 80)
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : STUN-CC RESP to '213.184.55.243:57201', socket=47, len=80, cand=0x7fd8640b4070 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : Found a matching pair 0x7fd864100820 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : nothing to do for pair 0x7fd864100820.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : marking pair 0x7fd864100820 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes!
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable
(kurento-media-server:13770): libnice-DEBUG: component_io_cb: 0x7fd88407ba40: error receiving message
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : Retransmissions failed, giving up on pair 0xff2000
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : pair 0xff2000 state FAILED (candidate_check_pair_fail)
(kurento-media-server:13770): libnice-DEBUG: Detach socket 0x7fd898050b10.
(kurento-media-server:13770): libnice-DEBUG: Detaching source 0xec4550 (socket 0x7fd898050b10, FD 138) from context 0x7fd89409bc80
(kurento-media-server:13770): libnice-DEBUG: Detaching source (nil) (socket 0x7fd898050b10, FD 138) from context (nil)

[1]+  Stopped                 /usr/bin/kurento-media-server
$ Segmentation fault (thread 140566658152192, pid 13770)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[socket_send_message]
/opt/libnice/socket/tcp-bsd.c:309
[socket_send_messages]
/opt/libnice/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/opt/libnice/agent/agent.c:4833
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
j1elo commented 6 years ago

I know this ain't the place to ask for estimates, but can You give an estimate if this problem is solvable for You in the next 30 days? Currently it's completely unusable.

Hi @Richard-Aasa, if this were a known bug in Kurento code, I'd be able to tell you what is wrong and how long we expect a fix to take.

But this is a bug in libnice, reported 9 months ago and currently tracked here: libnice issue #33: segfault in g_socket_send_message

We at Kurento still don't have enough information about the cause of the issue, because everybody just writes their stack traces, which are all the same so more reports don't equal more information. If you can reproduce this while debugging with GDB and get some insight about why the crash happens, it could help a lot.

We all know that libnice is crashing in agent.c at line 4833, but we have been unable to reproduce it in debugging sessions (it seems to be a crash that only happens with some amount of load on production servers). As a side effort, we are working on a project that will allow us to simulate serious high-load scenarios. For now, the best option is to show interest in solving that bug at the link above. The libnice maintainers have all the context and know-how about libnice's code base and will probably be able to solve the issue.

I want to have it fixed, so you'll see I've been active in the comments trying to move it forward. No answer so far, but hopefully that will happen soon. I'd encourage you (and everybody interested) to provide debugging information in that bug report, so the maintainers have as much information as possible.

neilyoung commented 6 years ago

@Richard-Aasa How did you make KMS trace the libnice requests? Just by setting the DEBUG level?

j1elo commented 6 years ago

You mean all those libnice-DEBUG and libnice-stun-DEBUG messages?

It's enabled by the WebRtcEndpoint code if the environment variables G_MESSAGES_DEBUG and NICE_DEBUG are set when KMS runs.

By default, it's easy to enable this behavior by editing the Kurento service settings (/etc/default/kurento-media-server) after an apt-get installation.
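
As a rough illustration of the environment-variable approach, the sketch below builds a process environment with both variables set. The values are an assumption: they are the two log domains that show up in the traces in this thread (`libnice` and `libnice-stun`). The helper name `kms_debug_env` is made up for this example.

```python
import os


def kms_debug_env(base_env=None):
    """Return a copy of the environment with libnice debug logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed values: the log domains seen in the traces of this thread.
    env["G_MESSAGES_DEBUG"] = "libnice,libnice-stun"
    env["NICE_DEBUG"] = env["G_MESSAGES_DEBUG"]
    return env


# Show only the two variables this helper adds on top of an empty environment.
for name, value in sorted(kms_debug_env({}).items()):
    print(name + "=" + value)
```

The returned dictionary could then be passed as `env=` to `subprocess.Popen` when starting KMS by hand, assuming the binary lives at /usr/bin/kurento-media-server as in a default apt-get installation.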

neilyoung commented 6 years ago

Yepp, thanks. Maybe those will help me to provide more input for my TURN problem (https://github.com/Kurento/bugtracker/issues/294), even though from my POV it is self explanatory...

neilyoung commented 6 years ago

Is KMS supposed to be compiled as a debug version? I can't make these traces visible, even though I see this in my release KMS:

2018-08-22T18:26:22,245807 11634 0x00007f36f544a700    info kmsiceniceagent           kmsiceniceagent.c:256 kms_ice_nice_agent_new() <KmsIceNiceAgent@0x7f36e002f560>  Enable debug logging in 'libnice' library
Richard-Aasa commented 6 years ago

@neilyoung Delegating some info from IT regarding version 6.7.1:

Just take all export lines from /etc/default/kurento-media-server and run everything from the command line as the kurento user (not as a service). All debug messages are visible in the console. Kurento does not write those messages to log files; maybe that is a bug.

su - kurento --shell=/bin/bash
export GST_DEBUG="3,Kurento:4,kms:4,kmsiceniceagent:5,kmswebrtcsession:5,webrtcendpoint:4"
export NICE_DEBUG="libnice,libnice-stun"
export KURENTO_LOGS_PATH="/var/log/kurento-media-server"
export KURENTO_NUMBER_LOG_FILES=2
/usr/bin/kurento-media-server

neilyoung commented 6 years ago

Yes thanks got it already

Richard-Aasa commented 6 years ago

I'm not very comfortable with C, so here's something we tried that did not work. It seemed very similar to what @yongje.lee proposed on https://gitlab.freedesktop.org/libnice/libnice/issues/33:

libnice tcp-bsd.c :

if (!sock) {
  nice_debug ("!sock detected! Returning -1");
  return -1;
}

if (!sock->fileno) {
  nice_debug ("!sock->fileno detected! Returning -1");
  return -1;
}

Stack:

(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: inbound STUN packet for 1/1 (stream/component) from [195.80.114.254]:53412 (92 octets) : 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN demux: OK! 
(kurento-media-server:8259): libnice-stun-DEBUG: Comparing username/ufrag of len 13 and 4, equal=0 
(kurento-media-server:8259): libnice-stun-DEBUG:   username: 0x6c4337583a6264383030356366 
(kurento-media-server:8259): libnice-stun-DEBUG:   ufrag:    0x6c433758 
(kurento-media-server:8259): libnice-stun-DEBUG: Found valid username, returning password: 'L+c0QxkX1JJEk0jhZO1MoX' 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 
(kurento-media-server:8259): libnice-stun-DEBUG:   key     : 0x4c2b633051786b58314a4a456b306a685a4f314d6f58 
(kurento-media-server:8259): libnice-stun-DEBUG:   expected: 0x2ec8262372427f305714adba5f3fa453e191fe9f 
(kurento-media-server:8259): libnice-stun-DEBUG:   received: 0x2ec8262372427f305714adba5f3fa453e191fe9f 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN auth: OK! 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)! 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)... 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity: 
(kurento-media-server:8259): libnice-stun-DEBUG:   key     : 0x4c2b633051786b58314a4a456b306a685a4f314d6f58 
(kurento-media-server:8259): libnice-stun-DEBUG:   sent    : 0xd60fb40e3729cc5412fd9c0fb868aabdae21d50a 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x2a8e344b 
(kurento-media-server:8259): libnice-stun-DEBUG:  All done (response size: 84) 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : STUN-CC RESP to '195.80.114.254:53412', socket=166, len=84, cand=0x7f390034c000 (c-id:1), use-cand=0. 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : Found a matching pair 0x7f38fc2a2240 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : nothing to do for pair 0x7f38fc2a2240. 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: Finding highest priority for component 1 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: Pruning pending checks. Highest nominated priority is 4341472239187460607 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:8259): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1
garry81 commented 6 years ago

@Richard-Aasa in my case, checking whether the fd is a valid value is not enough, because the socket is invalidated by an RST but libnice's socket fd isn't updated. Could you check the fd with G_IS_SOCKET, as I suggested in that post?

j1elo commented 6 years ago

Hi @Richard-Aasa , I don't think the socket_send_message() will get called with NiceSocket *sock = NULL, but for these tests it's worth adding the check.

Do as @garry81 explains and do this (complete snippet for clarity):

static gssize
socket_send_message (NiceSocket *sock,
    const NiceOutputMessage *message, gboolean reliable)
{
  TcpPriv *priv;
  gssize ret;
  GError *gerr = NULL;
  gsize message_len;

  /* Check sock before dereferencing it */
  if (sock == NULL) {
    nice_debug ("NULL sock detected! Returning -1");
    return -1;
  }

  /* Socket has been closed: */
  if (sock->priv == NULL)
    return -1;

  priv = sock->priv;

  /* Don't try to access the socket if it had an error, otherwise we risk a
   * crash with SIGPIPE (Broken pipe) */
  if (priv->error)
    return -1;

  if (!G_IS_SOCKET (sock->fileno)) {
    nice_debug ("INVALID sock->fileno detected! Returning -1");
    return -1;
  }

  message_len = output_message_get_size (message);

  ...

Also please let us know:

garry81 commented 6 years ago
Richard-Aasa commented 6 years ago

Relaying more info, seems like there's a disagreement as to what is the actual cause:

"did not work" is not actually correct. It did make it much better and kurento-server did not crash, at least not so fast.

At first the log is clean, with no socket errors. But once sock->fileno errors start to appear, there are so many of them that it does not look like a normal situation. In some cases, even after we stopped everything, we still saw sock->fileno errors piling up in the log (it was the only log line, repeating over and over). The problem is not in the socket_send_message function (we are not even sure it is a libnice problem). Catching the invalid socket there helps, but the problem lies somewhere before that: some other code creates this invalid socket, and the question is where and why. It might also cause more problems than just excessive logging.

libnice version is 0.1.14-96-g090d + "fix"

/usr/lib/x86_64-linux-gnu/libnice.so.10 is linked to the correct file. We are building under Ubuntu 16.04 and just symlinking to the newly built libnice. We are not building or installing deb packages.

Regardless of that "fix", we did have a segfault crash yesterday. However, we were unable to capture a stack trace, so we are not sure what the cause was. We suspect it might have happened because power was cut to a WiFi router that served several clients, but we have not confirmed that.

We do not have any testing procedure that always causes crashes, even without that "fix".
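
To put a number on how fast the sock->fileno messages pile up, a small log counter along these lines might help. This is only a sketch: it assumes the console output has been saved to a text file and that lines follow the `(process:pid): domain-DEBUG: message` shape seen in the traces above. The function name `count_messages` is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# (kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1
LOG_LINE = re.compile(
    r"^\((?P<proc>[^:()]+):(?P<pid>\d+)\): "
    r"(?P<domain>[\w-]+)-DEBUG: (?P<msg>.*)$"
)


def count_messages(lines, needle):
    """Count log lines whose message contains `needle`, keyed by (pid, domain)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and needle in match.group("msg"):
            counts[(int(match.group("pid")), match.group("domain"))] += 1
    return counts


# A few lines copied from the trace above, used as sample input.
sample = [
    "(kurento-media-server:8259): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. ",
    "(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 ",
    "(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1",
    "(kurento-media-server:8259): libnice-stun-DEBUG: STUN auth: OK! ",
]
print(count_messages(sample, "!sock->fileno"))
```

Running the same function over an open file object instead of `sample` would give the total per KMS process, which makes it easier to compare runs with and without the check.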

j1elo commented 6 years ago

@Richard-Aasa @garry81 Hi, could you please provide a small update on this? Was the socket check a good change, or does the crash still happen?

jmaiquez commented 5 years ago

Hi Juan,

I left an extended summary of our experiences this week in the assert bug thread.

I think the take-away is the last part:

... it seems to me that the key to reproducing the socket crash is load testing. How much load have you guys been able to put on KMS and run a session for over an hour?

Best regards, Jorge

puneet89 commented 5 years ago

Hi everyone, I am able to reproduce the crash frequently with my own testing script.

Test setup: Kurento Media Server and the Kurento-Tutorial-Group-call-java example run on one server. My test application runs on another server; it is a Python/Selenium-based script, of which multiple instances run at once (say, 16 users). Within a few runs I am able to crash Kurento Media Server.

Python script -------> Kurento-group-call-java ---------> Kurento-Media-Server

The crash logs follow:

libnice:ERROR:agent.c:2342:agent_signal_component_state_change: assertion failed: (TRANSITION (DISCONNECTED, FAILED) || TRANSITION (GATHERING, FAILED) || TRANSITION (CONNECTING, FAILED) || TRANSITION (CONNECTED, FAILED) || TRANSITION (READY, FAILED) || TRANSITION (DISCONNECTED, GATHERING) || TRANSITION (GATHERING, CONNECTING) || TRANSITION (CONNECTING, CONNECTED) || TRANSITION (CONNECTED, READY) || TRANSITION (READY, CONNECTED) || TRANSITION (FAILED, CONNECTING) || TRANSITION (FAILED, GATHERING) || TRANSITION (DISCONNECTED, CONNECTING))
Aborted (thread 140629420529408, pid 14441)
Stack trace:
[__GI_raise]
sysdeps/unix/sysv/linux/raise.c:54
[__GI_abort]
/build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:91
[g_assertion_message]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2429
[g_assertion_message_expr]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gtestutils.c:2453
[agent_signal_component_state_change]
/opt/libnice/agent/agent.c:2353
[priv_map_reply_to_conn_check_request]
/opt/libnice/agent/conncheck.c:3420
[agent_recv_message_unlocked]
/opt/libnice/agent/agent.c:3886
[component_io_cb]
/opt/libnice/agent/agent.c:5181
[socket_source_dispatch]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gio/gsocket.c:3543
[g_main_dispatch]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:3157
[g_main_context_iterate]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:3840
[g_main_loop_run]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmain.c:4033
[gst_nice_src_create]
/workspace/gst/gstnicesrc.c:292
[gst_base_src_get_range]
/opt/gstreamer/libs/gst/base/gstbasesrc.c:2465
[gst_base_src_loop]
/opt/gstreamer/libs/gst/base/gstbasesrc.c:2737
[gst_task_func]

CRASH Detailed Log-

Thread 8056 "queue2062:src" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe766f4700 (LWP 31874)]
g_socket_send_message (socket=0x0, address=address@entry=0x0, vectors=0x7fff540853d0, num_vectors=2, 
    messages=messages@entry=0x0, num_messages=num_messages@entry=0, flags=0, cancellable=0x0, 
    error=0x7ffe766f2620) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gio/gsocket.c:4255
4255    /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gio/gsocket.c: No such file or directory.
(gdb) bt
#0  0x00007ffff31ba044 in g_socket_send_message (socket=0x0, address=address@entry=0x0, vectors=0x7fff540853d0, num_vectors=2, messages=messages@entry=0x0, num_messages=num_messages@entry=0, flags=0, cancellable=0x0, error=0x7ffe766f2620)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gio/gsocket.c:4255
#1  0x00007fffd9c5bcbf in socket_send_message (sock=sock@entry=0x7fffb4115a40, message=message@entry=0x7ffe766f2710, reliable=reliable@entry=0) at tcp-bsd.c:306
#2  0x00007fffd9c5bf3b in socket_send_messages (sock=0x7fffb4115a40, to=<optimized out>, messages=<optimized out>, n_messages=1)
    at tcp-bsd.c:360
#3  0x00007fffd9c43ae9 in nice_agent_send_messages_nonblocking_internal (agent=0x7fffa46fb1b0 [NiceAgent], stream_id=<optimized out>, component_id=<optimized out>, messages=0x7fff5c098eb0, 
    messages@entry=0x86c29a95c2c2b400, n_messages=n_messages@entry=1, allow_partial=allow_partial@entry=0, error=0x0) at agent.c:4748
#4  0x00007fffd9c4434f in nice_agent_send_messages_nonblocking (agent=<optimized out>, stream_id=<optimized out>, component_id=<optimized out>, messages=0x86c29a95c2c2b400, messages@entry=0x7fff5c098eb0, n_messages=n_messages@entry=1, cancellable=cancellable@entry=0x0, error=0x0)
    at agent.c:4833
#5  0x00007fffa39db6a2 in gst_nice_sink_render_buffers (sink=sink@entry=0x7fffa875bbe0 [GstNiceSink], buffers=buffers@entry=0x7ffe766f2848, num_buffers=num_buffers@entry=1, mem_nums=mem_nums@entry=0x7ffe766f2857 "\001", total_mem_num=<optimized out>) at gstnicesink.c:297
#6  0x00007fffa39dbfa3 in gst_nice_sink_render (basesink=<optimized out>, buffer=0x7fffac12d560) at gstnicesink.c:331
#7  0x00007ffff28331b2 in gst_base_sink_chain_unlocked (basesink=basesink@entry=0x7fffa875bbe0 [GstNiceSink], obj=obj@entry=0x7fffac12d560, is_list=is_list@entry=0, pad=<optimized out>) at gstbasesink.c:3532
#8  0x00007ffff2834620 in gst_base_sink_chain_main (basesink=0x7fffa875bbe0 [GstNiceSink], pad=<optimized out>, obj=0x7fffac12d560, is_list=0) at gstbasesink.c:3655
#9  0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fffac
#10 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffc431aa20 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fffac12d560) at gstpad.c:4435
#11 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffc431aa20 [GstGhostPad], buffer=buffer@entry=0x7fffac12d560) at gstpad.c:4554
#12 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fff78298f50 [GstProxyPad], parent=<optimized out>, buffer=0x7fffac12d560)
    at gstghostpad.c:126
#13 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fffac12d560, type=4112, pad=0x7fff78298f50 [GstProxyPad]) at gstpad.c:4183
#14 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa8974b80 [GstPad], type=type@entry=4112, data=data@entry=0x7fffac12d560)
    at gstpad.c:4435
#15 0x00007ffff5ed1533 in gst_pad_push (pad=0x7fffa8974b80 [GstPad], buffer=0x7fffac12d560) at gstpad.c:4554
#16 0x00007fffa379848d in gst_funnel_sink_chain_object (pad=0x7fff1c0edd80 [GstFunnelPad], funnel=0x7fffb8101360 [GstFunnel], is_list=0, obj=0x7fffac12d560) at gstfunnel.c:452
#17 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fffac12d560, type=4112, pad=0x7fff1c0edd80 [GstFunnelPad]) at gstpad.c:4183
#18 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa85eb900 [GstPad], type=type@entry=4112, data=data@entry=0x7fffac12d560)
    at gstpad.c:4435
#19 0x00007ffff5ed1533 in gst_pad_push (pad=0x7fffa85eb900 [GstPad], buffer=buffer@entry=0x7fffac12d560) at gstpad.c:4554
#20 0x00007fffa3dfb531 in gst_srtp_enc_chain (pad=0x7fffa85eb6c0 [GstPad], parent=0x7fffcc267480 [GstSrtpEnc], buf=0x7fff701f18d0, is_rtcp=<optimized out>) at gstsrtpenc.c:1095
#21 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffa85eb6c0 [GstPad]) at gstpad.c:4183
#22 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fff1c0ecfb0 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0) at gstpad.c:4435
#23 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fff1c0ecfb0 [GstProxyPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#24 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fffc431b650 [GstGhostPad], parent=<optimized out>, buffer=0x7fff701f18d0)
    at gstghostpad.c:126
#25 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffc431b650 [GstGhostPad]) at gstpad.c:4183
#26 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffc431a540 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0) at gstpad.c:4435
#27 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffc431a540 [GstGhostPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#28 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fff1c0ecb10 [GstProxyPad], parent=<optimized out>, buffer=0x7fff701f18d0)
    at gstghostpad.c:126
#29 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fff1c0ecb10 [GstProxyPad]) at gstpad.c:4183
#30 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa8975b40 [GstPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0)
    at gstpad.c:4435
#31 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffa8975b40 [GstPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#32 0x00007fffd11eed2a in gst_rtp_session_send_rtp (sess=<optimized out>, src=<optimized out>, data=0x7fff701f18d0, user_data=0x7fffb45b54e0)
    at gstrtpsession.c:1369
#33 0x00007fffd11e02d0 in source_push_rtp (source=0x555555edf500 [RTPSource], data=0x7fff701f18d0, session=0x7fff381fad00 [RTPSession])
    at rtpsession.c:1375
#34 0x00007fffd11ebdae in rtp_source_send_rtp (src=src@entry=0x555555edf500 [RTPSource], pinfo=pinfo@entry=0x7ffe766f3180)
    at rtpsource.c:1314
#35 0x00007fffd11e60ed in rtp_session_send_rtp (sess=0x7fff381fad00 [RTPSession], data=data@entry=0x7fff701f18d0, is_list=is_list@entry=0, current_time=<optimized out>, running_time=running_time@entry=321404460427) at rtpsession.c:2942
#36 0x00007fffd11efbc2 in gst_rtp_session_chain_send_rtp_common (rtpsession=0x7fffb45b54e0 [GstRtpSession], data=0x7fff701f18d0, is_list=0)
    at gstrtpsession.c:2319
#37 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffa8975900 [GstPad]) at gstpad.c:4183
#38 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffc431b3e0 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0) at gstpad.c:4435
#39 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffc431b3e0 [GstGhostPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#40 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fff1c0ec8c0 [GstProxyPad], parent=<optimized out>, buffer=0x7fff701f18d0)
    at gstghostpad.c:126
#41 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fff1c0ec8c0 [GstProxyPad]) at gstpad.c:4183
#42 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa8975480 [GstPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0)
    at gstpad.c:4435
#43 0x00007ffff5ed1533 in gst_pad_push (pad=0x7fffa8975480 [GstPad], buffer=0x7fff701f18d0) at gstpad.c:4554
#44 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffa89756c0 [GstPad]) at gstpad.c:4183
#45 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fff782993f0 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0)
    at gstpad.c:4435
#46 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fff782993f0 [GstProxyPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#47 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fffc431b170 [GstGhostPad], parent=<optimized out>, buffer=0x7fff701f18d0)
    at gstghostpad.c:126
#48 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffc431b170 [GstGhostPad]) at gstpad.c:4183
#49 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fff1c0ec1d0 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0) at gstpad.c:4435
#50 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fff1c0ec1d0 [GstProxyPad], buffer=buffer@entry=0x7fff701f18d0) at gstpad.c:4554
#51 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fffc431ac90 [GstGhostPad], parent=<optimized out>, buffer=0x7fff701f18d0)
    at gstghostpad.c:126
#52 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff701f18d0, type=4112, pad=0x7fffc431ac90 [GstGhostPad]) at gstpad.c:4183
#53 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa8434df0 [GstPad], type=type@entry=4112, data=data@entry=0x7fff701f18d0)
    at gstpad.c:4435
#54 0x00007ffff5ed1533 in gst_pad_push (pad=0x7fffa8434df0 [GstPad], buffer=0x7fff701f18d0) at gstpad.c:4554
#55 0x00007ffff34d90e5 in gst_rtp_base_payload_push (payload=payload@entry=0x7fffb8af2600 [GstRtpOPUSPay], buffer=<optimized out>)
    at gstrtpbasepayload.c:1343
#56 0x00007fffa0d3b81b in gst_rtp_opus_pay_handle_buffer (basepayload=0x7fffb8af2600 [GstRtpOPUSPay], buffer=0x7fff0431dae0)
    at gstrtpopuspay.c:218
#57 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff0431dae0, type=4112, pad=0x7fffa8435030 [GstPad]) at gstpad.c:4183
#58 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffc8387c80 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7fff0431dae0) at gstpad.c:4435
#59 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffc8387c80 [GstProxyPad], buffer=buffer@entry=0x7fff0431dae0) at gstpad.c:4554
#60 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fffb49ea050 [GstGhostPad], parent=<optimized out>, buffer=0x7fff0431dae0)
    at gstghostpad.c:126
#61 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff0431dae0, type=4112, pad=0x7fffb49ea050 [GstGhostPad]) at gstpad.c:4183
#62 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fff98192a20 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fff0431dae0) at gstpad.c:4435
#63 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fff98192a20 [GstGhostPad], buffer=buffer@entry=0x7fff0431dae0) at gstpad.c:4554
#64 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fff901077a0 [GstProxyPad], parent=<optimized out>, buffer=0x7fff0431dae0)
    at gstghostpad.c:126
#65 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff0431dae0, type=4112, pad=0x7fff901077a0 [GstProxyPad]) at gstpad.c:4183
#66 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffb49ea2c0 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fff0431dae0) at gstpad.c:4435
#67 0x00007ffff5ed1533 in gst_pad_push (pad=pad@entry=0x7fffb49ea2c0 [GstGhostPad], buffer=buffer@entry=0x7fff0431dae0) at gstpad.c:4554
#68 0x00007ffff5eba5e3 in gst_proxy_pad_chain_default (pad=0x7fff0c067460 [GstProxyPad], parent=<optimized out>, buffer=0x7fff0431dae0)
    at gstghostpad.c:126
#69 0x00007ffff5ec95cf in gst_pad_push_data (data=0x7fff0431dae0, type=4112, pad=0x7fff0c067460 [GstProxyPad]) at gstpad.c:4183
#70 0x00007ffff5ec95cf in gst_pad_push_data (pad=pad@entry=0x7fffa8943030 [GstPad], type=type@entry=4112, data=data@entry=0x7fff0431dae0)
    at gstpad.c:4435
#71 0x00007ffff5ed1533 in gst_pad_push (pad=0x7fffa8943030 [GstPad], buffer=buffer@entry=0x7fff0431dae0) at gstpad.c:4554
#72 0x00007fffa37a9e1e in gst_queue_loop (queue=0x7fff20265ae0 [GstQueue]) at gstqueue.c:1481
#73 0x00007fffa37a9e1e in gst_queue_loop (pad=<optimized out>) at gstqueue.c:1638
#74 0x00007ffff5efbc5b in gst_task_func (task=0x7fff981e6050 [GstTask]) at gsttask.c:344
#75 0x00007ffff453a5ee in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gthreadpool.c:307
#76 0x00007ffff4539c55 in g_thread_proxy (data=0x7fffa8917280) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gthread.c:780
#77 0x00007ffff70286ba in start_thread (arg=0x7ffe766f4700) at pthread_create.c:333
#78 0x00007ffff4cfd41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Best Regards Puneet

j1elo commented 5 years ago

Got news regarding the SOCKET issue (not the ASSERT issue, for that please refer to the corresponding bug report: https://github.com/Kurento/bugtracker/issues/268)

I was able to produce the libnice crash with socket=0x0 in a development system, with just two clients streaming, both sending and receiving audio and video from each other.

Crash was caused just after pulling the cord from one of the clients, which was connected via Ethernet cable.

I think this has to do with libnice deleting a GSource object (after detecting Connection Closed when the cord was pulled from one of the clients) while that same GSource is still in use, I guess from another thread.

This also contradicts the previous assumption that the crash only happens when the system is under heavy load. It just happens more in those situations because there are more connections (and possibly disconnections) in such conditions, but it's definitely possible (albeit a bit random) to reproduce the crash in a simpler dev environment. This alone is good news, so far.

I haven't been able to reproduce the issue a second time, though. Being a threading issue, it's just a matter of luck having it happening or not. But at least it gives us a new baseline from which to continue troubleshooting.
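
The suspected pattern (one thread tearing down the socket while another still sends through it) can be pictured with a toy model. This is not libnice code and does not claim to be the fix; `ToySocket` and `hammer` are hypothetical names, and the point is only that when close and send take the same lock, the sender sees either a live handle or a cleanly detected closed one, never a half-torn-down state.

```python
import threading


class ToySocket:
    """Stand-in for a socket wrapper shared between threads (not libnice code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handle = []  # stands in for the underlying OS-level socket

    def close(self):
        # Tear-down path, e.g. triggered when a disconnection is detected.
        with self._lock:
            self._handle = None

    def send(self, payload):
        # Send path, e.g. called from a streaming thread.
        with self._lock:
            if self._handle is None:
                return -1  # closed: report an error instead of dereferencing
            self._handle.append(payload)
            return len(payload)


def hammer(sock, rounds):
    """Send from one thread while the calling thread closes the socket."""
    results = []

    def sender():
        for _ in range(rounds):
            results.append(sock.send(b"x"))

    thread = threading.Thread(target=sender)
    thread.start()
    sock.close()
    thread.join()
    return results


outcome = hammer(ToySocket(), 1000)
print(sorted(set(outcome)))
```

Every call returns either 1 (sent before the close) or -1 (rejected after it). Where the switch happens varies from run to run, which mirrors how a timing-dependent crash shows up only sometimes.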

jmaiquez commented 5 years ago

Hi Juan,

That's great news. I will try this approach next week in my testing.

What are the next steps? Figure out how to reproduce it reliably? Is that even possible if this is a thread timing issue?

I guess what I'm asking is: how excited should I be with this discovery? :-)

Also, what about Puneet's message about being able to reproduce it reliably?

Please let me know if there is anything we can do.

Best regards, Jorge

j1elo commented 5 years ago

Well, it's a good discovery because it can guide us towards building a reproduction scenario. It will probably need several tries to break, because it seems that the issue doesn't happen every time... but a reproduction test that loops over the same steps indefinitely should be able to break libnice within not many repetitions.

I think it should be possible to simulate the "pulling ethernet cord / plugging it back" part by setting and unsetting iptables rules dynamically, in such a way that the tcp/udp ports get closed/open constantly. Or maybe with a Docker image, it's possible to allow/disallow ports... I think that could be a good start.
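
A first sketch of the iptables idea could look like the following. It is untested against KMS and rests on several assumptions: it runs as root on the KMS host, the client is identified by its IP address, and dropping inbound packets from that address is close enough to a pulled cable (a fuller version would probably also block the OUTPUT chain). The names `drop_rule` and `flap_connection` are made up for this example.

```python
import subprocess
import time


def drop_rule(action, client_ip):
    """Build an iptables command that inserts ('-I') or deletes ('-D') a DROP rule."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action, "INPUT", "-s", client_ip, "-j", "DROP"]


def flap_connection(client_ip, cycles, down_secs, up_secs,
                    run=subprocess.check_call, sleep=time.sleep):
    """Repeatedly cut and restore inbound traffic from one client."""
    for _ in range(cycles):
        run(drop_rule("-I", client_ip))  # "pull the cord"
        try:
            sleep(down_secs)
        finally:
            run(drop_rule("-D", client_ip))  # "plug it back", even on Ctrl-C
        sleep(up_secs)


# Dry run: print the command instead of executing it.
print(" ".join(drop_rule("-I", "192.0.2.10")))
```

Something like `flap_connection("192.0.2.10", cycles=100, down_secs=20, up_secs=20)` would then loop over the disconnect/reconnect steps while two clients stream through KMS. The address and timings here are placeholders, not values taken from any test.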

After all, the true first step would be to catch this crash during a GDB debugging session, to be able to inspect the state when this occurs, and maybe be able to deduce a possible solution.

Of course, the ideal course of action is for the libnice devs / maintainers to take an active interest in this issue and help with the debugging, so providing a reproduction tool would help them a lot.

I've seen Puneet's comment but will have to study it more carefully. It seems to conflate the ASSERT and SOCKET issues, as both are included in his comment...

puneet89 commented 5 years ago

Hi Juan and Jorge,

The following is a snippet of the approach with which I can crash Kurento Media Server multiple times. (Please use Kurento-Tutorial-Group-call-java as the application.)

test.py (Python script)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
#from selenium.webdriver.support.ui import WebDriverWait
#from selenium.webdriver.support import expected_conditions as EC
import time
import string
import random

def id_generator(size=6,chars=string.ascii_uppercase+string.digits):
    return ''.join(random.choice(chars) for _ in range (size))

chrome_options = Options()
chrome_options.add_argument("--use-file-for-fake-video-capture=/home/truring12/Johnny_1280x720_60.y4m")
chrome_options.add_argument("--use-fake-device-for-media-stream")
chrome_options.add_argument("--use-fake-ui-for-media-stream")
chrome_options.add_argument("--disable-web-security")
chrome_options.add_argument("--allow-insecure-localhost")
chrome_options.add_argument("--reduce-security-for-testing")
chrome_options.add_argument("--new-tab")
driver = webdriver.Chrome(chrome_options=chrome_options)

usernameStr = id_generator()
roomnameStr = 'a'
driver.get("https://35.200.198.222:8443")
username = driver.find_element_by_id('name')
username.send_keys(usernameStr)
roomname = driver.find_element_by_id('roomName')
roomname.send_keys(roomnameStr)
driver.find_element_by_xpath("//*[@id='join']/form/p[3]/input").submit()
print(driver.page_source.encode('utf-8'))
time.sleep(30)
button = driver.find_element_by_id('button-leave')
button.click()
time.sleep(1)
driver.quit()

Now create a simple shell script (run.ksh) like the one below and run it a few times; Kurento Media Server will crash after a few runs.

run.ksh

python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
sleep 1
python ./test.py &
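The same staggered launch can also be written as a small Python helper, which makes the number of clients and the delay easy to change between runs. This is only a sketch: `launch_staggered` and `wait_all` are hypothetical names, and unlike run.ksh the usage line waits for all clients to finish.

```python
import subprocess
import sys
import time


def launch_staggered(command, count, delay_seconds):
    """Start `count` copies of `command`, `delay_seconds` apart; return the processes."""
    processes = []
    for index in range(count):
        processes.append(subprocess.Popen(command))
        if index < count - 1:
            time.sleep(delay_seconds)  # same pause as the "sleep 1" lines in run.ksh
    return processes


def wait_all(processes):
    """Wait for every process to exit and return the exit codes in launch order."""
    return [process.wait() for process in processes]


# Equivalent of run.ksh (13 clients, 1 second apart):
# wait_all(launch_staggered([sys.executable, "./test.py"], 13, 1))
```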

Let me know in case you need any further information.

Best Regards Puneet

puneet89 commented 5 years ago

And with the above approach you can reproduce both the ASSERT and SOCKET issues frequently.

puneet89 commented 5 years ago

Hi, I can reproduce the SOCKET issue very easily with my testing script; let me know if you need any help reproducing it. I am sharing a detailed GDB log for it. Packages installed on my server:

dpkg -l | grep gstreamer
ii  gir1.2-gstreamer-1.5                   1.8.1.1.xenial~20180709141930.170.0d6031b  amd64        Description: GObject introspection data for the GStreamer library
ii  gstreamer1.5-alsa:amd64                1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer plugin for ALSA
ii  gstreamer1.5-libav:amd64               1.8.2.1.xenial~20180709150437.96.493eee4   amd64        libav plugin for GStreamer
ii  gstreamer1.5-libav-dbg:amd64           1.8.2.1.xenial~20180709150437.96.493eee4   amd64        libav plugin for GStreamer (debug symbols)
ii  gstreamer1.5-nice:amd64                0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (GStreamer 1.5 plugin)
ii  gstreamer1.5-nice-dbgsym:amd64         0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        debug symbols for package gstreamer1.5-nice
ii  gstreamer1.5-plugins-bad:amd64         1.8.1.1.xenial~20180709144322.100.3db37b1  amd64        GStreamer plugins from the "bad" set
ii  gstreamer1.5-plugins-bad-dbg:amd64     1.8.1.1.xenial~20180709144322.100.3db37b1  amd64        GStreamer plugins from the "bad" set (debug symbols)
ii  gstreamer1.5-plugins-base:amd64        1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-base-dbg:amd64    1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-good:amd64        1.8.1.1.xenial~20180709145521.112.9ee4248  amd64        GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-good-dbg:amd64    1.8.1.1.xenial~20180709145521.112.9ee4248  amd64        GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-ugly:amd64        1.8.1.1.xenial~20180709150155.89.2685b0f   amd64        GStreamer plugins from the "ugly" set
ii  gstreamer1.5-plugins-ugly-dbg:amd64    1.8.1.1.xenial~20180709150155.89.2685b0f   amd64        GStreamer plugins from the "ugly" set (debug symbols)
ii  gstreamer1.5-pulseaudio:amd64          1.8.1.1.xenial~20180709145521.112.9ee4248  amd64        GStreamer plugin for PulseAudio
ii  gstreamer1.5-tools                     1.8.1.1.xenial~20180709141930.170.0d6031b  amd64        Tools for use with GStreamer
ii  gstreamer1.5-x:amd64                   1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer plugins for X11 and Pango
ii  libgstreamer-plugins-bad1.5-0:amd64    1.8.1.1.xenial~20180709144322.100.3db37b1  amd64        GStreamer development files for libraries from the "bad" set
ii  libgstreamer-plugins-base1.5-0:amd64   1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer libraries from the "base" set
ii  libgstreamer-plugins-base1.5-dev       1.8.1.1.xenial~20180709143344.55.7b19cfd   amd64        GStreamer development files for libraries from the "base" set
ii  libgstreamer1.5-0:amd64                1.8.1.1.xenial~20180709141930.170.0d6031b  amd64        Core GStreamer libraries and elements
ii  libgstreamer1.5-0-dbg:amd64            1.8.1.1.xenial~20180709141930.170.0d6031b  amd64        Core GStreamer libraries and elements
ii  libgstreamer1.5-dev                    1.8.1.1.xenial~20180709141930.170.0d6031b  amd64        GStreamer core development files
dpkg -l | grep libnice
ii  libnice-dbg:amd64                      0.1.15.xenial~20180709152002.83.28531a4    amd64        ICE library (debugging symbols)
ii  libnice-dev                            0.1.15.xenial~20180709152002.83.28531a4    amd64        ICE library (development files)
ii  libnice10:amd64                        0.1.15.xenial~20180709152002.83.28531a4    amd64        ICE library (shared library)

Please find below the detailed backtrace for the SOCKET issue:

(gdb) bt
#0  0x00007ffff2f45044 in g_socket_send_message (socket=0x0, address=address@entry=0x0, vectors=0x555555c57ba0, num_vectors=2, messages=messages@entry=0x0, num_messages=num_messages@entry=0, flags=0, cancellable=0x0, error=0x7ffec17690f0)
    at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gio/gsocket.c:4255
#1  0x00007fffef9c3cbf in socket_send_message (sock=sock@entry=0x7fffb40e0cb0, message=message@entry=0x7ffec17691e0, reliable=reliable@entry=0) at tcp-bsd.c:306
#2  0x00007fffef9c3f3b in socket_send_messages (sock=0x7fffb40e0cb0, to=<optimized out>, messages=<optimized out>, n_messages=1)
    at tcp-bsd.c:360
#3  0x00007fffef9abae9 in nice_agent_send_messages_nonblocking_internal (agent=0x7fffc81ee2c0 [NiceAgent], stream_id=<optimized out>, component_id=<optimized out>, messages=0x7fffbc7872c0,
    messages@entry=0x6aa2facd41b08500, n_messages=n_messages@entry=1, allow_partial=allow_partial@entry=0, error=0x0) at agent.c:4748
#4  0x00007fffef9ac34f in nice_agent_send_messages_nonblocking (agent=<optimized out>, stream_id=<optimized out>, component_id=<optimized out>, messages=0x6aa2facd41b08500, messages@entry=0x7fffbc7872c0, n_messages=n_messages@entry=1, cancellable=cancellable@entry=0x0, error=0x0)
    at agent.c:4833
#5  0x00007fffc5e556a2 in gst_nice_sink_render_buffers (sink=sink@entry=0x7fffbc8d7420 [GstNiceSink], buffers=buffers@entry=0x7ffec1769318, num_buffers=num_buffers@entry=1, mem_nums=mem_nums@entry=0x7ffec1769327 "\001", total_mem_num=<optimized out>) at gstnicesink.c:297
#6  0x00007fffc5e55fa3 in gst_nice_sink_render (basesink=<optimized out>, buffer=0x7fffa00b4990) at gstnicesink.c:331
#7  0x00007ffff282d1b2 in gst_base_sink_chain_unlocked (basesink=basesink@entry=0x7fffbc8d7420 [GstNiceSink], obj=obj@entry=0x7fffa00b4990, is_list=is_list@entry=0, pad=<optimized out>) at gstbasesink.c:3532
#8  0x00007ffff282e620 in gst_base_sink_chain_main (basesink=0x7fffbc8d7420 [GstNiceSink], pad=<optimized out>, obj=0x7fffa00b4990, is_list=0) at gstbasesink.c:3655
#9  0x00007ffff64b45cf in gst_pad_push_data (data=0x7fffa00b4990, type=4112, pad=0x7fffd0a47470 [GstPad]) at gstpad.c:4183
#10 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fffc889bb20 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fffa00b4990) at gstpad.c:4435
#11 0x00007ffff64bc533 in gst_pad_push (pad=pad@entry=0x7fffc889bb20 [GstGhostPad], buffer=buffer@entry=0x7fffa00b4990) at gstpad.c:4554
#12 0x00007ffff64a55e3 in gst_proxy_pad_chain_default (pad=0x7fffd0212c60 [GstProxyPad], parent=<optimized out>, buffer=0x7fffa00b4990)
    at gstghostpad.c:126
#13 0x00007ffff64b45cf in gst_pad_push_data (data=0x7fffa00b4990, type=4112, pad=0x7fffd0212c60 [GstProxyPad]) at gstpad.c:4183
#14 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fffd0a476b0 [GstPad], type=type@entry=4112, data=data@entry=0x7fffa00b4990)
    at gstpad.c:4435
#15 0x00007ffff64bc533 in gst_pad_push (pad=0x7fffd0a476b0 [GstPad], buffer=0x7fffa00b4990) at gstpad.c:4554
#16 0x00007fffc5c1248d in gst_funnel_sink_chain_object (pad=0x7fffd4400e40 [GstFunnelPad], funnel=0x7fff4c01d7e0 [GstFunnel], is_list=0, obj=0x7fffa00b4990) at gstfunnel.c:452
#17 0x00007ffff64b45cf in gst_pad_push_data (data=0x7fffa00b4990, type=4112, pad=0x7fffd4400e40 [GstFunnelPad]) at gstpad.c:4183
#18 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fffd0a4cdf0 [GstPad], type=type@entry=4112, data=data@entry=0x7fffa00b4990)
    at gstpad.c:4435
#19 0x00007ffff64bc533 in gst_pad_push (pad=0x7fffd0a4cdf0 [GstPad], buffer=buffer@entry=0x7fffa00b4990) at gstpad.c:4554
#20 0x00007fffc6275531 in gst_srtp_enc_chain (pad=0x7fffb82de950 [GstPad], parent=0x7fff700d6d40 [GstSrtpEnc], buf=0x7fff78084480, is_rtcp=<optimized out>) at gstsrtpenc.c:1095
#21 0x00007ffff64b45cf in gst_pad_push_data (data=0x7fff78084480, type=4112, pad=0x7fffb82de950 [GstPad]) at gstpad.c:4183
#22 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fffd02137f0 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7fff78084480) at gstpad.c:4435
#23 0x00007ffff64bc533 in gst_pad_push (pad=pad@entry=0x7fffd02137f0 [GstProxyPad], buffer=buffer@entry=0x7fff78084480) at gstpad.c:4554
#24 0x00007ffff64a55e3 in gst_proxy_pad_chain_default (pad=0x7fff7c2f1d90 [GstGhostPad], parent=<optimized out>, buffer=0x7fff78084480)
    at gstghostpad.c:126
#25 0x00007ffff64b45cf in gst_pad_push_data (data=0x7fff78084480, type=4112, pad=0x7fff7c2f1d90 [GstGhostPad]) at gstpad.c:4183
#26 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fff7c2f0c80 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7fff78084480) at gstpad.c:4435
#27 0x00007ffff64bc533 in gst_pad_push (pad=pad@entry=0x7fff7c2f0c80 [GstGhostPad], buffer=buffer@entry=0x7fff78084480) at gstpad.c:4554
#28 0x00007ffff64a55e3 in gst_proxy_pad_chain_default (pad=0x7fffd44012f0 [GstProxyPad], parent=<optimized out>, buffer=0x7fff78084480)
    at gstghostpad.c:126
#29 0x00007ffff64b45cf in gst_pad_push_data (data=0x7fff78084480, type=4112, pad=0x7fffd44012f0 [GstProxyPad]) at gstpad.c:4183
#30 0x00007ffff64b45cf in gst_pad_push_data (pad=pad@entry=0x7fffc8902070 [GstPad], type=type@entry=4112, data=data@entry=0x7fff78084480)
    at gstpad.c:4435
#31 0x00007ffff64bc533 in gst_pad_push (pad=pad@entry=0x7fffc8902070 [GstPad], buffer=buffer@entry=0x7fff78084480) at gstpad.c:4554
#32 0x00007fffc75e6505 in gst_rtp_session_send_rtcp (sess=<optimized out>, src=<optimized out>, buffer=0x7fff78084480, eos=0, user_data=0x7fff7c02d750) at gstrtpsession.c:1458
#33 0x00007fffc75dc604 in rtp_session_on_timeout (sess=sess@entry=0x7fff240e1200 [RTPSession], current_time=current_time@entry=1913450824379, ntpnstime=<optimized out>, running_time=<optimized out>) at rtpsession.c:4046
#34 0x00007fffc75e5489 in rtcp_thread (rtpsession=0x7fff7c02d750 [GstRtpSession]) at gstrtpsession.c:1169
#35 0x00007ffff4546c55 in g_thread_proxy (data=0x7fffc86fd190) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gthread.c:780
#36 0x00007ffff70286ba in start_thread (arg=0x7ffec176a700) at pthread_create.c:333
#37 0x00007ffff4cfa41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) 

Thanks & Regards Puneet Singh

jmaiquez commented 5 years ago

Hi Juan,

Do you still need help reproducing this issue? Or do you now have enough information- either from Puneet's sample script (thanks Puneet!) or your own reproduction testing?

FWIW, we spent a day last week testing TURN connections and then opening/closing various ports in the firewall to try to simulate a corporate network user, but we did not manage to reproduce any crashes.

This morning, we saw several crashes while trying to configure TURN TCP:443 for a specific customer. The same steps did not crash for another customer, and we did not have the time to retest given that it was close to their business hours.

This week we will likely be doing a lot of firefighting with the customers that we have already migrated to our KMS-based product (from our legacy Adobe Media Server-based product).

I'm not sure how much longer we can do this before these customers start looking for alternatives. I don't say this with malice- it's just our reality right now. So if there is anything we can still do from our end, please let me know.

Best regards, Jorge

jmaiquez commented 5 years ago

Hi Juan,

I put together a risk analysis for our production environment, and it boils down to the following:

  1. Load-independent crashes: These are caused by a specific trigger- likely a user configuration. This is by far the biggest problem we face because we cannot reliably recover from it. Even with detect & correct logic (detect the crash, move the conference to a new KMS VM, bring back the crashed VM), this type of user will simply repeat the offending action (e.g. start broadcast) and crash every KMS VM in our cloud- one by one.

  2. Load-dependent crashes: This crash occurs when KMS serves X streams. Assuming that we can determine this number approximately, as a short term solution, we could prevent these crashes by load balancing accordingly. If this X is a low number, then this solution is not good for the longer term as it significantly increases infrastructure complexity and cost.

  3. Intermittent crashes: Crashes that don't seem to be related to a higher load limit or a specific user configuration. One session will work just fine with 50 viewers and 2 broadcasters, but the next session with the same characteristics will crash randomly 30 minutes into the session. If the crashes occur infrequently, then we could get away with the detect & correct logic discussed in point 1. We could probably make that work with little noticeable disruption to our customers.

I believe your test falls under scenario 1- perhaps not as reproducible as we have seen with a specific user of ours- but I don't think it's as infrequent as scenario 3. Puneet's tests are surely scenario 1.

Do you have any more progress from your end? If we can get past scenarios 1 and 2, then that buys us time.

Best regards, Jorge

puneet89 commented 5 years ago

Hi Everyone,

We are running a version of Kurento Media Server (6.7.1) which we installed from deb packages. I'd like to ask which libnice version ships with, or is compatible with, this Kurento Media Server package, and where its source code is (git path).
Alternatively, could someone tell me the libnice version for Kurento Media Server 6.7.1/6.7.2? We need to download the source code of a compatible libnice. A quick response would be appreciated.

Regards Puneet Singh

jmaiquez commented 5 years ago

I can't speak with certainty as to what the latest stable libnice version is for KMS 6.7.1, but from an older internal document, I can see that we were running libnice 0.1.13-3 with KMS 6.7.1.

We are now running KMS 6.7.2~19.g181284d + 0.1.13.1.xenial~20170725160546.81.eebfdab

jmaiquez commented 5 years ago

Juan, we just experienced a crash with a customer who was doing a 30 user webinar. The crash happened at the very end, when people were leaving. I have attached the log.

The interesting thing here is that there are state change warnings and then a socket crash at the end.

(kurento-media-server:6932): libnice-WARNING **: (agent.c:2156):agent_signal_component_state_change: runtime check failed
... the above repeats 28 times
Segmentation fault (thread 140162029483776, pid 6932)
Stack trace:
[g_socket_send_message]

The state change warning appears 28 times, which could coincide with the number of users that were in the session at that time. Could one of the people leaving have triggered it for the remaining 28?
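To compare this count across crash logs, a small script could tally the state-change warnings that precede the segmentation fault. The sketch below assumes the log lines look like the excerpt above; the regular expressions and the `summarize_log` name are illustrative assumptions, not part of KMS or libnice.

```python
import re

# Patterns modelled on the log excerpt above (assumed line format).
WARNING_PATTERN = re.compile(
    r"libnice-WARNING \*\*: .*agent_signal_component_state_change")
CRASH_PATTERN = re.compile(r"Segmentation fault")


def summarize_log(lines):
    """Count state-change warnings seen before the first segmentation fault."""
    warnings = 0
    for line in lines:
        if CRASH_PATTERN.search(line):
            return {"warnings_before_crash": warnings, "crashed": True}
        if WARNING_PATTERN.search(line):
            warnings += 1
    return {"warnings_before_crash": warnings, "crashed": False}


# Hypothetical usage with the attached log file:
# with open("errors-today.log") as log_file:
#     print(summarize_log(log_file))
```

If the warning count tracks the number of remaining participants across several crashes, that would support the idea that one user leaving triggers the warning for everyone else.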

From these logs, it looks like the 2 bugs are related- as Puneet's tests already gave evidence of.

errors-today.log

jmaiquez commented 5 years ago

Hi Juan,

I'm not sure if your silence over the last 10 days is a good sign or a bad sign.

Could you please let us know if there are any updates from your end?

We are now just in a holding pattern of crisis management with our customers. If there is anything else we can do, please let us know.

Thanks, Jorge

puneet89 commented 5 years ago

Hi Jorge, During our analysis we found that the issue occurs when data is sent on a deleted passive socket.

While back-tracing this, we found that the passive socket got deleted for some reason (component_update_selected_pair in our case), but its entry was not removed from the hash table of its peer socket (the server socket). Later, when we are about to send data in the socket_send_messages function, we try to get hold of the peer socket from the hash table of the active socket, as below:

TcpPassivePriv *priv = sock->priv;

if (to) {
  NiceSocket *peer_socket = g_hash_table_lookup (priv->connections, to);
  if (peer_socket)
    return nice_socket_send_messages (peer_socket, to, messages, n_messages);
}
return -1;

Here the lookup succeeds, but the memory at this peer_socket address contains junk values, because the socket has already been deleted for the reason given above.

This entry was added when the new socket was created, in the nice_tcp_passive_socket_accept function:

if (new_socket) {
  NiceAddress *key = nice_address_dup (&remote_addr);

  nice_socket_set_writable_callback (new_socket, _child_writable_cb, sock);
  g_hash_table_insert (priv->connections, key, new_socket);
}

In fact, the entry is never removed from the hash table, even if the socket gets deleted for any reason. To fix this, I have added a function that removes the entry from the hash table whenever we delete a socket. This function is called from nice_socket_free.

After this fix I have run multiple iterations of my test script and it works fine without any issues. I will submit a formal patch for this to libnice.
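The stale-entry pattern and the fix described above can be modelled in a few lines of Python. This is not libnice code: the class names, the `unregister` flag, and the use of an exception in place of a crash are all assumptions chosen to illustrate the idea of removing a child socket from its parent's connection table when it is freed.

```python
class ChildSocket:
    """Stand-in for an accepted connection; `parent` is the listening socket."""

    def __init__(self, parent, remote_addr):
        self.parent = parent
        self.remote_addr = remote_addr
        self.closed = False

    def send(self, data):
        if self.closed:
            # In C this would be a use-after-free; here it is made explicit.
            raise RuntimeError("send on a freed socket")
        return len(data)

    def free(self, unregister):
        self.closed = True
        if unregister:
            # The fix: drop our entry from the parent's table on free.
            self.parent.connections.pop(self.remote_addr, None)


class PassiveSocket:
    """Stand-in for the listening socket that keeps a table of its connections."""

    def __init__(self):
        self.connections = {}

    def accept(self, remote_addr):
        child = ChildSocket(self, remote_addr)
        self.connections[remote_addr] = child
        return child

    def send(self, to, data):
        peer = self.connections.get(to)
        if peer is not None:
            return peer.send(data)
        return -1  # no connection for this address
```

With `free(unregister=False)` a later `send` still finds the freed child in the table, which is the analogue of the crash; with `free(unregister=True)` the lookup misses and `send` returns -1 instead.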

Thanks & Regards Puneet Singh

micaelgallego commented 5 years ago

Hi all,

We are in the middle of Kurento release 6.8.0. We plan to work on this tomorrow or the day after.

jmaiquez commented 5 years ago

Hi Micael,

I mean no offense- and I suppose you will have your reasons, but I don't understand why 6.8.0 gets priority over a bug that makes kurento unusable in a production environment. It would seem to me that the latter is more critical- not just for us, but for a lot of organizations that have taken kurento into production.

Best regards, Jorge

jmaiquez commented 5 years ago

Hi Puneet,

That sounds really promising! Would you be willing to share your patched version of libnice with us?

Do you need any help from us in any way?

Thanks & all the best, Jorge

micaelgallego commented 5 years ago

@jmaiquez we need to publish a new release because we have some changes needed by a project.

We will take a look at this libnice issue immediately after the release.

jmaiquez commented 5 years ago

@micaelgallego understood. Not my place to question where your focus should be- I was curious. But out of interest, does this project (assuming it's commercial) not suffer from these crashes?

puneet89 commented 5 years ago

Hi Jorge, I have raised a pull request for this with libnice (https://github.com/Kurento/libnice/pull/3). In the meantime you can use the following branch to test your scenarios: https://github.com/puneet89/libnice/tree/patch_gsocket_fix1 (KMS version 6.7.2 deb pkg, libnice patch created from 0.1.14).

Let me know in case of any issue.

Thanks & Regards Puneet Singh