awslabs / amazon-kinesis-video-streams-webrtc-sdk-c

Amazon Kinesis Video Streams Webrtc SDK is for developers to install and customize realtime communication between devices and enable secure streaming of video, audio to Kinesis Video Streams.
https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/group__PublicMemberFunctions.html
Apache License 2.0
1.03k stars 310 forks source link

[QUESTION] core dump in refreshIceConfigurationCallback #870

Closed staowyze closed 3 years ago

staowyze commented 3 years ago

Hi experts,

Got a core dump as follows, v1.30 tag, any comments are appreciated!

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x77b79db8 in ?? ()
(gdb) set solib-search-path /opt/nfs/lib/:/opt/nfs/webrtc/
(gdb) bt
#0  0x77b79db8 in __lws_close_free_wsi (wsi=0xc2e570, reason=LWS_CLOSE_STATUS_NOSTATUS, caller=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/close.c:342
#1  0x77b8e04c in lws_client_connect_2 (wsi=0xc2e570)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/client/client-handshake.c:768
#2  0x77b83cfc in lws_header_table_attach (wsi=0xc2e570, autoservice=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/server/parsers.c:322
#3  0x77b85920 in rops_client_bind_h1 (wsi=0xc2e570, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:965
#4  rops_client_bind_h1 (wsi=0xc2e570, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:927
#5  0x77b8bd80 in lws_client_connect_via_info (i=0x74e34e38)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/connect.c:257
#6  0x77bbada4 in lwsCompleteSync (pCallInfo=0xc51488)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/LwsApiCalls.c:533
#7  0x77bbcf88 in getIceConfigLws (pSignalingClient=0xc354a8, time=16026761321316303)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/LwsApiCalls.c:946
#8  0x77bc5380 in getIceConfig (pSignalingClient=0xc354a8, time=16026761321316303)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Signaling.c:1069
#9  0x77bc7638 in executeGetIceConfigSignalingState (customData=12801192, time=16026761321316303)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/StateMachine.c:557
#10 0x77c43db4 in stepStateMachine () from ./libkvsWebrtcClient.so
#11 0x77bc6164 in stepSignalingStateMachine (pSignalingClient=0xc354a8, status=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/StateMachine.c:97
#12 0x77bc359c in refreshIceConfigurationCallback (timerId=0, scheduledTime=16026761321316063, customData=12801192)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Signaling.c:628
#13 0x77c42034 in timerQueueExecutor () from ./libkvsWebrtcClient.so
warning: GDB can't find the start of the function at 0x77a726de.

    GDB is unable to find the start of the function at 0x77a726de
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x77a726de for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
#14 0x77a726e0 in ?? () from /opt/nfs/lib/libpthread-0.9.33.2.so
MushMal commented 3 years ago

This is a really odd one. I haven't seen any crash stacks like this. Perhaps there is a memory corruption already in place somewhere as the stack points to a fairly innocuous lines of code in libWebSockets.

Any debug logs? What's the easiest way to go with these crashes? Can we update some of the devices to the latest to eliminate a few of the issues that were fixed post v1.3 release?

staowyze commented 3 years ago

I could definitely try if a new release is out

staowyze commented 3 years ago

FYI after porting the #861 fix, now this core dump happens most

MushMal commented 3 years ago

Will look into the ICE refresh shortly. Meanwhile, if you have any further info/data please let us know. I am assuming you are using the latest master.

staowyze commented 3 years ago

not yet, still with v1.30, but i have ported some critical fixes, e.g. the ICE refresh one you mentioned. I just lowered the logging level, hope to get more meaningful log soon.

staowyze commented 3 years ago

here is an similar core dump

#0  0x7740ddb8 in __lws_close_free_wsi (wsi=0x8e5598, reason=LWS_CLOSE_STATUS_NOSTATUS, caller=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/close.c:342
#1  0x7742204c in lws_client_connect_2 (wsi=0x8e5598)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/client/client-handshake.c:768
#2  0x77417cfc in lws_header_table_attach (wsi=0x8e5598, autoservice=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/server/parsers.c:322
#3  0x77419920 in rops_client_bind_h1 (wsi=0x8e5598, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:965
#4  rops_client_bind_h1 (wsi=0x8e5598, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:927
#5  0x7741fd80 in lws_client_connect_via_info (i=0x736c8e30)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/connect.c:257
#6  0x7744ee34 in lwsCompleteSync () from ./libkvsWebrtcSignalingClient.so
#7  0x77451018 in getIceConfigLws () from ./libkvsWebrtcSignalingClient.so
#8  0x77459090 in getIceConfig () from ./libkvsWebrtcSignalingClient.so
#9  0x7745b378 in executeGetIceConfigSignalingState () from ./libkvsWebrtcSignalingClient.so
#10 0x774d7e14 in stepStateMachine () from ./libkvsWebrtcClient.so
#11 0x77459e8c in stepSignalingStateMachine () from ./libkvsWebrtcSignalingClient.so
#12 0x774572d0 in refreshIceConfigurationCallback () from ./libkvsWebrtcSignalingClient.so
#13 0x774d6094 in timerQueueExecutor () from ./libkvsWebrtcClient.so

and the corresponding info log

[10-16 21:54:20][2020-09-16 13:54:20] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:54:22][2020-09-16 13:54:22] [KVS]refreshIceConfigurationCallback(): Refreshing the ICE Server Configuration
[10-16 21:54:22][2020-09-16 13:54:22] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000100, Next state: 0x0000000000000020
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Client append handshake header
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Sending the body {
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Connected with server response: 200
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Received client http
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Received client http read: 972 bytes
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Http client completed
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]lwsHttpCallbackRoutine(): Client http closed
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000020, Next state: 0x0000000000000040
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000040, Next state: 0x0000000000000080
[10-16 21:54:24][2020-09-16 13:54:24] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000080, Next state: 0x0000000000000100
[10-16 21:54:30][2020-09-16 13:54:30] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:54:40][2020-09-16 13:54:40] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:54:50][2020-09-16 13:54:50] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:00][2020-09-16 13:55:00] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:10][2020-09-16 13:55:10] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:20][2020-09-16 13:55:20] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:30][2020-09-16 13:55:30] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:41][2020-09-16 13:55:41] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:55:51][2020-09-16 13:55:51] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:01][2020-09-16 13:56:01] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:11][2020-09-16 13:56:11] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:21][2020-09-16 13:56:21] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:31][2020-09-16 13:56:31] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:42][2020-09-16 13:56:41] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:56:52][2020-09-16 13:56:52] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:02][2020-09-16 13:57:02] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:12][2020-09-16 13:57:12] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:22][2020-09-16 13:57:22] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:32][2020-09-16 13:57:32] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:43][2020-09-16 13:57:43] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:57:53][2020-09-16 13:57:53] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:03][2020-09-16 13:58:03] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:13][2020-09-16 13:58:13] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:23][2020-09-16 13:58:23] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:33][2020-09-16 13:58:33] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:44][2020-09-16 13:58:44] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:54][2020-09-16 13:58:54] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-16 21:58:54][2020-09-16 13:58:54] [KVS]refreshIceConfigurationCallback(): Refreshing the ICE Server Configuration
[10-16 21:58:54][2020-09-16 13:58:54] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000100, Next state: 0x0000000000000020
MushMal commented 3 years ago

The last report is something different. This doesn't seem to be related to the termination sequence. I am really not sure what I am looking at - seems to be some sort of a crash (in absence of the registry dump) and the logs indicate a normal execution when refreshing ICE candidate. Is there a way we can take the entire master instead of cherry-picking commits?

Any way we can get more info on this?

staowyze commented 3 years ago

i have collected a few core dumps pointing to the same back trace, regarding the log, I think it just means we don't have enough logging to reveal the problem.

MushMal commented 3 years ago

Are you using a specific libWebSockets version/commit or you are using whatever is in the WebRtc SDK that we define? I believe we had ramped up a version but not since the last release of WebRtc

staowyze commented 3 years ago

i am using the libwebsockets associated with v1.30.

staowyze commented 3 years ago
    GIT_REPOSITORY    https://github.com/warmcat/libwebsockets.git
    GIT_TAG           v3.2.3
MushMal commented 3 years ago

That's the version that we currently use in libWebSockets-CMakeLists.txt

Perhaps there is something that causes libWebSockets to complain - not sure what indeed. Let me do some thinking around this but it would be great to capture some more info on what's going on.

MushMal commented 3 years ago

On thing to note is that we are attempting to refresh ICE configuration while staying connected.

staowyze commented 3 years ago

That's the version that we currently use in libWebSockets-CMakeLists.txt

Perhaps there is something that causes libWebSockets to complain - not sure what indeed. Let me do some thinking around this but it would be great to capture some more info on what's going on.

  • Try to get debug logs
  • Try to run with debug libWebSockets by enabling -DCMAKE_BUILD_TYPE=DEBUG in the CMake file for libWebSockets.

Thank you, will do and share logs once i have them

staowyze commented 3 years ago

Actually, i was using debug libwebsockets, just switch to debug log level.

MushMal commented 3 years ago

Sounds good. I have a theory that I would like to give a shot. Will work in it soon

staowyze commented 3 years ago

Sounds good. I have a theory that I would like to give a shot. Will work in it soon

great! looking forward to seeing it.

staowyze commented 3 years ago

one more crash sample

#0  lws_plat_set_socket_options (vhost=0x0, fd=11, unix_skt=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/plat/unix/unix-sockets.c:96
#1  0x76f2031c in lws_client_connect_2 (wsi=0x661f50)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/client/client-handshake.c:612
#2  0x76f15cfc in lws_header_table_attach (wsi=0x661f50, autoservice=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/server/parsers.c:322
#3  0x76f17920 in rops_client_bind_h1 (wsi=0x661f50, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:965
#4  rops_client_bind_h1 (wsi=0x661f50, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:927
#5  0x76f1dd80 in lws_client_connect_via_info (i=0x739c6e30)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/connect.c:257
#6  0x76f4ce34 in lwsCompleteSync () from ./libkvsWebrtcSignalingClient.so
#7  0x76f4f018 in getIceConfigLws () from ./libkvsWebrtcSignalingClient.so
#8  0x76f57090 in getIceConfig () from ./libkvsWebrtcSignalingClient.so
#9  0x76f59378 in executeGetIceConfigSignalingState () from ./libkvsWebrtcSignalingClient.so
#10 0x76fd5e14 in stepStateMachine () from ./libkvsWebrtcClient.so
#11 0x76f57e8c in stepSignalingStateMachine () from ./libkvsWebrtcSignalingClient.so
#12 0x76f552d0 in refreshIceConfigurationCallback () from ./libkvsWebrtcSignalingClient.so
#13 0x76fd4094 in timerQueueExecutor () from ./libkvsWebrtcClient.so

debugging log

[10-17 16:56:05][2020-09-17 08:56:05] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-17 16:56:06][2020-09-17 08:56:06] [KVS]lwsWssCallbackRoutine(): WSS callback with reason 9
[10-17 16:56:06][2020-09-17 08:56:06] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:07][2020-09-17 08:56:07] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:08][2020-09-17 08:56:08] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:09][2020-09-17 08:56:09] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:10][2020-09-17 08:56:10] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:11][2020-09-17 08:56:11] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:12][2020-09-17 08:56:12] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:13][2020-09-17 08:56:13] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:14][2020-09-17 08:56:14] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:15][2020-09-17 08:56:15] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:16][2020-09-17 08:56:16] [KVS]lwsWssCallbackRoutine(): WSS callback with reason 10
[10-17 16:56:16][2020-09-17 08:56:16] [KVS]lwsWssCallbackRoutine(): Client is writable
[10-17 16:56:16][2020-09-17 08:56:16] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:17][2020-09-17 08:56:17] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]refreshIceConfigurationCallback(): Refreshing the ICE Server Configuration
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]stepStateMachine(): State Machine - Current state: 0x0000000000000100, Next state: 0x0000000000000020
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]signalingClientStateChanged(): Signaling client state changed to 6 - 'Get ICE Server Configuration'
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]lwsCompleteSync(): Perform secure synchronous call for URL: https://xxxx.kinesisvideo.us-west-2.amazonaws.com/v1/get-ice-server-config
[10-17 16:56:18][2020-09-17 08:56:18] [KVS]lwsHttpCallbackRoutine(): HTTPS callback with reason 85
[10-17 16:56:19][2020-09-17 08:56:19] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:20][2020-09-17 08:56:20] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:21][2020-09-17 08:56:21] [KVS]lwsWssCallbackRoutine(): WSS callback with reason 9
[10-17 16:56:21][2020-09-17 08:56:21] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:22][2020-09-17 08:56:22] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:23][2020-09-17 08:56:23] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:24][2020-09-17 08:56:24] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:25][2020-09-17 08:56:25] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:26][2020-09-17 08:56:26] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:27][2020-09-17 08:56:27] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:28][2020-09-17 08:56:28] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
[10-17 16:56:28][2020-09-17 08:56:28] [KVS]lwsHttpCallbackRoutine(): HTTPS callback with reason 30
[10-17 16:56:29][2020-09-17 08:56:29] [KVS]signalingClientGetCurrentState(): Signaling Client Get Current State
staowyze commented 3 years ago

Sounds good. I have a theory that I would like to give a shot. Will work in it soon

Any clue?

staowyze commented 3 years ago

Post bt full in case it may be helpful

bt full
#0  lws_plat_set_socket_options (vhost=0x0, fd=11, unix_skt=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/plat/unix/unix-sockets.c:96
        optval = 1
#1  0x77a2a31c in lws_client_connect_2 (wsi=0xbaf150)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/client/client-handshake.c:612
        context = <optimized out>
        pt = <optimized out>
        adsin = <optimized out>
        plen = 0
        n = <optimized out>
        m = <optimized out>
        port = 443
        cce = 0x77a381a4 ""
        iface = <optimized out>
        psa = <optimized out>
        meth = 0xb9b4b8 "\340\a\273"
        result = 0xbaffb8
        ads = <optimized out>
        sa46 = {sa4 = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 3027191606}, sin_zero = "\000\000\000\000\000\000\000"}}
#2  0x77a1fcfc in lws_header_table_attach (wsi=0xbaf150, autoservice=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/http/server/parsers.c:322
        context = <optimized out>
        pt = <optimized out>
        pa = {fd = 2007313408, events = 12254160, prev_events = 1}
        n = <optimized out>
        __func__ = "lws_header_table_attach"
#3  0x77a21920 in rops_client_bind_h1 (wsi=0xbaf150, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:965
No locals.
#4  rops_client_bind_h1 (wsi=0xbaf150, i=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/roles/h1/ops-h1.c:927
No locals.
#5  0x77a27d80 in lws_client_connect_via_info (i=0x74cd0e30)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core-net/connect.c:257
        n = <optimized out>
        wsi = 0xbaf150
        safe = 0x0
        p = <optimized out>
        local = <optimized out>
---Type <return> to continue, or q <return> to quit---
        __func__ = "lws_client_connect_via_info"
#6  0x77a56e34 in lwsCompleteSync (pCallInfo=0xbb6a70)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/LwsApiCalls.c:545
        retStatus = 0
        retVal = 0
        pHostStart = 0xbb1200 "r-xxxxx.kinesisvideo.us-west-2.amazonaws.com"
        pHostEnd = 0xbb122f ""
        pVerb = 0x77a77500 "POST"
        connectInfo = {context = 0xb91ce8, address = 0xbb1200 "r-xxxx.kinesisvideo.us-west-2.amazonaws.com", port = 443, ssl_connection = 1, 
          path = 0x74cd0ea0 "/v1/get-ice-server-config", host = 0xbb1200 "r-xxxx.kinesisvideo.us-west-2.amazonaws.com", origin = 0x0, 
          protocol = 0x77a77c80 "https", ietf_version_or_minus_one = 0, userdata = 0x0, client_exts = 0x0, method = 0x77a77500 "POST", parent_wsi = 0x0, 
          uri_replace_from = 0x0, uri_replace_to = 0x0, vhost = 0x0, pwsi = 0x74cd0e98, iface = 0x0, local_protocol_name = 0x0, alpn = 0x0, seq = 0x0, 
          opaque_user_data = 0xbb6a70, _unused = {0x0, 0x0, 0x0, 0x0}}
        clientLws = 0x0
        pContext = 0xb91ce8
        secureConnection = 1
        locked = 1
        path = "/v1/get-ice-server-config", '\000' <repeats 9975 times>
        __FUNCTION__ = "lwsCompleteSync"
#7  0x77a59018 in getIceConfigLws (pSignalingClient=0xb2a768, time=16030500265974471)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/LwsApiCalls.c:958
        __status = 0
        retStatus = 0
        pRequestInfo = 0xbb11e8
        url = "https://r-xxxx.kinesisvideo.us-west-2.amazonaws.com/v1/get-ice-server-config", '\000' <repeats 6712 times>...
        paramsJson = "{\n\t\"ChannelARN\": \"arn:aws:kinesisvideo:us-west-2:xxxx:channel/xxxx/xxxx\",\n\t\"ClientId\": \"ProducerMaster\",\n\t\"Service\": \"TURN\"\n}", '\000' <repeats 470 times>...
        pLwsCallInfo = 0xbb6a70
        pResponseStr = 0x0
        parser = {pos = 0, toknext = 0, toksuper = 0}
        tokens = {{type = JSMN_UNDEFINED, start = 0, end = 0, size = 0} <repeats 27 times>, {type = JSMN_UNDEFINED, start = 1959626544, end = 2006948520, 
            size = 65538}, {type = 1959626680, start = 2006877920, end = 0, size = 2007049808}, {type = 2006074544, start = 1959626576, end = 2006949812, 
            size = 65538}, {type = 1959626680, start = -16843009, end = -2139062144, size = 9}, {type = 18, start = 11, end = 40, size = 26}, {type = 4262560, 
            start = 2007049808, end = 0, size = 0}, {type = 11264008, start = 1603050026, end = 26, size = 40}, {type = 11, start = 18, end = 9, size = 120}, {
            type = JSMN_UNDEFINED, start = 291, end = 0, size = -28800}, {type = 2006843972, start = 842019419, end = 959458608, size = 540553517}, {
            type = 876228913, start = 909261360, end = 1264263261, size = 1935496022}, {type = 1634625385, start = 1735289196, end = 1701407811, size = 1951626350}, {
            type = 1130722401, start = 1735287144, end = 690512997, size = 1767055418}, {type = 1818324583, start = 543649385, end = 1701407843, size = 1931506798}, {
            type = 1702125940, start = 1634231072, end = 1684367214, size = 544175136}, {type = 539828278, start = 1952794407, end = 1162037536, size = 1919243040}, {
            type = 544367990, start = 1718513475, end = 1920296809, size = 1869182049}, {type = 665454, start = 0, end = 0, size = 0}, {type = JSMN_UNDEFINED, 
            start = 0, end = 0, size = 0} <repeats 51 times>, {type = JSMN_UNDEFINED, start = 2008019904, end = 0, size = 1603050026}, {type = 597699133, start = 0, 
            end = 2006074544, size = 2006006764}, {type = JSMN_UNDEFINED, start = 1959627680, end = 2007874568, size = 2008019904}, {type = JSMN_UNDEFINED, 
            start = 1603050026, end = 597700466, size = 0}}
---Type <return> to continue, or q <return> to quit---
        pToken = 0x0
        i = 0
        strLen = 0
        resultLen = 0
        configCount = 0
        tokenCount = 0
        j = 0
        ttl = 2006006764
        jsonInIceServerList = 0
#8  0x77a61090 in getIceConfig (pSignalingClient=0xb2a768, time=16030500265974471)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Signaling.c:1060
        retStatus = 0
        timerId = 4366400
#9  0x77a63378 in executeGetIceConfigSignalingState (customData=11708264, time=16030500265974471)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/StateMachine.c:559
        retStatus = 0
        pSignalingClient = 0xb2a768
#10 0x77adfe14 in stepStateMachine () from ./libkvsWebrtcClient.so
No symbol table info available.
#11 0x77a61e8c in stepSignalingStateMachine (pSignalingClient=0xb2a768, status=0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/StateMachine.c:98
        __status = 2006006764
        retStatus = 0
        i = 4262560
        locked = 1
        currentTime = 16030500265974391
        __FUNCTION__ = "stepSignalingStateMachine"
#12 0x77a5f2d0 in refreshIceConfigurationCallback (timerId=0, scheduledTime=16030500265971597, customData=11708264)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Signaling.c:621
        __status = 2006107328
        retStatus = 0
        pStateMachineState = 0xb30e78
        pSignalingClient = 0xb2a768
        iceRefreshErrMsg = "\354;\221wH\215\255w0\000\000]0\000\000\000\300\215\315t}\266Y9\300\363\257w\247\363\070\000\000\000\000\000\000\000\000\000\220\215\315t\024\252\245w8J\266\000\001\000\000\000\060\000\000]\300\215\315t0\000\000\000\000\000\000\000p\024\251w\000\000\000\000\017\000\000\000h\247\262\000\017\000\000\000\060\000\000\000Failed to reconnect with status code 0x0000000f.", '\000' <repeats 112 times>...
        iceRefreshErrLen = 0
        newTimerId = 2006074544
        __FUNCTION__ = "refreshIceConfigurationCallback"
#13 0x77ade094 in timerQueueExecutor () from ./libkvsWebrtcClient.so
MushMal commented 3 years ago

Thanks, it does shed more light. I believe I know the root cause and it seems that our understanding of the libWebSocket internal handling of the states is a little off. LibWebSockets entirely assumes a single threaded execution. This leaves us with two choices - either disconnect from the connected websocket when we do ICE refresh and reconnect it once that's done or to spin up two separate contexts. The first solution is more straight forward but at the expense of potentially losing a message or having latency when viewers are connecting. The dual context solution is bad because it doesn't scale down to small footprint devices. So we decided to use the same context to drive synchronous in-flight requests.

Unfortunately, if my theory is correct, it will require some certain reworking of the threading model to handle parallel execution when we still have a connection up and trying to do refresh. This will take a little longer and I don't have yet any ETAs. I will start looking at it tomorrow.

MushMal commented 3 years ago

@staowyze it's hard for us to reproduce the issue you noticed on your devices. Apparently, none of the platforms we run or have seen encountered this issue.

I pushed a new PR: https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/pull/878

This should serialize the connect API as well. Could you give this a try on your device/platform and see if this fixes the crash. The crash stack, while in the guts of the LWS, points to this potential issue that I can think of.

staowyze commented 3 years ago

sure, #878 didn't pass all the tests, still good to patch?

staowyze commented 3 years ago

it passed my tests though, and I just deployed it, let's wait for a day to see if the same crash happens or not.

MushMal commented 3 years ago

I will be looking at the tests. There might be something in the thread sanitizer. Please let us know how this looks on your devices. Also, please take a look at the changes and add your comments.

staowyze commented 3 years ago

Got something different with #878

bt full
#0  0x76ed0c8c in free () from /opt/nfs/lib/libuClibc-0.9.33.2.so
No symbol table info available.
#1  0x76f5ff50 in _realloc (ptr=<optimized out>, size=<optimized out>, reason=<optimized out>)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core/alloc.c:120
No locals.
#2  0x76f605c8 in lws_context_destroy3 (context=0x6d4aa0)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/open-source/libwebsockets/build/src/project_libwebsockets/lib/core/context.c:636
        pcontext_finalize = 0x0
        n = <optimized out>
#3  0x76facb50 in freeSignaling (ppSignalingClient=0x7fb91e88)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Signaling.c:206
        retStatus = 0
        pSignalingClient = 0x6ca7e0
        __FUNCTION__ = "freeSignaling"
#4  0x76fa19ac in freeSignalingClient (pSignalingHandle=0x6c4bc8)
    at /home/admin/workspace/amazon-kinesis-video-streams-webrtc-sdk-c-mirror/src/source/Signaling/Client.c:45
        __status = 3732584
        retStatus = 0
        pSignalingClient = 0x6ca7e0
        __FUNCTION__ = "freeSignalingClient"
#5  0x0040747c in ?? ()
MushMal commented 3 years ago

Aha, I know what's going on. Will provide a fix. I didn't change the lock here. Fix in 30 mins

staowyze commented 3 years ago

alright, now most of the problems come from context destroy, i am going to update the patch accordingly.

staowyze commented 3 years ago

crash in freeSignaling is still happening, I realized you have a new commit which I didn't update Resolving race condition with peer connection tear-down. would it cause the crash?

staowyze commented 3 years ago

It would take us a day to verify, due to the crash happens not very often and some core dump uploading latency, is the fix ready for landing? if so, we could do one final verification.

MushMal commented 3 years ago

We are about to land the changes in flight

staowyze commented 3 years ago

i just see a commit in the PR, could you please notify me when the PR is ready? We want to address the bug asap.

MushMal commented 3 years ago

We want to address the issue ASAP too. This is the reason we are expediting these changes. Again, please note that this behavior is only seen on your devices and we were not able to reproduce it as the issue is with a related open source component which is the dependency for the sdk.

At this stage, we are assuming that the lws APIs are properly serialized and that was the root cause of the crash.

MushMal commented 3 years ago

Forgot to mention, as soon as we get a clean run in ci we will push the change and you can then try to upgrade to the latest. Make sure you remove the KVS common library so it can be rebuilt

staowyze commented 3 years ago

Good to know, FYI CI just failed.

MushMal commented 3 years ago

Hmm.. Still building: https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/pull/878/checks?check_run_id=1284214993

I can assure you that we are monitoring this

MushMal commented 3 years ago

@staowyze the change is in the master

staowyze commented 3 years ago

no more crash today, thank you