asterisk / asterisk

The official Asterisk Project repository.
https://www.asterisk.org
Other
1.97k stars 924 forks source link

[bug]: Unable to Cancel Schedule ID ... probably a bug dtls_srtp_stop_timeout_timer, line 2914 #691

Closed joshelson closed 1 month ago

joshelson commented 1 month ago

Severity

Critical

Versions

20.5.0

Components/Modules

res_rtp_asterisk

Operating Environment

Rocky Linux 9

Frequency of Occurrence

Constant

Issue Description

Have recently started seeing a huge uptick in these crashes on Astrisk 20.5. Seems to be related to hangups, at least as far as I can tell, and with WebRTC clients.

This is becoming quite a mess for us with these crashes. Any way to get more helpful logging or debug for this? Log output when the crash happens is below.

core-asterisk-2024-04-04T17-03-45Z-thread1.txt core-asterisk-2024-04-04T17-03-45Z-locks.txt core-asterisk-2024-04-04T17-03-45Z-info.txt core-asterisk-2024-04-04T17-03-45Z-full.txt core-asterisk-2024-04-04T17-03-45Z-brief.txt

Relevant log output

It will crash with a number of messages like this:

[Mar 27 10:42:01] VERBOSE[291316][C-00000093] app_stack.c: PJSIP/2041-00000096 Internal Gosub(agent-channel-hangup-handler,s,1(2041,PJSIP/2041-00000096,)) complete GOSUB_RETVAL=
[Mar 27 10:42:01] WARNING[245277] res_rtp_asterisk.c: Unable to cancel schedule ID 15.  This is probably a bug (res_rtp_asterisk.c: dtls_srtp_stop_timeout_timer, line 2914).
[Mar 27 10:42:01] ERROR[245277] res_rtp_asterisk.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 25 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x4649b8] asterisk astobj2.c:224 __ao2_lock()
# 3: [0x7f38ed0b8c53] res_rtp_asterisk.so res_rtp_asterisk.c:9181 ast_rtp_stop()
# 4: [0x567cf0] asterisk rtp_engine.c:2201 ast_rtp_instance_stop()
# 5: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
# 6: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
# 7: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
# 8: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
# 9: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
#10: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#11: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#12: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#13: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#14: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#15: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#16: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#17: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#18: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#19: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#20: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#21: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#22: [0x5e2297] asterisk utils.c:1607 dummy_start()
#23: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#24: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 26 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x4649b8] asterisk astobj2.c:224 __ao2_lock()
# 3: [0x56222e] asterisk rtp_engine.c:642 ast_rtp_instance_set_requested_target_address()
# 4: [0x7f38ed0b90b8] res_rtp_asterisk.so res_rtp_asterisk.c:9213 ast_rtp_stop()
# 5: [0x567cf0] asterisk rtp_engine.c:2201 ast_rtp_instance_stop()
# 6: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
# 7: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
# 8: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
# 9: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
#10: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
#11: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#12: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#13: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#14: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#15: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#16: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#17: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#18: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#19: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#20: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#21: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#22: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#23: [0x5e2297] asterisk utils.c:1607 dummy_start()
#24: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#25: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 29 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x4649b8] asterisk astobj2.c:224 __ao2_lock()
# 3: [0x562373] asterisk rtp_engine.c:668 ast_rtp_instance_get_local_address()
# 4: [0x7f38ed0b7450] res_rtp_asterisk.so res_rtp_asterisk.c:8902 ast_rtp_remote_address_set()
# 5: [0x562182] asterisk rtp_engine.c:625 rtp_instance_set_incoming_source_address_nolock()
# 6: [0x56225b] asterisk rtp_engine.c:645 ast_rtp_instance_set_requested_target_address()
# 7: [0x7f38ed0b90b8] res_rtp_asterisk.so res_rtp_asterisk.c:9213 ast_rtp_stop()
# 8: [0x567cf0] asterisk rtp_engine.c:2201 ast_rtp_instance_stop()
# 9: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
#10: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
#11: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
#12: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
#13: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
#14: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#15: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#16: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#17: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#18: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#19: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#20: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#21: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#22: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#23: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#24: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#25: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#26: [0x5e2297] asterisk utils.c:1607 dummy_start()
#27: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#28: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 29 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x464bf0] asterisk astobj2.c:290 __ao2_unlock()
# 3: [0x5623ab] asterisk rtp_engine.c:670 ast_rtp_instance_get_local_address()
# 4: [0x7f38ed0b7450] res_rtp_asterisk.so res_rtp_asterisk.c:8902 ast_rtp_remote_address_set()
# 5: [0x562182] asterisk rtp_engine.c:625 rtp_instance_set_incoming_source_address_nolock()
# 6: [0x56225b] asterisk rtp_engine.c:645 ast_rtp_instance_set_requested_target_address()
# 7: [0x7f38ed0b90b8] res_rtp_asterisk.so res_rtp_asterisk.c:9213 ast_rtp_stop()
# 8: [0x567cf0] asterisk rtp_engine.c:2201 ast_rtp_instance_stop()
# 9: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
#10: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
#11: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
#12: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
#13: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
#14: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#15: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#16: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#17: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#18: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#19: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#20: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#21: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#22: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#23: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#24: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#25: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#26: [0x5e2297] asterisk utils.c:1607 dummy_start()
#27: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#28: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 26 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x464bf0] asterisk astobj2.c:290 __ao2_unlock()
# 3: [0x56227c] asterisk rtp_engine.c:647 ast_rtp_instance_set_requested_target_address()
# 4: [0x7f38ed0b90b8] res_rtp_asterisk.so res_rtp_asterisk.c:9213 ast_rtp_stop()
# 5: [0x567cf0] asterisk rtp_engine.c:2201 ast_rtp_instance_stop()
# 6: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
# 7: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
# 8: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
# 9: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
#10: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
#11: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#12: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#13: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#14: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#15: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#16: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#17: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#18: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#19: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#20: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#21: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#22: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#23: [0x5e2297] asterisk utils.c:1607 dummy_start()
#24: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#25: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 24 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x464bf0] asterisk astobj2.c:290 __ao2_unlock()
# 3: [0x567d11] asterisk rtp_engine.c:2203 ast_rtp_instance_stop()
# 4: [0x7f38ec501b8f] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2292 stream_stop()
# 5: [0x7f38ec501bb8] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2300 stream_destroy()
# 6: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
# 7: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
# 8: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
# 9: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#10: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#11: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#12: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#13: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#14: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#15: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#16: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#17: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#18: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#19: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#20: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#21: [0x5e2297] asterisk utils.c:1607 dummy_start()
#22: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#23: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] ERROR[245277] rtp_engine.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3874114a00 (0)
[Mar 27 10:42:01] ERROR[245277] : Got 24 backtrace records
# 0: [0x5e50d3] asterisk utils.c:2831 __ast_assert_failed()
# 1: [0x464952] asterisk astobj2.c:220 log_bad_ao2()
# 2: [0x465298] asterisk astobj2.c:501 __ao2_ref()
# 3: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
# 4: [0x561825] asterisk rtp_engine.c:483 ast_rtp_instance_destroy()
# 5: [0x7f38ec501bc7] res_pjsip_sdp_rtp.so res_pjsip_sdp_rtp.c:2302 stream_destroy()
# 6: [0x7f38ed109f21] res_pjsip_session.so res_pjsip_session.c:462 stream_destroy()
# 7: [0x4672ad] asterisk astobj2_container.c:328 internal_ao2_traverse()
# 8: [0x46755c] asterisk astobj2_container.c:414 __ao2_callback()
# 9: [0x7f38ed109f8b] res_pjsip_session.so res_pjsip_session.c:478 session_media_dtor()
#10: [0x4657f8] asterisk astobj2.c:624 __ao2_ref()
#11: [0x465ae7] asterisk astobj2.c:675 __ao2_cleanup_debug()
#12: [0x7f38ed109484] res_pjsip_session.so res_pjsip_session.c:286 ast_sip_session_media_state_reset()
#13: [0x7f38ed11c005] res_pjsip_session.so res_pjsip_session.c:3477 ast_sip_session_terminate()
#14: [0x7f38ec60cd50] chan_pjsip.so chan_pjsip.c:2548 hangup()
#15: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#16: [0x5d82bd] asterisk threadpool.c:1350 execute_tasks()
#17: [0x5ce867] asterisk taskprocessor.c:1304 ast_taskprocessor_execute()
#18: [0x5d60c0] asterisk threadpool.c:367 threadpool_execute()
#19: [0x5d7b7b] asterisk threadpool.c:1137 worker_active()
#20: [0x5d7912] asterisk threadpool.c:1057 worker_start()
#21: [0x5e2297] asterisk utils.c:1607 dummy_start()
#22: [0x7f38ef08b802] libc.so.6 :0 pthread_condattr_setpshared()
#23: [0x7f38ef02b450] libc.so.6 :0 personality()

[Mar 27 10:42:01] VERBOSE[421247][C-000000dc] res_rtp_asterisk.c: 0x7f387801c870 -- Strict RTP learning after ICE completion
[Mar 27 10:42:01] VERBOSE[130825] res_rtp_asterisk.c: 0x7f387801c870 -- Strict RTP learning after remote address set to: 10.99.1.199:64174
[Mar 27 10:42:01] VERBOSE[421247][C-000000dc] app_bridgewait.c: PJSIP/2041-000000df is entering waiting bridge 2041:d29e7bf8-e21b-4eb5-9265-08bcd9f7eeff
[Mar 27 10:42:01] VERBOSE[421247][C-000000dc] bridge_channel.c: Channel PJSIP/2041-000000df joined 'holding_bridge' base-bridge <d29e7bf8-e21b-4eb5-9265-08bcd9f7eeff>
[Mar 27 10:42:01] VERBOSE[421247][C-000000dc] res_rtp_asterisk.c: 0x7f387801c870 -- Strict RTP switching to RTP target address 10.99.1.199:64174 as source
[Mar 27 10:42:22] Asterisk 20.5.0 built by root @ polaris on a x86_64 running Linux on 2023-12-11 17:04:11 UTC

Asterisk Issue Guidelines

joshelson commented 1 month ago

Realizing now this is probably related to #487 but after upgrading, it's nearly constant for us, even under very light load.

Does the additional BT info here help pinpoint the issue?

jcolp commented 1 month ago

Not overly, but it looks to be the exact same thing as #487 at its core. I'd suggest putting the information on there. That issue has already been accepted, even without the additional information previously required.

jcolp commented 1 month ago

I'm marking as a duplicate accordingly per my previous comment. Throw additional info over on that issue.