KB4MDD closed 2 months ago
A full backtrace is needed.
I believe this may relate to my issue with autopatch in ASL 3.0. Here are some observations:
Issue 1 – Calls dialed out over IAX to my PBX were failing because no bridge technology was available. The solution was to load bridge_simple.so.
Issue 2 – Calls answered on the PBX extension pass no audio.
Issue 3 – Once answered, my simplex node remains in transmit state. No audio is passed from the radio to the pbx extension. Hanging up the pbx extension does not take the radio out of transmit mode. Asterisk must be stopped and then the radio goes back to receive mode.
Shouldn't we unlock the mutex before this call? I'm not sure whether this is the deadlock, but inside the function call we try to lock the mutex that is already locked. https://github.com/AllStarLink/app_rpt/blob/67c6a63081dd06a9be2a061e50f947ce9e1b1bb7/apps/app_rpt.c#L1381
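To make the concern above concrete, here is a minimal sketch of the general pattern using plain POSIX threads. It assumes a non-recursive mutex; whether the lock in app_rpt behaves this way is not established in this thread. The names `struct node`, `helper_needs_lock`, `call_while_locked`, and `call_after_unlock` are hypothetical and are not app_rpt functions. An error-checking mutex is used so that the relock is reported as `EDEADLK` instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the node structure: only the lock matters here. */
struct node {
    pthread_mutex_t lock;
    int callmode;
};

/* Use an error-checking mutex so a relock by the owner returns EDEADLK. */
static void node_init(struct node *n)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&n->lock, &attr);
    assert(rc == 0);
    (void) rc;
    pthread_mutexattr_destroy(&attr);
    n->callmode = 0;
}

/* A helper that takes the lock itself, like the callee in question. */
static int helper_needs_lock(struct node *n)
{
    int rc = pthread_mutex_lock(&n->lock);
    if (rc != 0) {
        return rc; /* EDEADLK when the calling thread already owns the lock */
    }
    n->callmode = 1;
    pthread_mutex_unlock(&n->lock);
    return 0;
}

/* Problem pattern: the helper is called while the lock is still held. */
static int call_while_locked(struct node *n)
{
    int rc;

    pthread_mutex_lock(&n->lock);
    rc = helper_needs_lock(n);
    pthread_mutex_unlock(&n->lock);
    return rc;
}

/* Suggested pattern: release the lock first, then call the helper. */
static int call_after_unlock(struct node *n)
{
    pthread_mutex_lock(&n->lock);
    /* ... work that really needs the lock ... */
    pthread_mutex_unlock(&n->lock);
    return helper_needs_lock(n);
}
```

With a default (non-error-checking, non-recursive) mutex, `call_while_locked()` would typically block forever instead of returning an error, which matches the "never comes back" symptom. With a recursive mutex the relock would succeed, so this alone would not explain a hang.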
@InterLinked1 thought this was a deadlock issue. I have a SIP provider set up on my system; however, I have not had a chance to get a dump to check for the deadlock.
I will try to do that tomorrow and see if I can identify the deadlock.
I think you can force the deadlock simply by bringing up the autopatch to a local extension as described in #358.
Even if the extension only has `n, Hangup()`, the app goes out to lunch.
I believe they are related.
You did find a problem, but there are others. I am recompiling changes now.
More debugging information:
=== Thread ID: 0x7fbaae0cf6c0 LWP:173361 (rpt started at [ 5692] app_rpt.c rpt_master())
=== ---> Waiting for Lock #0 (app_rpt.c): MUTEX 5357 rpt &myrpt->lock 0x7fbaf087a260
=== 478098:15:16.229956, 00:00:00.000000, 00:00:17.398070 (1, 1)
[0x557d251fd7e0] asterisk lock.c:273 __ast_pthread_mutex_lock()
[0x7fbaf085e6c6] app_rpt.so rpt_telemetry.c:2550 rpt_tele_thread()
[0x557d252d4bfc] asterisk utils.c:1607 dummy_start()
[0x7fbaf441d044] libc.so.6 pthread_create.c:442 start_thread()
[0x7fbaf449d61c] libc.so.6 clone3.S:83 clone3()
=== --- ---> Locked Here: app_rpt.c line 1341 (rpt_call)
=== -------------------------------------------------------------------
===
=== Thread ID: 0x7fbaf1c0b6c0 LWP:173373 (rpt_call started at [ 981] app_rpt/rpt_functions.c function_autopatchup())
=== ---> Lock #0 (app_rpt.c): MUTEX 1341 rpt_call &myrpt->lock 0x7fbaf087a260
=== 478098:15:16.228639, 00:00:00.000000, 00:00:17.868313 (1, 1)
[0x557d251fd7e0] asterisk lock.c:273 __ast_pthread_mutex_lock()
[0x7fbaf0808fa7] app_rpt.so app_rpt.c:1342 rpt_call()
[0x557d252d4bfc] asterisk utils.c:1607 dummy_start()
[0x7fbaf441d044] libc.so.6 pthread_create.c:442 start_thread()
[0x7fbaf449d61c] libc.so.6 clone3.S:83 clone3()
=== -------------------------------------------------------------------
===
=== Thread ID: 0x7fbaf1d056c0 LWP:173374 (rpt_tele_thread started at [ 2923] app_rpt/rpt_telemetry.c rpt_telemetry())
=== ---> Waiting for Lock #0 (app_rpt/rpt_telemetry.c): MUTEX 2549 rpt_tele_thread &myrpt->lock 0x7fbaf087a260
=== 478098:15:18.309725, 478098:15:18.309535, 00:00:16.083308 (1, 1)
[0x557d251fd7e0] asterisk lock.c:273 __ast_pthread_mutex_lock()
[0x7fbaf085e6c6] app_rpt.so rpt_telemetry.c:2550 rpt_tele_thread()
[0x557d252d4bfc] asterisk utils.c:1607 dummy_start()
[0x7fbaf441d044] libc.so.6 pthread_create.c:442 start_thread()
[0x7fbaf449d61c] libc.so.6 clone3.S:83 clone3()
=== --- ---> Locked Here: app_rpt.c line 1341 (rpt_call)
[2024-07-16 13:15:14.771] DEBUG[173361]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-07-16 13:15:14.772] DEBUG[173361]: app_rpt.c:1432 collect_function_digits: digits=6 source=0
[2024-07-16 13:15:14.903] DEBUG[173361]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-07-16 13:15:14.903] DEBUG[173361]: app_rpt.c:1432 collect_function_digits: digits=61 source=0
[2024-07-16 13:15:14.903] DEBUG[173361]: app_rpt.c:1484 collect_function_digits: @@@@ action: autopatchup, param = noct = 1,farenddisconnect = 1,dialtime = 20000,context = autopatch
[2024-07-16 13:15:14.904] DEBUG[173361]: app_rpt.c:1490 collect_function_digits: @@@@ table index i = 1
[2024-07-16 13:15:14.904] DEBUG[173361]: app_rpt/rpt_functions.c:905 function_autopatchup: @@@@ Autopatch up
[2024-07-16 13:15:14.904] DEBUG[173361]: app_rpt.c:1502 collect_function_digits: rv=3
[2024-07-16 13:15:14.907] DEBUG[173373]: app_rpt.c:1181 rpt_call: Requested channel DAHDI/pseudo-326500165
[2024-07-16 13:15:14.907] DEBUG[173373]: app_rpt/rpt_call.c:32 rpt_disable_cdr: No CDR present on DAHDI/pseudo-326500165
[2024-07-16 13:15:14.910] DEBUG[173373]: app_rpt.c:1198 rpt_call: Requested channel DAHDI/pseudo-520732042
[2024-07-16 13:15:14.910] DEBUG[173373]: app_rpt/rpt_call.c:32 rpt_disable_cdr: No CDR present on DAHDI/pseudo-520732042
[2024-07-16 13:15:16.195] DEBUG[173361]: app_rpt/rpt_telemetry.c:2613 rpt_telemetry: Tracepoint rpt_telemetry() entered mode=1
[2024-07-16 13:15:16.195] DEBUG[173361]: app_rpt/rpt_telemetry.c:2613 rpt_telemetry: Tracepoint rpt_telemetry() entered mode=46
[2024-07-16 13:15:16.196] DEBUG[173361]: app_rpt/rpt_telemetry.c:2930 rpt_telemetry: Tracepoint rpt_telemetry() exit
[2024-07-16 13:15:16.199] DEBUG[173374]: app_rpt/rpt_telemetry.c:1028 rpt_tele_thread: Requested channel DAHDI/pseudo-1428510647
[2024-07-16 13:15:16.200] DEBUG[173374]: app_rpt/rpt_call.c:32 rpt_disable_cdr: No CDR present on DAHDI/pseudo-1428510647
[2024-07-16 13:15:16.200] DEBUG[173374]: app_rpt/rpt_telemetry.c:1036 rpt_tele_thread: Queued telemetry, active_telem = (nil), mytele = 0x7fbab4082db0
[2024-07-16 13:15:16.200] DEBUG[173374]: app_rpt/rpt_telemetry.c:1056 rpt_tele_thread: Beginning telemetry, active_telem = 0x7fbab4082db0, mytele = 0x7fbab4082db0
[2024-07-16 13:15:16.200] DEBUG[173374]: app_rpt/rpt_channel.c:56 wait_interval: Delay interval = 1000
[2024-07-16 13:15:17.251] DEBUG[173374]: app_rpt/rpt_channel.c:60 wait_interval: Delay complete
Discussing the following code with @InterLinked1:
/* put vox channel monitoring on the channel */
if (dahdi_conf_add(myrpt->voxchannel, res, DAHDI_CONF_MONITOR)) {
    ast_hangup(mychannel);
    return -1;
}
This goes to
static int __join_dahdiconf(struct ast_channel *chan, struct dahdi_confinfo *ci, const char *file, int line, const char *function)
{
    ci->chan = 0;
    /* First put the channel on the conference in proper mode */
    if (ioctl(ast_channel_fd(chan, 0), DAHDI_SETCONF, ci) == -1) {
        ast_log(LOG_WARNING, "%s:%d (%s) Unable to set conference mode on %s\n", file, line, function, ast_channel_name(chan));
        return -1;
    }
    return 0;
}
The ioctl call is hanging.
I have a fresh locks and backtrace. The lock is on thread id 0x7fcaef1ff6c0 - the backtrace does not show that thread.
This is probably my lack of understanding showing here, but why are we mutex locking for so long (amount of code) in this area? https://github.com/AllStarLink/app_rpt/blob/67c6a63081dd06a9be2a061e50f947ce9e1b1bb7/apps/app_rpt.c#L1341-L1409
Usually one only locks things that 2 threads can change while changing "stuff", but I don't see a lot changing in this chunk of code. Do we need to freeze all of the vars for the tests in here? Things like https://github.com/AllStarLink/app_rpt/blob/67c6a63081dd06a9be2a061e50f947ce9e1b1bb7/apps/app_rpt.c#L1344-L1345 are called while the lock is in place.
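One common way to shorten a critical section like the one asked about above is to copy the shared fields under the lock and then do the slow work on the private copy. The sketch below only illustrates that idea; it is not how app_rpt is written, and `struct node_state`, `struct call_snapshot`, `take_snapshot`, `slow_setup`, and `start_call` are hypothetical names.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical shared state, guarded by lock. */
struct node_state {
    pthread_mutex_t lock;
    char exten[32];
    int callmode;
};

/* Private copy of the fields the call setup needs. */
struct call_snapshot {
    char exten[32];
    int callmode;
};

/* Hold the lock only long enough to copy the shared fields. */
static void take_snapshot(struct node_state *s, struct call_snapshot *out)
{
    pthread_mutex_lock(&s->lock);
    memcpy(out->exten, s->exten, sizeof(out->exten));
    out->callmode = s->callmode;
    pthread_mutex_unlock(&s->lock);
}

/* Placeholder for blocking work (channel requests, ioctls, and so on).
 * It reads only the snapshot, so no lock is held while it runs. */
static int slow_setup(const struct call_snapshot *snap)
{
    return snap->exten[0] != '\0' ? 0 : -1;
}

static int start_call(struct node_state *s)
{
    struct call_snapshot snap;

    take_snapshot(s, &snap);
    return slow_setup(&snap);
}
```

The trade-off is that the snapshot can go stale while the slow work runs, so any decision that must be consistent with the live state still has to be rechecked under the lock.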
Looks like we are getting an invalid conference number when joining to the vox channel.
DAHDI/pseudo-2088963636 conference: 32779
Looking inside DAHDI, the conference number maximum is 1024.
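A range check before the join would have flagged the value reported above. This is only a sketch: `MAX_CONF_NO` is a hypothetical constant set to the 1024 limit mentioned in this comment (it is not read from the DAHDI headers), treating 1 through that limit as valid is an assumption, and `conf_number_is_valid` and `checked_join` are illustrative names, not DAHDI or app_rpt functions.

```c
#include <stdio.h>

/* Assumed limit, taken from the comment above; not read from DAHDI headers. */
#define MAX_CONF_NO 1024

/* Assumption: valid conference numbers run from 1 to MAX_CONF_NO inclusive. */
static int conf_number_is_valid(int confno)
{
    return confno > 0 && confno <= MAX_CONF_NO;
}

/* Reject out-of-range conference numbers before attempting the join. */
static int checked_join(int confno)
{
    if (!conf_number_is_valid(confno)) {
        fprintf(stderr, "Refusing to join conference %d (max %d)\n",
                confno, MAX_CONF_NO);
        return -1;
    }
    /* Real code would issue the conference ioctl here. */
    return 0;
}
```

Under these assumptions, `checked_join(32779)` fails immediately with a log message instead of passing the value on to the driver.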
The problem is associated with getting the vox channel to join the conference. When I remove the code for the vox channel, the patch comes up and everything works properly. Disconnect worked also.
I now have this working. I am consulting on the changes.
The fix for this issue has been incorporated into ASL3-Asterisk 20.9.1+asl3-3.0.4-1
I am trying to set up the autopatch function. The call is placed; however, the repeater does not repeat any audio. The repeater stops working. You cannot bring it up after the call starts. A `core restart now` is required to regain access to the repeater.
Debugging information. (I masked the telephone number and user name) (rpt debug set to 7, verbose set to 7, iax debug on)