tsawyer opened 1 year ago
Typically this indicates the channels container is locked somewhere, in some kind of deadlock. Manager requests aren't processed during shutdown, so that bit is to be expected.
A full backtrace is definitely needed.
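To make the suspicion concrete, here is a minimal generic-pthreads sketch of the kind of lock-order inversion that would produce this hang. The names are illustrative only, not the actual Asterisk container code; against the hung process, `thread apply all bt full` in gdb would show both threads parked in `pthread_mutex_lock`:

```c
/* Generic pthreads sketch of a lock-order inversion, NOT actual Asterisk
 * code: thread A holds the "container" lock and wants a "channel" lock,
 * thread B holds the channel lock and wants the container lock. Neither
 * can proceed, and anything else needing the container lock (such as
 * shutdown walking the channel list) hangs behind them. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t container_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER;

static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&container_lock);
    sleep(1);                              /* widen the race window */
    pthread_mutex_lock(&channel_lock);     /* blocks forever: B holds it */
    pthread_mutex_unlock(&channel_lock);
    pthread_mutex_unlock(&container_lock);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&channel_lock);
    sleep(1);
    pthread_mutex_lock(&container_lock);   /* blocks forever: A holds it */
    pthread_mutex_unlock(&container_lock);
    pthread_mutex_unlock(&channel_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);                 /* never returns: deadlocked */
    pthread_join(b, NULL);
    return 0;
}
```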
I believe this is the full backtrace from that time.
This might be a clue as to why `core stop gracefully` hangs. Notice the channels' "Refusing due to active shutdown" messages. Also note that the channels attempt to restart while shutting down.
FYI... I created this console dump by starting with `asterisk -c`, since I was concerned that the ast-asterisk.service might impact debugging the issue. Then I ran `core stop now`:
core stop now
[2023-06-08 12:01:24.877] Beginning asterisk shutdown....
[2023-06-08 12:01:24.978] -- Hungup 'DAHDI/pseudo-161833083'
[2023-06-08 12:01:24.982] -- Hungup 'DAHDI/pseudo-979897630'
[2023-06-08 12:01:24.984] -- Hungup 'DAHDI/pseudo-11751437'
[2023-06-08 12:01:24.993] -- Hungup 'DAHDI/pseudo-635202703'
[2023-06-08 12:01:24.994] -- Hungup 'DAHDI/pseudo-1007224127'
[2023-06-08 12:01:24.999] -- Hungup 'DAHDI/pseudo-1903152809'
[2023-06-08 12:01:25.008] -- Hungup 'DAHDI/pseudo-1968121433'
[2023-06-08 12:01:25.009] -- Hungup 'DAHDI/pseudo-410232704'
[2023-06-08 12:01:25.015] -- Hungup 'DAHDI/pseudo-1555853723'
[2023-06-08 12:01:25.022] -- Hungup 'DAHDI/pseudo-1706859608'
[2023-06-08 12:01:25.023] -- Hungup 'DAHDI/pseudo-1106395156'
[2023-06-08 12:01:25.030] -- Hungup 'DAHDI/pseudo-552964170'
[2023-06-08 12:01:25.038] -- Hungup 'DAHDI/pseudo-1952052834'
[2023-06-08 12:01:25.051] -- Hungup 'DAHDI/pseudo-414503075'
[2023-06-08 12:01:25.056] -- Hungup 'DAHDI/pseudo-305369340'
[2023-06-08 12:01:25.061] -- Hungup 'DAHDI/pseudo-488172553'
[2023-06-08 12:01:25.268] WARNING[1217062]: app_rpt.c:6043 rpt_master: rpt_thread restarted on node 2509
[2023-06-08 12:01:25.268] WARNING[1217062]: app_rpt.c:6043 rpt_master: rpt_thread restarted on node 25334
[2023-06-08 12:01:25.269] WARNING[1217089]: channel.c:751 __ast_channel_alloc_ap: Channel allocation failed: Refusing due to active shutdown
[2023-06-08 12:01:25.270] WARNING[1217089]: chan_simpleusb.c:2356 simpleusb_request: Unable to create new usb channel
[2023-06-08 12:01:25.270] WARNING[1217089]: app_rpt.c:3032 rpt: Sorry unable to obtain Rx channel
[2023-06-08 12:01:25.273] WARNING[1217088]: channel.c:751 __ast_channel_alloc_ap: Channel allocation failed: Refusing due to active shutdown
[2023-06-08 12:01:25.273] ERROR[1217088]: chan_voter.c:2226 voter_request: Cant alloc new asterisk channel
[2023-06-08 12:01:25.273] WARNING[1217088]: app_rpt.c:3032 rpt: Sorry unable to obtain Rx channel
[2023-06-08 12:01:25.276] Asterisk cleanly ending (0).
[2023-06-08 12:01:25.277] Executing last minute cleanups
[2023-06-08 12:01:25.279] == Manager unregistered action DBGet
[2023-06-08 12:01:25.280] == Manager unregistered action DBGetTree
[2023-06-08 12:01:25.280] == Manager unregistered action DBPut
[2023-06-08 12:01:25.280] == Manager unregistered action DBDel
[2023-06-08 12:01:25.281] == Manager unregistered action DBDelTree
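For context on the "Refusing due to active shutdown" lines in the dump above: once shutdown begins, channel allocation is refused outright, so a module whose recovery path immediately requests a new channel (as the rpt_thread restarts appear to) can only fail at that point. A rough sketch of that kind of guard, with illustrative names rather than the actual channel.c source:

```c
/* Illustrative guard, NOT the actual channel.c source. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct channel;                      /* opaque placeholder */

static bool shutting_down;           /* set once "core stop" begins */

static struct channel *channel_alloc(void)
{
    if (shutting_down) {
        fprintf(stderr, "Channel allocation failed: "
                        "Refusing due to active shutdown\n");
        return NULL;                 /* callers must give up, not retry */
    }
    /* ... normal allocation would happen here ... */
    return NULL;
}

int main(void)
{
    shutting_down = true;            /* simulate an in-progress shutdown */
    channel_alloc();                 /* chan_simpleusb/chan_voter hit this path */
    return 0;
}
```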
Looks like a NULL lock that libcurl tries to grab:
#1 __GI___pthread_rwlock_rdlock (rwlock=0x0) at pthread_rwlock_rdlock.c:27
result = <optimized out>
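One hypothetical way a NULL rwlock reaches `pthread_rwlock_rdlock()`, reconstructed from the frame above rather than from the res_rpt_http_registrations source, is via libcurl's share interface: if the lock callback registered with `curl_share_setopt()` uses a lock pointer that was never initialized, any transfer on that share handle faults exactly like frame #1:

```c
/* Hypothetical reconstruction, NOT the res_rpt_http_registrations source:
 * a libcurl share-lock callback dereferencing a rwlock pointer that was
 * never initialized reproduces frame #1 (rwlock=0x0) above. */
#include <curl/curl.h>
#include <pthread.h>

static pthread_rwlock_t *share_rwlock;   /* BUG: never allocated, stays NULL */

static void lock_cb(CURL *handle, curl_lock_data data,
                    curl_lock_access locktype, void *userptr)
{
    (void)handle; (void)data; (void)userptr;
    if (locktype == CURL_LOCK_ACCESS_SHARED) {
        pthread_rwlock_rdlock(share_rwlock);   /* rdlock(NULL) -> crash */
    } else {
        pthread_rwlock_wrlock(share_rwlock);
    }
}

static void unlock_cb(CURL *handle, curl_lock_data data, void *userptr)
{
    (void)handle; (void)data; (void)userptr;
    pthread_rwlock_unlock(share_rwlock);
}

int main(void)
{
    CURLSH *share = curl_share_init();
    curl_share_setopt(share, CURLSHOPT_LOCKFUNC, lock_cb);
    curl_share_setopt(share, CURLSHOPT_UNLOCKFUNC, unlock_cb);
    curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);
    /* Any transfer using this share handle would now crash in lock_cb. */
    curl_share_cleanup(share);
    return 0;
}
```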
Probably an issue with res_rpt_http_registrations though; have you seen this again since?
Revisiting this issue: I haven't reproduced the libcurl problem, but it is still locking up the Asterisk console with various processes still running...
ps aux | grep asterisk
root 158819 18.4 0.4 57396 9980 pts/0 R+ 09:12 0:24 rasterisk r
ps aux | grep asterisk
asterisk 158997 11.6 2.9 2203928 59552 ? Ssl 09:17 0:17 /usr/sbin/asterisk -g -f -p -U asterisk
asterisk 158998 0.0 0.0 2328 892 ? S 09:17 0:00 astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet 158997
root 159103 0.0 0.4 57392 9888 pts/1 S+ 09:18 0:00 rasterisk r
> Revisiting this issue: I haven't reproduced the libcurl problem, but it is still locking up the Asterisk console with various processes still running...
If it's happening again, a fresh backtrace is needed from each recurrence to compare. I actually don't think it's libcurl either; it's probably something else.
How do we get a backtrace if there's not a coredump?
> How do we get a backtrace if there's not a coredump?
/var/lib/asterisk/scripts/ast_coredumper --RUNNING
The system I was testing (@jxmx debs) does not have gdb. Moved to another server to get this coredump.
Looks like chan_voter did not reload?
> The system I was testing (@jxmx debs) does not have gdb. Moved to another server to get this coredump.
> Looks like chan_voter did not reload?
What's the CPU usage look like? Seems like it could be an infinite loop.
> Looks like chan_voter did not reload?
My mistake. Actually that's not from `core stop gracefully`. I did a `module reload chan_voter` when moving the RTCM to this server.
Here's a dump from `core stop gracefully` with `--RUNNING`. This one mentions asterisk_daemon.
> > Looks like chan_voter did not reload?
>
> My mistake. Actually that's not from `core stop gracefully`. I did a `module reload chan_voter` when moving the RTCM to this server.
Still, it might be a separate bug. What was the CPU usage?
I uploaded the debug symbol packages to https://github.com/AllStarLink/asl3-asterisk/releases/tag/v20.5.0%2Basl3-0.0.10.fadead4.deb12-1
Just install them and gdb will work fine.
> Here's a dump from `core stop gracefully` with `--RUNNING`. This one mentions asterisk_daemon.
Looking at thread 1 isn't appropriate here since this isn't from a crash fault.
But looking at the CLI thread, it's waiting for all channels to exit, which makes sense. Some of the RPT modules weren't designed to exit gracefully so there are still some things to clean up here.
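A compressed sketch of that wait, modeled on (not copied from) Asterisk's quit handling; `active_channel_count()` is a hypothetical stand-in stub. The shutdown path polls the active channel count, so a channel that a module never hangs up, or keeps trying to re-create, leaves the loop spinning and the CLI hung:

```c
/* Stand-in sketch of a graceful-shutdown wait; NOT the actual Asterisk
 * quit handler. active_channel_count() is a hypothetical stub. */
#include <unistd.h>

static int active_channel_count(void)
{
    /* Imagine this stays > 0 because an RPT module never released
     * its channel (or keeps trying to re-create one). */
    return 1;
}

static void wait_for_channels(void)
{
    while (active_channel_count() > 0) {
        usleep(100000);   /* re-check ten times a second, forever if stuck */
    }
}

int main(void)
{
    wait_for_channels();  /* with a stuck channel, this never returns */
    return 0;
}
```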
Typing `core stop gracefully` at the `*CLI>` prompt hangs the CLI and continuously prints: