AllStarLink / app_rpt

Refactoring and upgrade of AllStarLink's app_rpt, etc.
9 stars 8 forks source link

asl-3.0.4 app_rpt crash on connection restarting Asterisk #385

Open mkmer opened 3 months ago

mkmer commented 3 months ago

Asterisk 20.9.1+asl3-3.0.4-1.deb12

[2024-08-19 07:26:23.577] WARNING[1659030][C-00000009]: app_rpt/rpt_channel.c:469 send_newkey: Failed to send text IAX2/68.37.156.214:4569-8462 on !NEWKEY1!
    -- Hungup 'IAX2/68.37.156.214:4569-8462'
[2024-08-19 07:26:23.578] WARNING[1658972][C-00000009]: channel.c:2272 ast_channel_destructor: PBX may not have been terminated properly on 'IAX2/68.37.156.214:4569-8462'
[2024-08-19 07:26:23.584] ERROR[1659030][C-00000009]: app_rpt/rpt_channel.c:467 send_newkey: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fbedc001ab0 (0)
[2024-08-19 07:26:23.585] ERROR[1659030][C-00000009]:   Got 11 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55a5e2ea7c4d]
# 1: /usr/sbin/asterisk(__ao2_lock+0xac) [0x55a5e2ea7d5c]
# 2: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x2356c) [0x7fbed322356c]
# 3: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137e2) [0x7fbed32137e2]
# 4: /usr/sbin/asterisk(pbx_exec+0xce) [0x55a5e2f6a73e]
# 5: /usr/sbin/asterisk(+0x124f75) [0x55a5e2f5ef75]
# 6: /usr/sbin/asterisk(+0x126d4c) [0x55a5e2f60d4c]
# 7: /usr/sbin/asterisk(+0x1281eb) [0x55a5e2f621eb]
# 8: /usr/sbin/asterisk(+0x1afd4d) [0x55a5e2fe9d4d]
# 9: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7fbf132f9134]
#10: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fbf133797dc]

[2024-08-19 07:26:23.585] ERROR[1659030][C-00000009]: channel.c:4799 ast_sendtext_data: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fbedc001ab0 (0)
[2024-08-19 07:26:23.586] ERROR[1659030][C-00000009]:   Got 13 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55a5e2ea7c4d]
# 1: /usr/sbin/asterisk(__ao2_lock+0xac) [0x55a5e2ea7d5c]
# 2: /usr/sbin/asterisk(ast_sendtext_data+0x75) [0x55a5e2ee0385]
# 3: /usr/sbin/asterisk(ast_sendtext+0x75) [0x55a5e2ee0965]
# 4: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x23577) [0x7fbed3223577]
# 5: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137e2) [0x7fbed32137e2]
# 6: /usr/sbin/asterisk(pbx_exec+0xce) [0x55a5e2f6a73e]
# 7: /usr/sbin/asterisk(+0x124f75) [0x55a5e2f5ef75]
# 8: /usr/sbin/asterisk(+0x126d4c) [0x55a5e2f60d4c]
# 9: /usr/sbin/asterisk(+0x1281eb) [0x55a5e2f621eb]
#10: /usr/sbin/asterisk(+0x1afd4d) [0x55a5e2fe9d4d]
#11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7fbf132f9134]
#12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fbf133797dc]

[2024-08-19 07:26:23.586] ERROR[1659030][C-00000009]: channel.c:4802 ast_sendtext_data: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fbedc001ab0 (0)
[2024-08-19 07:26:23.586] ERROR[1659030][C-00000009]:   Got 13 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55a5e2ea7c4d]
# 1: /usr/sbin/asterisk(__ao2_unlock+0x8a) [0x55a5e2ea7e1a]
# 2: /usr/sbin/asterisk(ast_sendtext_data+0x5bf) [0x55a5e2ee08cf]
# 3: /usr/sbin/asterisk(ast_sendtext+0x75) [0x55a5e2ee0965]
# 4: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x23577) [0x7fbed3223577]
# 5: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137e2) [0x7fbed32137e2]
# 6: /usr/sbin/asterisk(pbx_exec+0xce) [0x55a5e2f6a73e]
# 7: /usr/sbin/asterisk(+0x124f75) [0x55a5e2f5ef75]
# 8: /usr/sbin/asterisk(+0x126d4c) [0x55a5e2f60d4c]
# 9: /usr/sbin/asterisk(+0x1281eb) [0x55a5e2f621eb]
#10: /usr/sbin/asterisk(+0x1afd4d) [0x55a5e2fe9d4d]
#11: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7fbf132f9134]
#12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fbf133797dc]

[2024-08-19 07:26:23.586] WARNING[1659030][C-00000009]: app_rpt/rpt_channel.c:469 send_newkey: Failed to send text  on !NEWKEY1!
[2024-08-19 07:26:23.586] ERROR[1659030][C-00000009]: app_rpt/rpt_channel.c:471 send_newkey: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fbedc001ab0 (0)
KB4MDD commented 3 months ago

Can you run the asterisk core dumper to get us more information?

sudo /var/lib/asterisk/scripts/ast_coredumper /core

I am assuming that the core file is in the root directory.

It will place several files in the /tmp directory.

mkmer commented 3 months ago

I'm getting "no valid core dump" - I do have a core file in /, but it has a Jun 19 date. There is also a core file in /var/lib/asterisk/ which has an Aug 19 date... same error when I run the command pointed there.

Examining /core
    Does appear to be an asterisk coredump
    Coredump indicates executable '/usr/sbin/asterisk'
    Searching for asterisk module directory
    Can't extract asterisk module directory.
    Either the executable '/usr/sbin/asterisk' has no symbols
    or it's changed since the coredump was generated.
    Either way we can't use it.  If you still have the
    binary that created this coredump, or can recreate
    the binary from the exact same code base and exact same
    options that were used to to create the binary that generated
    this coredump, specify its location with the
    --asterisk-bin option.
No valid coredumps found
KB4MDD commented 3 months ago

After some internal discussions, I need for you to install the following packages.

sudo apt install asl3-asterisk-dbgsym asl3-asterisk-modules-dbgsym

Once these are installed, please run the ast_coredumper again.

mkmer commented 3 months ago

core-asterisk-2024-08-19T11-26-23Z-thread1.txt core-asterisk-2024-08-19T11-26-23Z-locks.txt core-asterisk-2024-08-19T11-26-23Z-info.txt core-asterisk-2024-08-19T11-26-23Z-full.txt core-asterisk-2024-08-19T11-26-23Z-brief.txt

Worked like a charm :) Hopefully it's useful.

Additional info: After I rebooted the machine, it worked fine. We were up for ~30 days when things went funky.

KB4MDD commented 3 months ago

Can you tell me what was happening at the time of the crash?

mkmer commented 3 months ago

A remote node was attempting to "dial" in to 29972. When the connection was "established" the errors posted above were dumped. It appeared that asterisk restarted, and repeater started working again until the node attempted to dial in (again). It was "stuck" in this mode until I rebooted the machine. I did not try restarting the service (although I thought it was restarting automagically). After the reboot, it returned to "normal" operation and the remote node could dial in and link up fine.

KB4MDD commented 3 months ago

Thank you. Can anyone dial into your node? Or does it crash each time.

mkmer commented 3 months ago

After the reboot, dial in is back to "normal". Before rebooting, it crashed every time. We were at ~30 days of run time on the server. It was also working "fine" on day 29.

mpannen1979 commented 2 months ago

I've been gettting these frequently on an up to date pi3.

I am running two nodes on one pi: A hub with duplex=1, 'radio' (usb) in duplex=0 mode. Radio node connected to hub node locally.

Uptime doesn't seem to matter. I'll walk away for 5 minutes and stuff is no longer 'repeating'. Hmm

mpannen1979 commented 2 months ago

core-asterisk-2024-09-09T01-12-28Z-brief.txt core-asterisk-2024-09-09T01-12-28Z-full.txt core-asterisk-2024-09-09T01-12-28Z-info.txt core-asterisk-2024-09-09T01-12-28Z-locks.txt core-asterisk-2024-09-09T01-12-28Z-thread1.txt

mpannen1979 commented 2 months ago

astres.sh allows things to work again

KB4MDD commented 2 months ago

Can you describe "no longer repeating"?
Can you attach your /var/log/asterisk/messages.log file.

mkmer commented 2 months ago

Stop repeating: asterisk restarted during which the repeater stopped repeating (people were talking on it at the time). As my node was set to permanent connection, it kept trying to connect and the repeater side asterisk kept restarting. If took a full machine reboot to "fix" it. My guess would be a memory leak, but it didn't think to check memory before rebooting.

I'm on a 2 week trip, so it will be a bit before I can search for the log. If we still have it I will attach it when I return home.

I suspect it will happen again after 30 days or so, so if we have a list of things you need, I'll try to grab them before rebooting :)

mpannen1979 commented 2 months ago

For me, the connected nodes would no longer key up anything or pass audio.

I have attached a snippet of my messages.log

It looks similar to mkmer's messages.log

KB4MDD commented 2 months ago

I want to be careful here - we have two different people reporting what may be two different problems. I will look and see what I can discover.

mkmer commented 2 months ago

I think my (reported) problem is different. When a connection was made to the repeater, asterisk restarted, any connection caused it to restart. The resolution was to reboot and everything went back to normal. We have never noticed problems with established connections.