AllStarLink / app_rpt

Refactoring and upgrade of AllStarLink's app_rpt, etc.

chan_echolink: Possible database corruption #213

Open tsawyer opened 12 months ago

tsawyer commented 12 months ago

N7LF ASL3 install has been up for months but not used until a week or so ago. This supports the theory that app_rpt only crashes if it's actively being used. Echolink was added a couple of days ago.

The only logged indications of activity near the crash times are status reports and DNS updates.

iTerm2 Session Aug 24, 2023 at 7:33:29 AM.txt

core-asterisk-2023-08-24T08-48-05Z-full.txt

KB4MDD commented 12 months ago

Were they running the latest software release? (With the latest chan_echolink changes?)

This error is caused by chan_echolink in el_db_delete:

#6 0x00007fdbd966e0b5 in el_db_delete (node=0x7fdbd00b0f40) at chan_echolink.c:815

I don't think it was the result of using app_rpt, but rather a problem with directory maintenance in chan_echolink.

tsawyer commented 11 months ago

I confirmed that N7LF is running the latest code.

Another core dump this morning.

core-asterisk-2023-08-28T06-30-17Z-full.txt

KB4MDD commented 11 months ago

Can you search /var/log/asterisk/messages.log for the text "Echolink internal database corruption"?

I am also interested in anything that might have been logged right before the crash. I am thinking you may find "Error in directory download"
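A minimal sketch of that search, assuming GNU or BSD grep and the log path given above. The function name scan_echolink_log and the five lines of leading context are illustrative choices, not part of any ASL tooling.

```shell
# Hypothetical helper: search an Asterisk log for the two messages mentioned above.
scan_echolink_log() {
    log="${1:-/var/log/asterisk/messages.log}"
    # -F: match the strings literally; -n: show line numbers;
    # -B 5: also print the 5 lines logged before each match.
    grep -n -F -B 5 \
        -e "Echolink internal database corruption" \
        -e "Error in directory download" \
        "$log"
}
```

Calling scan_echolink_log with no argument reads the default path. grep exits non-zero when neither string is present, so an empty result with a failing status means no match was logged.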

This is different from the first error reported. This error is related to inflateEnd(&z).

tsawyer commented 11 months ago

I don't see anything of interest, but here's the log for your perusal.

messages.log

KB4MDD commented 11 months ago

After further review, it does not appear that they are using the latest version of the code. The dump shows:

#4 0x00007f537f5ab96b in el_directory (data=0x0) at chan_echolink.c:3114

That is not the correct line number for 'do_el_directory' in the latest code. They are running one commit behind the current code base.
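One way to cross-check this from the source checkout is to count how many upstream commits are missing from HEAD. The sketch below assumes the upstream branch is origin/master (as in this repository) and that `git fetch` has been run first; the helper name commits_behind is hypothetical.

```shell
# Hypothetical helper: count commits reachable from an upstream ref but not from HEAD.
commits_behind() {
    ref="${1:-origin/master}"   # assumed upstream branch name
    # "HEAD..ref" selects commits that are in ref but not in HEAD.
    git rev-list --count "HEAD..$ref"
}
```

A result of 0 means the checkout already contains everything in the upstream ref. This only describes the source tree; the running module matches it only if it was rebuilt and reloaded after the pull.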

Looking at commit https://github.com/InterLinked1/app_rpt/blob/c5dbde754024e504c550396902a672865f21febe/channels/chan_echolink.c, the problem is still associated with inflateEnd(&z), just at a different place in the program.

I will research further.

tsawyer commented 11 months ago

I was sure I pulled when you asked, and there was nothing new. Today I see new code, but nothing about Echolink.

git pull
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 26 (delta 15), reused 22 (delta 13), pack-reused 0
Unpacking objects: 100% (26/26), 72.71 KiB | 1.91 MiB/s, done.
From https://github.com/InterLinked1/app_rpt
   c16d2cf..6bf2e4d  master          -> origin/master
 * [new branch]      File_Descriptor -> origin/File_Descriptor
 * [new branch]      issue217        -> origin/issue217
Updating c16d2cf..6bf2e4d
Fast-forward
 apps/app_rpt.c               | 10 +++++++---
 apps/app_rpt/rpt_functions.c | 17 ++++++++++-------
 apps/app_rpt/rpt_telemetry.c |  6 +++---
 channels/chan_simpleusb.c    |  9 ++++++---
 channels/chan_usbradio.c     |  8 +++++---
 channels/xpmr/xpmr.c         |  8 +++-----
 6 files changed, 34 insertions(+), 24 deletions(-)

I will update again.

tsawyer commented 11 months ago

Looks like we agree the source is current, based on this line and one other:

3103 rc = do_el_directory(instances[0]->elservers[curdir]);

I feel confident the current source is running. We'll see how it goes.
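The line-number comparison can be repeated against any checkout with grep. This is a sketch only: the helper name find_call_lines is hypothetical, and the default path assumes the command is run from the root of the app_rpt source tree.

```shell
# Hypothetical helper: list every source line that contains "name(".
# This includes the function's definition as well as its call sites.
find_call_lines() {
    func="$1"
    file="${2:-channels/chan_echolink.c}"
    # -n prefixes each match with its line number; -F matches "name(" literally.
    grep -n -F "${func}(" "$file"
}
```

For example, `find_call_lines do_el_directory` prints each matching line with its number, which can then be compared with the line reported in a backtrace frame.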