OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org
Other
1.28k stars 581 forks source link

[CRASH] in acc module after adding acc_extra or acc_leg fields #2691

Open ar45 opened 2 years ago

ar45 commented 2 years ago

OpenSIPS version you are running

version: opensips 3.1.2 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 5cdfedebb
main.c compiled on  with gcc 8

Crash Core Dump

Bellow is the relevant traceback

(gdb) #0  0x00007f5530150a53 in acc_log_cdrs (dlg=dlg@entry=0x7f5531f6c9f0, msg=<optimized out>, ctx=ctx@entry=0x7f553222e400) at acc.c:243
        log_msg = <optimized out>
        log_msg_end = 0x7f553019215e <log_msg+65534> ""
        p = <optimized out>
        i = 7
        j = <optimized out>
        ret = 7
        res = -1
        n = <optimized out>
        start_time = {tv_sec = 1637154496, tv_usec = 569536}
        core_s = {s = 0x7f5571c5bd58 "\006", len = 93}
        leg_s = {s = 0x0, len = <optimized out>}
        extra_s = {s = 0x0, len = <optimized out>}
        extra = 0x7f5571a3a460
        __FUNCTION__ = "acc_log_cdrs"
#1  0x00007f553015ab70 in acc_cdr_cb (t=<optimized out>, type=<optimized out>, ps=0x7ffd6f852630) at acc_logic.c:972
        ctx = 0x7f553222e400
        dlg = 0x7f5531f6c9f0
        __FUNCTION__ = "acc_cdr_cb"

Describe the traffic that generated the bug

Crashes here https://github.com/OpenSIPS/opensips/blob/master/modules/acc/acc.c#L245

A call mid dialog, after adding a acc_extra field and restarting, then the dialog acc ctx does not have sufficient values matching the extra->tag_idx count and segfaults

To Reproduce

  1. Make a call with cdr accounting.
  2. While the call is in progress,
  3. add an acc_extra field.
  4. Restart opensips.
  5. End the call,
  6. Opensips crashes.

Relevant System Logs

nikbyte commented 2 years ago

How many acc_extra do you have? Issue like you have can be if there are more than 64 such vars.

nikbyte commented 2 years ago

@bogdan-iancu I expected such behaviour when I had more than #define MAX_ACC_EXTRA 64 variables. It would be good to add check and log message.

ar45 commented 2 years ago

Another issue is that currently if you change the order of acc_extra fields, all mid dialog sessions will mix up the values.

On Wed, Nov 17, 2021, 2:34 PM Nick Altmann @.***> wrote:

@bogdan-iancu https://github.com/bogdan-iancu I expected such behaviour when I had more than #define MAX_ACC_EXTRA 64 variables. It would be good to add check and log message.

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-971974737, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GBUIJCXZIBNWWBOWBLUMQGW5ANCNFSM5IHZFENQ .

nikbyte commented 2 years ago

Please use one issue per ticket.

Another issue is that currently if you change the order of acc_extra This is not issue, but design. Dialogs keep accounting variables ordered. If you change order of them, old dialogs will mix up the values.

Just don't do this.

ar45 commented 2 years ago

Nick, I hear what you are saying, but this means you can never make changes to production live machine unless there are no active calls.

On Wed, Nov 17, 2021, 4:27 PM Nick Altmann @.***> wrote:

Please use one issue per ticket.

Another issue is that currently if you change the order of acc_extra This is not issue, but design. Dialogs keep accounting variables ordered. If you change order of them, old dialogs will mix up the values.

Just don't do this.

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-972158585, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GAXA7IDBYK4LR3GMS3UMQT6TANCNFSM5IHZFENQ .

ar45 commented 2 years ago

I consider this an issue. The design is an implementation technicality.

Also, I'm not discussing this on the GitHub ticket.

It's the mailing list and pointing out a broader issue within the same context :)

On Wed, Nov 17, 2021, 4:31 PM Podrigal, Aron @.***> wrote:

Nick, I hear what you are saying, but this means you can never make changes to production live machine unless there are no active calls.

On Wed, Nov 17, 2021, 4:27 PM Nick Altmann @.***> wrote:

Please use one issue per ticket.

Another issue is that currently if you change the order of acc_extra This is not issue, but design. Dialogs keep accounting variables ordered. If you change order of them, old dialogs will mix up the values.

Just don't do this.

β€” You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-972158585, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GAXA7IDBYK4LR3GMS3UMQT6TANCNFSM5IHZFENQ .

bogdan-iancu commented 2 years ago

@nikbyte , @ar45 , as I see it, the issue is not the 64 limit, but the fact that during the call, OpenSIPS is restarted with a different set of extra / leg acc vars. This will conflict (and crash) as the acc context that was created with the original set and restored after restart will have a different number of keys when being used after restart. In this particular case, it will look for one more extra val which actually does not exists in the acc context.

ar45 commented 2 years ago

@bogdan-iancu can we store in the acc context the keys and order of how it was when initialized, and then check it to match when restarted?

This will help both issues, A) not go over more than the amount of keys which exists in the acc context B) if a key was removed, or keys were reordered, we can reorder them when restoring / reloading the acc context to match after restart.

nikbyte commented 2 years ago

@bogdan-iancu then we have two issues to fix and good opportunity to fix them together. πŸ™‚

bogdan-iancu commented 2 years ago

@ar45 , storing also the keys maybe escalate the memory usage (to store the same data for each call), considering the fact there is no way to change the set of keys during runtime. Maybe adding some extra info on the number of keys stored in the context, to avoid overflowing when adding more keys. Of course, this will not solve the problem when you change the order of the keys.

ar45 commented 2 years ago

How about computer a hash of all the keys on start and store that on the dialogue so we can compare that when restoring?

Also, we only need this info when saving dialog to database upon shutting down opensips so we have it when restarting..

On Fri, Nov 19, 2021, 2:24 AM Bogdan Andrei IANCU @.***> wrote:

@ar45 https://github.com/ar45 , storing also the keys maybe escalate the memory usage (to store the same data for each call), considering the fact there is no way to change the set of keys during runtime. Maybe adding some extra info on the number of keys stored in the context, to avoid overflowing when adding more keys. Of course, this will not solve the problem when you change the order of the keys.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-973858764, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GGQAUVB6UAFN27MFLDUMYCTFANCNFSM5IHZFENQ .

bogdan-iancu commented 2 years ago

@razvancrainea is looking a bit deeper into this, to see what our options are.

ar45 commented 2 years ago

Thanks @bogdan-iancu @razvancrainea

There are 2 issues

  1. The problem that opensips should not crash. Considering we fix this by checking the length of the tags using a tag_count variable.
  2. If I reorder the tags they get fixed up after restart and all CDR records get mixed up.

For issue 2, I thought of a workaround,

the first acc_extra tag name will always be extra_tags_version I will always initialize that at the beginning of the dialog.

I will add a script route acc_check_dlg_version_match, that will run for every in-dialog request. within this route, I will check the version number of the $acc(extra_tags_version) with the STATIC value (I will be incrementing each time I make a change). If the change requires moving values from the old acc_extra tagenames to the new ones, I will do that here. In the rare case a DB accounting record will be created without this script route running. I can later check the db record insertion time, correlating that with the acc_extra_version number the dialog has been created with, and fixup the values in the db by manually moving the field values accordingly.

This is not a simple process, and is very error prone, I wish we can come up with a better solution.

Maybe, opensips can do that automatically. Maybe adding
mod_param("acc", "acc_extra_version", 3); mod_param("acc", "acc_extra_version_update_route", "ROUTE_NAME_TO_RUN_WHEN_VERSION_MISMATCH");

and populate $dlg_acc_extra(old-tag-name) with the values and tags as was with previous version.


# this route will run every time a dialog acc context is loaded and the acc context version does not match the existing `acc_extra_version`

route[ROUTE_NAME_TO_RUN_WHEN_VERSION_MISMATCH] {

  # User can check and use there own logic for how to update the acc_extra variables from previous version to make it compatible with new version

  if ($dlg_acc_extra_version == 1) {
    $acc_extra("var1") = $dlg_acc_extra("var2");

  } else if ($dlg_acc_extra_version == 2) {
    $acc_extra("var1") = $dlg_acc_extra("something-else");

  }

}

Once there are no dialogs with a certain tag version in use, that record can be removed from db.

Just brainstorming.

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

ar45 commented 2 years ago

Still waiting

On Wed, Dec 8, 2021, 12:41 AM github-actions[bot] @.***> wrote:

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-988542450, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GCVYAPS2XONSLMT23LUP342RANCNFSM5IHZFENQ .

razvancrainea commented 2 years ago

Hi, @ar45 !

Apologies for getting back after such a long period - I was caught with other tasks and I honestly didn't get to write my thoughts down. The main idea is indeed to avoid any crashes. In order to address this issue, there are two components that we need to consider: A. master branch, and B. stable releases. For the master branch, there are several things we can do, and my suggestion is to have a configurable way of storing accounting in dialog:

  1. add the number of extra values/legs before iterating - this is the simplest way to fix the problem, but is prone to different errors (reordering extras, renaming, etc).
  2. store the name of the extras as well - this is more complex and requires more space, but would allow a better recovery after a restart (support reordering, renaming, etc).

The problem however is that changing the layout of how values are stored will break compatibility with previous versions - and this is quite a sensitive matter when it comes to accounting. That's why for stable releases, we need to have a different approach - our proposal is to write some sort of cookie in the layout of the value that can help us to understand if there is an old format, or a new one, which will consist of the number of values (solution 1 in master).

Any other ideas, considering the back-porting limitations, are welcome.

Best regards, Răzvan

ar45 commented 2 years ago

@razvancrainea this sounds good.

In regards to keeping track of the config keys and order. Can opensips store this in the db on startup and then re-parse that.

The idea is that acc_extra will have multiple definitions of the tag names and each one associated with a version corresponding in the dialog.

Another method would be to have a config table where we would specify the acc_extra instead and opensips would set a marker on startup if there are no dialogs using a specific definition indicating it is safe to delete.

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

ar45 commented 2 years ago

yes

On Sun, Dec 26, 2021 at 12:41 AM github-actions[bot] < @.***> wrote:

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

β€” Reply to this email directly, view it on GitHub https://github.com/OpenSIPS/opensips/issues/2691#issuecomment-1001113771, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFR6GCIWRBT7PHTBRAV2LLUS22HZANCNFSM5IHZFENQ . You are receiving this because you were mentioned.Message ID: @.***>

--

- Aron Podrigal

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

razvancrainea commented 2 years ago

Yeah, still waiting to get some time to work on this.

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

ar45 commented 2 years ago

Still waiting

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

ar45 commented 2 years ago

yes

ar45 commented 2 years ago

Still waiting

github-actions[bot] commented 2 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.