OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org

load balancer is not probing #1315

Closed yteltom closed 5 years ago

yteltom commented 6 years ago

version: opensips 2.3.2 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
git revision: f74855934
main.c compiled on 21:51:12 Sep 28 2017 with gcc 5.4.0

I was told on the IRC channel to open a ticket.

I am using OpenSIPS as a frontend for multiple FreeSWITCH servers. OpenSIPS is receiving the FreeSWITCH polling info. I have OPTIONS probing configured so that FreeSWITCH servers can be removed from routing when they are down. At the moment I see no OPTIONS being sent to the FreeSWITCH servers, and even when a FreeSWITCH server stops responding, OpenSIPS keeps sending calls to it.

Relevant parts of opensips.cfg:

loadmodule "load_balancer.so"
modparam("load_balancer", "db_url", "mysql://opensips:password@localhost/opensips")
modparam("load_balancer", "db_table", "load_balancer")
modparam("load_balancer", "initial_freeswitch_load", 15)
modparam("load_balancer", "fetch_freeswitch_stats", 1)
modparam("load_balancer", "probing_method", "OPTIONS")
modparam("load_balancer", "probing_interval", 5)

From the database:

+-----+----------+-------------------------------+---------------------------------------------------------+------------+----------------------+
| id  | group_id | dst_uri                       | resources                                               | probe_mode | description          |
+-----+----------+-------------------------------+---------------------------------------------------------+------------+----------------------+
|   1 |        2 | sip:172.31.214.5:5060         | ch=10000                                                |          1 | LA Sansay            |
|   2 |        2 | sip:172.31.32.5:5060          | ch=10000                                                |          1 | AB Sansay            |
|  13 |        1 | sip:2043i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2043i.message360.com |          1 | 2043i.message360.com |
|  17 |        1 | sip:2063i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2063i.message360.com |          1 | 2063i.message360.com |
|  23 |        1 | sip:2069i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2069i.message360.com |          1 | 2069i.message360.com |
|  27 |        1 | sip:2047i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2047i.message360.com |          1 | 2047i.message360.com |
|  45 |        1 | sip:2048i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2048i.message360.com |          1 | 2048i.message360.com |
|  51 |        1 | sip:2062i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2062i.message360.com |          1 | 2062i.message360.com |
|  59 |        1 | sip:2045i.message360.com:5080 | ch=fs://:div0k9Ni6CcTuqOCYTgQQMqwU@2045i.message360.com |          1 | 2045i.message360.com |
+-----+----------+-------------------------------+---------------------------------------------------------+------------+----------------------+

I have replicated the issue across 5 different platforms, and other users on IRC report the same issue.

volga629 commented 6 years ago

Hello everyone. Same thing with dispatcher: if a FreeSWITCH node is marked as Inactive, OpenSIPS never changes the status back. Tcpdump confirms that FreeSWITCH sends HEARTBEAT packets back through the event socket, with status Active. This means the state column in the dispatcher table is never updated with 1 or 0.

Event Socket packet

14:37:31.169703 IP (tos 0x0, ttl 64, id 2377, offset 0, flags [DF], proto TCP (6), length 993)
    10.18.130.24.8021 > 10.18.130.27.5378: Flags [P.], cksum 0x1c2b (incorrect -> 0xf869), seq 741802:742755, ack 1, win 229, length 953
E...    I@.@..w
...
....U...g.8L..fP....+..{"Event-Name":"HEARTBEAT","Core-UUID":"73ebc223-3bc0-4cb0-bdea-4530e56be2db","FreeSWITCH-Hostname":"casip00.networklab.prod","FreeSWITCH-Switchname":"casip00.networklab.prod","FreeSWITCH-IPv4":"10.18.130.24","FreeSWITCH-IPv6":"","Event-Date-Local":"2018-03-20 14:37:31","Event-Date-GMT":"Tue, 20 Mar 2018 18:37:31 GMT","Event-Date-Timestamp":"1521571051157325","Event-Calling-File":"switch_core.c","Event-Calling-Function":"send_heartbeat","Event-Calling-Line-Number":"74","Event-Sequence":"149865","Event-Info":"System Ready","Up-Time":"0 years, 1 day, 3 hours, 21 minutes, 59 seconds, 419 milliseconds, 557 microseconds","FreeSWITCH-Version":"1.6.12~64bit","Uptime-msec":"98519419","Session-Count":"0","Max-Sessions":"1000","Session-Per-Sec":"30","Session-Per-Sec-Last":"0","Session-Per-Sec-Max":"2","Session-Per-Sec-FiveMin":"0","Session-Since-Startup":"13","Session-Peak-Max":"2","Session-Peak-FiveMin":"0","Idle-CPU":"98.766667"}
14:37:31.170554 IP (tos 0x0, ttl 64, id 32539, offset 0, flags [DF], proto TCP (6), length 40)
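
The HEARTBEAT event in the capture above carries the load figures that fetch_freeswitch_stats is meant to consume. As a quick sanity check outside OpenSIPS, a small Python sketch can pull the relevant fields out of such a JSON body. The helper name `summarize_heartbeat` and the shortened sample payload are illustrative assumptions; the field names are copied from the capture. This is not how OpenSIPS itself processes the event.

```python
import json


def summarize_heartbeat(body: str) -> dict:
    """Extract load-related fields from a FreeSWITCH HEARTBEAT event in JSON form."""
    event = json.loads(body)
    if event.get("Event-Name") != "HEARTBEAT":
        raise ValueError("not a HEARTBEAT event")
    # FreeSWITCH sends every value as a string, so convert explicitly.
    return {
        "host": event["FreeSWITCH-Hostname"],
        "sessions": int(event["Session-Count"]),
        "max_sessions": int(event["Max-Sessions"]),
        "idle_cpu": float(event["Idle-CPU"]),
    }


# Shortened copy of the payload from the capture (illustrative subset of fields).
sample = (
    '{"Event-Name":"HEARTBEAT",'
    '"FreeSWITCH-Hostname":"casip00.networklab.prod",'
    '"Session-Count":"0","Max-Sessions":"1000","Idle-CPU":"98.766667"}'
)

print(summarize_heartbeat(sample))
```

If the fields parse cleanly and the values look sane, the FreeSWITCH side is at least reporting; the question then shifts to what OpenSIPS does with the event.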

bogdan-iancu commented 6 years ago

@yteltom , have you tried to check the in-memory status of the dispatcher destinations by doing opensipsctl fifo ds_list ?

bogdan-iancu commented 6 years ago

@volga629 , could you assist with more information here ? what is the probing mode you set for your FS boxes ?

yteltom commented 6 years ago

I am traveling today. I will be available tomorrow and can get whatever you need at that time.

volga629 commented 6 years ago

In my case it is dispatcher. We use the function to fetch FS stats, and in the dispatcher table the weight column is set to the FS event socket URL. If an FS box goes down, the state column gets set to inactive. Even after the FS box comes back online and the event socket sends a good heartbeat, the state column in the dispatcher table is never updated to the active state.

loadmodule "dispatcher.so"
modparam("dispatcher", "db_url", "DB_DRIVER://DB_URL")
modparam("dispatcher", "table_name", "dispatcher")
modparam("dispatcher", "setid_col", "setid")
modparam("dispatcher", "priority_col", "priority")
modparam("dispatcher", "destination_col", "destination")
modparam("dispatcher", "cnt_avp", "$avp(274)")
modparam("dispatcher", "grp_avp", "$avp(275)")
modparam("dispatcher", "hash_pvar", "$avp(273)")
modparam("dispatcher", "dst_avp", "$avp(271)")
modparam("dispatcher", "sock_avp", "$avp(276)")
modparam("dispatcher", "ds_ping_from", "sip:proxy@lan ip")
modparam("dispatcher", "ds_ping_method", "OPTIONS")
modparam("dispatcher", "ds_ping_interval", 10)
modparam("dispatcher", "ds_probing_mode", 1)
modparam("dispatcher", "fetch_freeswitch_stats", 1)
modparam("dispatcher", "options_reply_codes", "501,403,404,400,200")

root@casbc00 ~> [~]# opensipsctl dispatcher show
dispatcher gateways
1   1   sip:lan_ip:lan_port     0   fs://:password@ip:port  0       FS  2a16ae97-d9fd-4cca-8121-65ee0e9889cb

bogdan-iancu commented 6 years ago

@yteltom , I see you use probe_mode 1 in db, which means probing only when the destinations are disabled. So, please provide the output of opensipsctl fifo ds_list in order to check the status of the destinations, according to OpenSIPS internals.

bogdan-iancu commented 6 years ago

@volga629 , we are mixing topics a bit here: you talk about dispatcher while @yteltom talks about load-balancer. So, you say that the disabling part is OK? If you do opensipsctl fifo ds_list, do you see the destination in probing mode? Or what is the status of the destination after it went down? Now, I see you have ds_probing_mode set to 1 (probing all the time). After the box comes back up, do you see the probing OPTIONS being replied with 200 OK by the box?
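
Because the two modules number their probing modes differently, it can help to spell the rule out. The sketch below encodes what is stated in this thread (load_balancer probe_mode 1 probes only disabled destinations; dispatcher ds_probing_mode 1 probes all the time). The remaining values are filled in as an assumption based on the usual module documentation and should be checked against the docs for your version. The function name `should_probe` and the `down` flag are hypothetical.

```python
def should_probe(module: str, mode: int, down: bool) -> bool:
    """Decide whether a destination would be pinged under the given probing mode.

    `down` means the destination is currently disabled (load_balancer) or in
    the Probing state (dispatcher).
    """
    rules = {
        # load_balancer probe_mode: 1 is from this thread, 0 and 2 are assumed.
        ("load_balancer", 0): lambda is_down: False,
        ("load_balancer", 1): lambda is_down: is_down,
        ("load_balancer", 2): lambda is_down: True,
        # dispatcher ds_probing_mode: 1 is from this thread, 0 is assumed.
        ("dispatcher", 0): lambda is_down: is_down,
        ("dispatcher", 1): lambda is_down: True,
    }
    try:
        return rules[(module, mode)](down)
    except KeyError:
        raise ValueError(f"unknown module/mode: {module}/{mode}") from None


# yteltom's setup: load_balancer rows with probe_mode = 1 and a healthy box.
print(should_probe("load_balancer", 1, down=False))
# volga629's setup: dispatcher with ds_probing_mode = 1.
print(should_probe("dispatcher", 1, down=False))
```

Under these assumptions, no OPTIONS would be expected toward a load_balancer destination with probe_mode 1 while it is still enabled, which would be consistent with the absence of pings reported at the top of the thread. It would not by itself explain calls still being routed to a box that has stopped responding.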

volga629 commented 6 years ago

Yes I see probing.

Destination Active

Last login: Mon Apr 16 15:36:53 2018 from 10.18.130.29
root@casbc00 ~> [~]# opensipsctl fifo ds_list
PARTITION:: default
    SET:: 1
        URI:: sip:ip:5160 state=Active first_hit_counter=4853

Stop the corresponding FreeSWITCH profile

freeswitch@> sofia profile internal-proxy stop
Reload XML [Success]
stopping: internal-proxy
2018-04-17 15:27:39.096629 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/de/phrases/*.xml
2018-04-17 15:27:39.096629 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/en/phrases/*.xml
2018-04-17 15:27:39.096629 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/fr/phrases/*.xml
2018-04-17 15:27:39.096629 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/ru/phrases/*.xml
2018-04-17 15:27:39.096629 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/he/phrases/*.xml
2018-04-17 15:27:39.096629 [INFO] mod_enum.c:879 ENUM Reloaded
2018-04-17 15:27:39.096629 [INFO] switch_time.c:1423 Timezone reloaded 530 definitions
2018-04-17 15:27:40.236509 [NOTICE] sofia.c:3416 Waiting for worker thread
2018-04-17 15:27:40.236509 [INFO] switch_core_sqldb.c:1720 sofia:internal-proxy Destroying SQL queue.
2018-04-17 15:27:40.436501 [INFO] switch_core_sqldb.c:1678 sofia:internal-proxy Stopping SQL thread.
2018-04-17 15:27:40.436501 [DEBUG] sofia.c:3471 Write lock internal-proxy
freeswitch@> 

Opensips starts probing

root@casbc00 ~> [~]# opensipsctl fifo ds_list
PARTITION:: default
    SET:: 1
        URI:: sip:ip:5160 state=Probing first_hit_counter=4868
root@casbc00 ~> [~]# 
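
When repeating this test, it can be convenient to check the ds_list output with a script rather than by eye. The following Python sketch parses text in the format shown above and reports any destination that is not Active. The function names are hypothetical, and the regular expression assumes the "URI:: ... state=..." layout seen in this thread, which may differ in other OpenSIPS versions.

```python
import re

# Matches lines such as: "URI:: sip:ip:5160 state=Probing first_hit_counter=4868"
URI_LINE = re.compile(r"URI::\s+(?P<uri>\S+)\s+state=(?P<state>\w+)")


def parse_ds_list(output: str) -> dict:
    """Map each destination URI to the state reported by ds_list."""
    states = {}
    for line in output.splitlines():
        match = URI_LINE.search(line)
        if match:
            states[match.group("uri")] = match.group("state")
    return states


def not_active(output: str) -> list:
    """Return the URIs whose reported state is anything other than Active."""
    return [uri for uri, state in parse_ds_list(output).items() if state != "Active"]


# Output copied from the ds_list run above.
sample = """PARTITION:: default
    SET:: 1
        URI:: sip:ip:5160 state=Probing first_hit_counter=4868
"""

print(not_active(sample))
```

Running it periodically after the profile is started again would show whether the destination ever leaves the Probing state.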

Start the profile again

freeswitch@> sofia profile internal-proxy start
Reload XML [Success]
internal-proxy started successfully

2018-04-17 15:32:30.256511 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/de/phrases/*.xml
2018-04-17 15:32:30.256511 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/en/phrases/*.xml
2018-04-17 15:32:30.256511 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/fr/phrases/*.xml
2018-04-17 15:32:30.256511 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/ru/phrases/*.xml
2018-04-17 15:32:30.256511 [INFO] switch_xml.c:1313 No files to include at /etc/freeswitch/lang/he/phrases/*.xml
2018-04-17 15:32:30.277714 [INFO] mod_enum.c:879 ENUM Reloaded

volga629 commented 6 years ago

Yes, OPTIONS is sent and a 200 OK reply is received.

bogdan-iancu commented 6 years ago

And after starting back, the ds_list still reports the destination as Probing ? even if you see the 200 OK replies to the pings ?

volga629 commented 6 years ago

In my case it is Probing.

bogdan-iancu commented 6 years ago

@volga629 , so FS is back up, it replies with 200 OK to the probes, but according to opensipsctl fifo ds_list, it is still in Probe mode ?

volga629 commented 6 years ago

yes

bogdan-iancu commented 6 years ago

@volga629, if I provide you with a patch for extra logging, will you be able to apply and recompile ?

volga629 commented 6 years ago

Yes, please provide the patch. I will generate a new rpm for testing.

bogdan-iancu commented 6 years ago

OK, apply this one dispatcher_probing_patch.txt and see the CRITICAL logs about the probing.

volga629 commented 6 years ago

Sorry for the delay. I have a new rpm built with the patch and will have a maintenance window this evening to apply it. I was wondering whether this is related to commit a2934c7.

bogdan-iancu commented 6 years ago

No, it is not related, the issue is still open from our perspective.

stale[bot] commented 5 years ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

stale[bot] commented 5 years ago

Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.