OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP server that aims to deliver a high-level technical solution (performance, security and quality) for professional SIP server platforms.
https://opensips.org

1.7: crash because `rpl` (sip_msg) in build_local is cleared #41

Closed: wdoekes closed this issue 9 years ago

wdoekes commented 11 years ago

Hi. I'm running OpenSIPS 1.7 r9016. Sorry for not upgrading.

But as far as I can tell, nothing changed in modules/tm since then fixes this issue, which I have run into once.

The following happens (sipcaparseye output):

09:56:03.665599 CUST:55271 > OSIPS:6060 160 INVITE
09:56:03.665757 OSIPS:6060 > CUST:55271 160 INVITE(100)
09:56:03.761652 OSIPS:6060 > CUST:55271 160 INVITE(407)
09:56:03.804410 CUST:55271 > OSIPS:6060 160 ACK

(customer attempts to call, gets auth request)

09:56:03.878668 CUST:55271 > OSIPS:6060 161 INVITE
09:56:03.878851 OSIPS:6060 > CUST:55271 161 INVITE(100)

(auth is correct, call is forwarded to pbx)

09:56:03.880829 OSIPS:5060 > PBX__:5060 161 INVITE
09:56:03.904852 PBX__:5060 > OSIPS:5060 161 INVITE(100)
09:56:03.943376 PBX__:5060 > OSIPS:5060 161 INVITE(183)
09:56:03.943623 OSIPS:6060 > CUST:55271 161 INVITE(183)

(number-does-not-exist tones are played)

09:56:08.345867 CUST:55271 > OSIPS:6060 161 CANCEL

(customer cancels)

09:56:08.345976 PBX__:5060 > OSIPS:5060 161 INVITE(404)

(simultaneously, the PBX sends a 404)

09:56:08.346056 OSIPS:6060 > CUST:55271 161 CANCEL(200)
09:56:08.346056 OSIPS:5060 > PBX__:5060 161 ACK
09:56:08.348717 OSIPS:6060 > CUST:55271 161 INVITE(404)
09:56:08.393844 CUST:55271 > OSIPS:6060 161 ACK 

(and here we crash)

Backtrace says:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f49faaa1dab in build_local (Trans=0x7f49f389fd88, branch=0, method=0x7fff64ba8340, extra=0x7f49facc0da0, rpl=0x7efec8, len=0x7fff64ba83ac) at t_msgbuilder.c:123
123         to.s = rpl->to->name.s;
(gdb) print rpl
$1 = (struct sip_msg *) 0x7efec8
(gdb) print *rpl
$2 = {id = 0, first_line = {type = 0, len = 0, u = {request = {method = {s = 0x0, len = 0}, uri = {s = 0x0, len = 0}, version = {s = 0x0, len = 0}, method_value = 0}, reply = {version = {s = 0x0, 
      len = 0}, status = {s = 0x0, len = 0}...
(gdb) print rpl->to
$3 = (struct hdr_field *) 0x0

(gdb) up
#1  0x00007f49faa8acd5 in build_cancel (Trans=0x7f49f389fd88, branch=0, len=0x7fff64ba83ac) at t_cancel.c:137
137     return build_local( Trans, branch, &method, extra,
138         Trans->uac[branch].reply , len );
(gdb) print branch
$4 = 0
(gdb) print Trans->uac[branch].reply
$5 = (struct sip_msg *) 0x0

So how can rpl be non-NULL in build_local while it is NULL in the caller's struct? Has it been cleared/freed by another process in the meantime?

Is there anything else I can get you?

Regards, Walter Doekes OSSO B.V.
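A concrete model of the question above: C passes pointers by value, so `build_cancel()` copies `Trans->uac[branch].reply` when it evaluates the argument, and a later clear of that shared field does not update the copy already held by `build_local()`. The single-threaded C sketch below replays the interleaving seen in gdb by hand. `msg_sketch`, `trans_sketch` and the helper functions are hypothetical stand-ins, not the real tm structures, and the "other process" step is an assumption about what the concurrent writer did (zero the message, then clear the field).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct sip_msg and struct cell (not the tm types). */
struct msg_sketch { const char *to; };            /* models rpl->to */
struct uac_sketch { struct msg_sketch *reply; };  /* models uac[branch].reply */
struct trans_sketch { struct uac_sketch uac[1]; };

/* The caller evaluates the argument: this copies the shared pointer. */
static struct msg_sketch *take_reply_arg(const struct trans_sketch *t, int branch)
{
    return t->uac[branch].reply;
}

/* Assumed behaviour of the concurrent writer: it resets the message and
 * clears the shared field. Nothing here touches the caller's copy. */
static void other_process_retires_reply(struct trans_sketch *t, int branch)
{
    memset(t->uac[branch].reply, 0, sizeof *t->uac[branch].reply);
    t->uac[branch].reply = NULL;
}

/* Replays the interleaving; returns 1 if the end state matches the gdb session. */
int replay_gdb_state(void)
{
    struct msg_sketch reply = { "<sip:callee@pbx>" };
    struct trans_sketch t = { { { &reply } } };

    struct msg_sketch *rpl = take_reply_arg(&t, 0); /* value seen by build_local */
    other_process_retires_reply(&t, 0);             /* runs before rpl->to is read */

    assert(rpl != NULL);            /* like "print rpl": still non-NULL */
    assert(rpl->to == NULL);        /* like "print rpl->to": 0x0 */
    assert(t.uac[0].reply == NULL); /* like "print Trans->uac[branch].reply" */
    return 1;
}
```

Calling `replay_gdb_state()` passes all three assertions, which is the combination gdb printed: a non-NULL `rpl` pointing at an all-zero message while the transaction field is NULL. In the real process the next statement, `rpl->to->name.s`, dereferences that NULL `to` and raises SIGSEGV.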

bogdan-iancu commented 11 years ago

Hi Walter,

Is there a way to reproduce the scenario? Can you reproduce it somehow? If you still have the core file, is there any chance to get access to it?

Thanks and regards, Bogdan

bogdan-iancu commented 11 years ago

Walter, thanks for the access to the code. I found the reason for the crash: concurrent access to a part of the transaction structure, namely the "reply" pointer of the UAC. I will dig in for a fix, but it does not seem to be an easy one :)

Regards, Bogdan
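The diagnosis above is a read/clear race on a shared pointer. The actual fix is in the commits listed at the end of this thread and is not reproduced here. The C sketch below only illustrates the general remedy, assuming that both the process that clears the branch reply and the process building the CANCEL take the same per-transaction lock, and that the reader checks for NULL and finishes using the message before releasing that lock. `trans_sketch`, `lock_trans()`, `clear_branch_reply()` and `copy_reply_to()` are hypothetical names, and the C11 `atomic_flag` spinlock merely stands in for OpenSIPS's own locking primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; the real tm module has its own structs and lock API. */
struct msg_sketch { const char *to; };

struct uac_sketch {
    struct msg_sketch *reply;   /* shared; read and written only under lock */
};

struct trans_sketch {
    atomic_flag lock;           /* spinlock standing in for the transaction lock */
    struct uac_sketch uac[1];
};

static void lock_trans(struct trans_sketch *t)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;                       /* busy-wait until the holder releases */
}

static void unlock_trans(struct trans_sketch *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Writer: retire the branch reply under the lock. Freeing the message,
 * if needed, should happen only after this returns. */
void clear_branch_reply(struct trans_sketch *t, int branch)
{
    lock_trans(t);
    t->uac[branch].reply = NULL;
    unlock_trans(t);
}

/* Reader (e.g. CANCEL construction): copy the To value while holding the
 * lock, so no pointer into the reply outlives the critical section.
 * Returns 0 on success, -1 if there is no reply, no To, or out is too small. */
int copy_reply_to(struct trans_sketch *t, int branch, char *out, size_t outlen)
{
    int ret = -1;

    assert(t != NULL && out != NULL);
    lock_trans(t);
    const struct msg_sketch *rpl = t->uac[branch].reply;
    if (rpl != NULL && rpl->to != NULL) {
        size_t n = strlen(rpl->to);
        if (n < outlen) {
            memcpy(out, rpl->to, n + 1);
            ret = 0;
        }
    }
    unlock_trans(t);
    return ret;
}
```

In use, a reader that calls `copy_reply_to()` while the reply is present gets 0 and its own copy of the To value; once `clear_branch_reply()` has run, the same call returns -1 instead of dereferencing a stale pointer. The design choice is to copy what is needed inside the critical section rather than hand a raw pointer to code that runs after the lock is released.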

cpugeniusmv commented 10 years ago

Hello,

Earlier this week we had a very similar crash, also on a 1.7 release. A CANCEL and a 500 response arrived at OpenSIPS at almost exactly the same time.

Based on what you have discovered, I assume this issue could still occur in newer versions. Is that correct?

Thanks, Mike

MayamaTakeshi commented 10 years ago

Bogdan, since this bug was also confirmed in 1.11, we can assume it is present in 1.8, 1.9 and 1.10. Do you think it also exists in 1.6? I am asking because I am trying to upgrade from 1.6 to 1.11, and with these crashes I am considering delaying the upgrade until we get a fix. But if the bug is likely to exist in 1.6 too, I would not delay the upgrade. Regards, Takeshi

bogdan-iancu commented 10 years ago

@MayamaTakeshi, this bug was present even before the 1.4 release, so upgrading will not change anything from this perspective. Regards, Bogdan

bogdan-iancu commented 9 years ago

This was fixed today via 271c7b on master, 75106d on 2.1, 156fba on 1.11, 6457b8 on 1.10 and e5ab62 on 1.8.