SIPp / sipp

The SIPp testing tool
https://sipp.readthedocs.io

Segmentation fault on playing pcap file (sipp 3.7.2) #673

Closed gitkoyot closed 6 months ago

gitkoyot commented 9 months ago

Hello, I have a scenario where both the UAC and the UAS play 10 seconds of audio. I noticed that the UAS side sometimes crashes. When I ran it under valgrind, this is what I found:

==6613== Thread 2:
==6613== Invalid read of size 2
==6613==    at 0x185541: send_packets (send_packets.c:241)
==6613==    by 0x13936F: send_wrapper(void*) (call.cpp:6824)
==6613==    by 0x53183EB: start_thread (pthread_create.c:444)
==6613==    by 0x539896F: clone (clone.S:100)
==6613==  Address 0x647d988 is 8 bytes inside a block of size 40 free'd
==6613==    at 0x48431EF: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==6613==    by 0x1170B6: CAction::setPcapArgs(char const*) (actions.cpp:536)
==6613==    by 0x136152: call::executeAction(char const*, message*) (call.cpp:6083)
==6613==    by 0x12456E: call::executeMessage(message*) (call.cpp:1900)
==6613==    by 0x1257E4: call::run() (call.cpp:2214)
==6613==    by 0x1A69F1: traffic_thread(int&, int&) (sipp.cpp:589)
==6613==    by 0x1AB53C: main (sipp.cpp:2147)
==6613==  Block was alloc'd at
==6613==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==6613==    by 0x1170DA: CAction::setPcapArgs(char const*) (actions.cpp:541)
==6613==    by 0x136152: call::executeAction(char const*, message*) (call.cpp:6083)
==6613==    by 0x12456E: call::executeMessage(message*) (call.cpp:1900)
==6613==    by 0x1257E4: call::run() (call.cpp:2214)
==6613==    by 0x1A69F1: traffic_thread(int&, int&) (sipp.cpp:589)
==6613==    by 0x1AB53C: main (sipp.cpp:2147)

The stack trace also points to a similar location:

Thread 11 "sipp" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff24bd6c0 (LWP 7449)]
 ...

(gdb) bt full
#0  0x00005555555d1502 in send_packets (play_args=0x555555a94a90) at ./src/send_packets.c:240
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140737258446044, 6969434371858591346, -50016, 0, 140737488343392, 
                140737250054144, 6969434371602738802, 3885807745694920306}, __mask_was_saved = 0}}, __pad = {0x7ffff24b0400, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x5555555d1072 <send_packets_cleanup>
        __cancel_arg = 0x7ffff24b0314
        __not_first_call = 0
        pkt_max = 0x555555acc118
        from_port = 0x555555a94b1a
        didsleep = {tv_sec = 0, tv_usec = 0}
        pkt_index = 0x0
        temp_sum = 0
        port_diff = 0
        to_port = 0x555555a94a9a
        start = {tv_sec = 0, tv_usec = 0}
        pkts = 0x555555a00d20
        bind_addr = {ss_family = 2, __ss_padding = "\000\000\177\000\001\001", '\000' <repeats 111 times>, __ss_align = 0}
        ret = 0
        from6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, 
              __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}
        sock = 27
        last = {tv_sec = 0, tv_usec = 0}
        to = 0x555555a94a98
        from = 0x555555a94b18
        udp = 0x7ffff24b0560
        to6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, 
              __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}
        buffer = '\000' <repeats 1499 times>
        len = 16
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140737258446044, 6969434371858591346, -50016, 0, 140737488343392, 
                140737250054144, 6969434371602738802, 3885807745510764146}, __mask_was_saved = 0}}, __pad = {0x7ffff24b0bb0, 0x0, 0x0, 0x0}}
        __cancel_routine = 0x5555555d109a <send_packets_pcap_cleanup>
        __cancel_arg = 0x555555a94a90
        __not_first_call = 0
#1  0x0000555555585370 in send_wrapper (arg=0x555555a94a90) at ./src/call.cpp:6824
        s = 0x555555a94a90
#2  0x00007ffff742b3ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737341731120, -6969457636240036238, -50016, 0, 140737488343392, 140737250054144, 
                6969434371883757170, 6969441033572544114}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007ffff74aba4c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I suspect (and I might be wrong) that the issue is that the call is shorter than the audio file. Or there is another race condition, or my scripts (attached in a separate message) are wrong.

gitkoyot commented 9 months ago

I tried to reduce the issue to the following scenarios. With the call rate set to 10 CPS, everything works on my machine. When I switch to 100 CPS, it crashes.

g711PCMUstream.zip uac.sh.txt uac.xml.txt uas.sh.txt uas.xml.txt valgrind-uas.txt.txt

Please don't mind the UAC crashing; it crashes because the connection is reset by the peer.

man1207 commented 7 months ago

Hi! I observe the same behaviour.


This crashes sipp with a segmentation fault when the CPS is increased above 35-40, at ~1500 concurrent calls.

peter-oneill commented 6 months ago

Caused by a use-after-free introduced by d8899fca5beeb00dba0d7e40a622ddce0fb88f64 under https://github.com/SIPp/sipp/pull/521

I'll look into a fix

peter-oneill commented 6 months ago

I suspect this is a duplicate of #610 and #576.

peter-oneill commented 6 months ago

@orgads I'm considering whether the best solution here might be to revert #521

The high-level problem with the PR is that it changes the pcap play and RTP stream actions to hold per-call variables, but without changing the underlying control blocks to represent that. Instead, actions that are still in use by previous calls are partially freed before being altered and reused for a new call. The issues I've identified stemming from this are:

  1. segfaults - see this and other linked issues
  2. other use-after-free bugs (the issue I've hit) - where the source and destination ports of an existing stream can be altered when a new stream is started
  3. scalability - previously, all calls playing PCAP would iterate through the same shared set of packet buffers; now every call creates its own set, even when using the same PCAP file.
  4. memory leak - the buffers allocated in the item above are never freed.

How crucial is #521 to you? I count at least 6 users from the issues (me included) who have hit this, and it seems to me that the best way to resolve this for now would be to remove the PR, either pending a suitable reworking to resolve the issues above, or indefinitely if the feature isn't important enough.

orgads commented 6 months ago

I need it. We use it all the time.

Is it too hard to fix?

peter-oneill commented 6 months ago

Point 3 above isn't fixable - it turns the PCAP handling from O(1) to O(N), where N=concurrent calls. It intrinsically changes the scalability of running SIPp with media.

The other three points (1, 2 and 4) would be resolvable, but the fix would be bigger than the initial code changes.

I'm surprised you don't have issues; what sort of scenarios are you running? Do you run concurrent media streams using PCAP play?

orgads commented 6 months ago

Is it only for pcap? We use WAV streaming.

orgads commented 6 months ago

You may revert the change. I'll try to look into it if I find the time for it.

peter-oneill commented 6 months ago

It looks like WAV files use RTPStream rather than PCAP play. When I looked at the effect of the code changes, RTPStream doesn't appear to suffer the same issues as PCAP play: some logic in rtpstream.cpp suggests that each individual file is cached (rather than one copy per call, even when the files are the same), so the same use-after-free issues are avoided.

I could remove the feature for PCAP play without removing it for RTPStream?

orgads commented 6 months ago

Sure, thank you.