BelledonneCommunications / bcg729

Linphone.org mirror for bcg729 (git://git.linphone.org/bcg729.git)
http://linphone.org
GNU General Public License v3.0
116 stars 78 forks

I can only run 400 concurrent requests on my machine (4 cores, 8 GB). Is this normal? #15

Open lixiangzzz2017 opened 2 months ago

lixiangzzz2017 commented 2 months ago

As the title says. CPU utilization rises to 95%. The code is below; each stream sleeps 20 ms per frame to simulate a real phone call.

package main

import (
    "fmt"
    "io"
    "net/http"
    _ "net/http/pprof"
    "os"
    "postsuperman/codec/g729"
    "strconv"
    "sync"
    "time"
)

/*
#include <stdio.h>
int sum(int a, int b) {
    return a + b;
}
*/
import "C"

func main() {
    go func() {
        err := http.ListenAndServe("0.0.0.0:6065", nil)
        if err != nil {
            fmt.Println(err)
        }
    }()

    dirName := os.Args[1]
    list, err := os.ReadDir(dirName)
    if err != nil {
        panic(err)
    }

    concurrency, err := strconv.Atoi(os.Args[2])
    if err != nil {
        panic(err)
    }
    fmt.Println(concurrency)

    m := map[string][][]byte{}
    for _, entry := range list {
        fullName := dirName + "/" + entry.Name()
        inputWAV, err := os.Open(fullName)
        if err != nil {
            panic(err)
        }

        // Skip the 44-byte WAV header; the rest is treated as raw PCM.
        wavHeader := make([]byte, 44)
        if _, err = io.ReadFull(inputWAV, wavHeader); err != nil {
            panic(err)
        }

        // Split the PCM data into 160-byte frames.
        arr := [][]byte{}
        for {
            buf := make([]byte, 160)
            if _, err := io.ReadFull(inputWAV, buf); err != nil {
                // io.EOF, a short last frame, or a read error: stop reading
                break
            }
            arr = append(arr, buf)
        }
        inputWAV.Close()
        m[entry.Name()] = arr
    }
    concurrentCh := make(chan struct{}, concurrency)
    for {
        wg := &sync.WaitGroup{}
        for i := 0; i < 500; i++ {
            for _, v := range m {
                wg.Add(1)
                go func(wg *sync.WaitGroup, v [][]byte) {
                    concurrentCh <- struct{}{}
                    defer func() {
                        <-concurrentCh
                    }()
                    defer wg.Done()

                    enc := g729.NewEncoder(false)
                    defer enc.Close()
                    dec := g729.NewDecoder()
                    defer dec.Close()
                    for _, value := range v {
                        // for range v {
                        time.Sleep(20 * time.Millisecond)
                        // C.sum(C.int(2), C.int(3))
                        if encodedByte, err := enc.Encode(value); err != nil {
                            return
                        } else {
                            if _, err := dec.Decode(encodedByte); err != nil {
                                return
                            }
                        }
                    }
                }(wg, v)
            }
        }
        wg.Wait()
    }
}


jeannotlapin commented 2 months ago

Hi, on a relatively old CPU (Intel Core i5-6600T), a single core can encode/decode around 250 streams (doing only that). 400 streams per core seems realistic for a more modern CPU core.

A common performance issue with this library is building it without the optimization flag (-O2). Without it, the CPU load increases by a factor of 3. If your test runs only on the CPU and does not use the GPU (which seems likely), that leaves you with 4 cores running it, so 400 streams means 100 streams per core. 100 streams per core without -O2 on the build line is more or less expected.

lixiangzzz2017 commented 2 months ago

Thank you. At which step of the build should I add the -O2 flag? I built the library according to the instructions on the homepage.

jeannotlapin commented 1 month ago

To enable the -O2 option on the compiler command line, you must add

-DCMAKE_BUILD_TYPE=RelWithDebInfo

to the cmake configuration command.

lixiangzzz2017 commented 1 month ago

I've now built with the -O2 option, and the machine can run 1200 streams, about 3 times as many as before. Much appreciated. Why don't you add this to the instructions on the homepage?