alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

The speed of Parallel request #120

Closed ben-8878 closed 3 years ago

ben-8878 commented 3 years ago

I really don't know what the problem is: parallel requests are slower than serial requests. Does anyone know why? I can see that only one asr_server process is running.

lscpu

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                16
    On-line CPU(s) list:   0-15
    Thread(s) per core:    1
    Core(s) per socket:    16

parallel request

    time find testset/ -name "*.wav" | parallel --j 10 python asr_client3.py
    real 1m5.154s
    user 0m1.372s
    sys 0m0.507s

serial request

    time find testset/ -name "*.wav" | parallel --j 1 python asr_client3.py
    real 0m55.854s
    user 0m1.417s
    sys 0m0.493s

svenha commented 3 years ago

Bold text and commands don't fit. Please check carefully.

nshmyrev commented 3 years ago

Maybe your filesystem is very slow (network drive?). We saw that with the model loading. Try to run the commands several times so that files are loaded into memory.

I just tried on my side - it's 16 seconds for parallel and 42 seconds for sequential.

ben-8878 commented 3 years ago

Maybe your filesystem is very slow (network drive?). We saw that with the model loading. Try to run the commands several times so that files are loaded into memory.

I just tried on my side - it's 16 seconds for parallel and 42 seconds for sequential.

All CPUs are free, but the machine has two ossfs disks mounted.

When I start the parallel task, it behaves the same as the sequential task: only one CPU is running.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM
31138 ybZhang   20   0   13.3g   2.2g  32336 S  1593  1.2.
32225 ybZhang   20   0  227536  16396   6056 S   3.0  0.0.
32213 ybZhang   20   0  227536  16480   6056 S   0.7  0.0.
 1370 root      20   0 1522348  35088   2208 S   0.3  0.0.
 6231 wbZhuang  20   0  499376  14688   4796 S   0.3  0.0.
27633 root      20   0       0      0      0 S   0.3  0.0.
30414 ybZhang   20   0  169308   3440   2536 R   0.3  0.0.
30430 ybZhang   20   0  166396   3016   1368 S   0.3  0.0.

But there are many child processes in the background

  840 pts/10   Sl+    0:00 python asr_test.py wavs/BAC009S07.
  857 pts/10   Sl+    0:00 python asr_test.py wavs/BAC009S07.
  861 pts/10   Sl+    0:00 python asr_test.py wavs/BAC009S07.
  866 pts/10   Sl+    0:00 python asr_test.py wavs/BAC009S07.

nshmyrev commented 3 years ago

Only one CPU is running. All CPUs are free, but the machine has two ossfs disks mounted.

I'm sorry, I don't get you. In your top output I see 1593% CPU, which means all 16 cores are busy.
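
The arithmetic behind that reading can be sketched as follows. It assumes top is in its default mode, where a process's %CPU is expressed relative to a single core, so a multi-threaded process can exceed 100%. The helper name `busy_cores` is hypothetical and only illustrates the conversion.

```python
def busy_cores(cpu_percent):
    """Convert a per-process %CPU figure from top into an equivalent
    number of fully busy cores (100% == one core)."""
    return cpu_percent / 100.0


if __name__ == "__main__":
    # 1593% from the top output above, on a 16-core machine.
    cores = busy_cores(1593)
    print("about {:.2f} of 16 cores busy".format(cores))
```

A figure this close to 16 for a single PID suggests that one process is spreading its work over every core, which is different from only one core being in use.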

nshmyrev commented 3 years ago

ossfs

That thing is definitely going to be slow. You'd better run some local tests first.

ben-8878 commented 3 years ago

ossfs

That thing is definitely going to be slow. You'd better run some local tests first.

OK, I will try to unmount it and test.

ben-8878 commented 3 years ago

Only one CPU is running. All CPUs are free, but the machine has two ossfs disks mounted.

I'm sorry, I don't get you. In your top output I see 1593% CPU, which means all 16 cores are busy.

I sent just one request, and all 16 cores were busy too. Previously, I thought each request was handled by a single core.

ben-8878 commented 3 years ago

@nshmyrev I unmounted all ossfs disks and started the ASR server on Linux.

python -c '''import os
> print(os.cpu_count())
> '''
16
  1. Send serial requests:
    import os
    import sys

    def recognize(wavpath):
        # Run the client script on one file; blocks until it finishes.
        os.system("python asr_test.py {}".format(wavpath))

    if __name__ == '__main__':
        base_dir = sys.argv[1]
        for file_name in os.listdir(base_dir):
            # Skip anything that is not a .wav file.
            if file_name.split('.')[-1] != 'wav':
                continue
            audio_file_path = os.path.join(base_dir, file_name)
            # Requests are sent one after another.
            recognize(audio_file_path)

    It took 1m45s.

  2. Send parallel requests:
    import os
    import sys
    # multiprocessing.dummy provides a thread-based Pool.
    from multiprocessing.dummy import Pool

    def recognize(wavpath):
        # Run the client script on one file; blocks until it finishes.
        os.system("python asr_test.py {}".format(wavpath))

    if __name__ == '__main__':
        base_dir = sys.argv[1]
        wavlist = []
        for file_name in os.listdir(base_dir):
            # Skip anything that is not a .wav file.
            if file_name.split('.')[-1] != 'wav':
                continue
            audio_file_path = os.path.join(base_dir, file_name)
            wavlist.append(audio_file_path)
        # Send up to 16 requests concurrently.
        p = Pool(16)
        p.map(recognize, wavlist)

    It took 2m14s.
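
To compare the two runs on equal footing, both can be timed inside one script instead of by hand. The following is only a minimal sketch: `fake_request` is a hypothetical stand-in that just sleeps, not the real client, and the helper names are illustrative. Replacing it with a function that calls asr_test.py would approximate the experiment described here.

```python
import time
from multiprocessing.dummy import Pool  # thread-based pool, as in the scripts above


def fake_request(item):
    """Hypothetical stand-in for one recognition request."""
    time.sleep(0.05)
    return item


def timed(label, func):
    """Run func(), print the wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print("{}: {:.2f}s".format(label, elapsed))
    return result, elapsed


def run_serial(items, task):
    """Process items one after another."""
    return [task(item) for item in items]


def run_parallel(items, task, workers):
    """Process items concurrently with a pool of worker threads."""
    with Pool(workers) as pool:
        return pool.map(task, items)


if __name__ == "__main__":
    items = list(range(8))
    timed("serial", lambda: run_serial(items, fake_request))
    timed("parallel", lambda: run_parallel(items, fake_request, 4))
```

With a sleeping stand-in the parallel run finishes sooner. If a single request already keeps every core busy on the server, as reported in this thread, concurrent requests would be competing for the same cores and a similar speedup would not be expected.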

nshmyrev commented 3 years ago

I sent just one request, and all 16 cores were busy too.

Did you build Kaldi/Vosk yourself? Or do you use a prebuilt image?

Maybe you have somehow enabled BLAS threading.
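
One common way to rule BLAS threading in or out is to cap the math library's thread count through environment variables before the server process starts. The sketch below rests on assumptions: `MKL_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `OMP_NUM_THREADS` are the variables read by MKL, OpenBLAS, and OpenMP-based libraries respectively, but whether a given Vosk build honors them depends on how it was linked. The helper names are hypothetical, and the child command is a placeholder rather than the real server command.

```python
import os
import subprocess
import sys

# Thread-count variables read by common BLAS backends.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def capped_env(base_env=None, threads=1):
    """Return a copy of the environment with BLAS thread counts capped."""
    env = dict(os.environ if base_env is None else base_env)
    for name in BLAS_THREAD_VARS:
        env[name] = str(threads)
    return env


def launch(command, threads=1):
    """Start a command with capped BLAS thread counts."""
    return subprocess.Popen(command, env=capped_env(threads=threads))


if __name__ == "__main__":
    # Placeholder child that only echoes the variable it received;
    # substitute the actual server start command here.
    child = launch([sys.executable, "-c",
                    "import os; print(os.environ['MKL_NUM_THREADS'])"])
    child.wait()
```

If the per-request CPU usage in top drops to about 100% after this, library-level threading was the likely cause of a single request occupying all cores.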

ben-8878 commented 3 years ago

I sent just one request, and all 16 cores were busy too.

Did you build Kaldi/Vosk yourself? Or do you use a prebuilt image?

Maybe you have somehow enabled BLAS threading.

I tried both my own build of vosk and your published vosk 0.3.2.1, and both behave the same way. I don't know whether I have enabled BLAS threading, or how to disable it.

ben-8878 commented 3 years ago

@nshmyrev Thanks, threading really was enabled because of MKL, and this explains why pykaldi loaded the model faster than vosk did (see issue: https://github.com/alphacep/vosk-api/issues/566#issuecomment-854725992). vosk is the best!