alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Stuck on "Computing derived variables for iVector extractor" | model loading stuck on GPU #208

Closed mehadi92 closed 1 year ago

mehadi92 commented 1 year ago

Hi, I'm trying to run this example script.

To do this I pulled this Docker image, but it gets stuck like this:

WARNING ([5.5.1027~1-59386]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla T4  free:14993M, used:116M, total:15109M, free/total:0.992323
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.992323
LOG ([5.5.1027~1-59386]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.992323
LOG ([5.5.1027~1-59386]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla T4   free:14627M, used:482M, total:15109M, free/total:0.9681 version 7.5
LOG ([5.5.1027~1-59386]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1027~1-59386]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:52) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:56) Loading words from model/graph/words.txt
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:64) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:Optimize():nnet-optimize.cc:507) Before optimization, max memory use (bytes) = 125192960
LOG ([5.5.1027~1-59386]:Optimize():nnet-optimize.cc:629) After optimization, max memory use (bytes) = 5079040
VLOG[3] ([5.5.1027~1-59386]:PrintDiagnostics():online-ivector-feature.cc:359) Processed no data.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

Is there anything that I'm missing?

nshmyrev commented 1 year ago

It should be waiting for input; try running the client app.
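For reference, a rough sketch of the protocol that client speaks (a config message, raw PCM chunks, then an EOF marker). This is a hypothetical outline modeled on the test.py linked in this thread, not the file itself; it assumes the third-party `websockets` package and a 16-bit mono PCM WAV file, and the URI/chunk size are just the defaults used elsewhere in the thread:

```python
import asyncio
import json

def config_message(sample_rate=8000):
    # First message: tell the server the audio sample rate.
    return json.dumps({"config": {"sample_rate": sample_rate}})

async def run_test(uri, wav_path):
    import websockets  # third-party: pip install websockets
    async with websockets.connect(uri) as websocket:
        await websocket.send(config_message())
        with open(wav_path, "rb") as wf:
            wf.read(44)  # skip the RIFF/WAVE header
            while True:
                chunk = wf.read(8000)
                if not chunk:
                    break
                await websocket.send(chunk)
                print(await websocket.recv())  # partial/final results as JSON
        await websocket.send('{"eof" : 1}')  # tell the server we're done
        print(await websocket.recv())        # last final result

# asyncio.run(run_test("ws://localhost:2700", "test.wav"))
```
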

mehadi92 commented 1 year ago

Hi @nshmyrev, when I run this client app https://github.com/alphacep/vosk-server/blob/master/websocket-gpu-batch/test.py I get this error:

python test.py 
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    asyncio.run(run_test('ws://localhost:2700'))
  File "/opt/conda/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "test.py", line 9, in run_test
    async with websockets.connect(uri) as websocket:
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 642, in __aenter__
    return await self
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 659, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 663, in __await_impl__
    _transport, _protocol = await self._create_connection()
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 971, in create_connection
    ', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 2700, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 2700)

I also ran `docker ps -a`, and it gives the output below, which confirms the container is running and exposing the desired port:

CONTAINER ID   IMAGE          COMMAND                  CREATED              STATUS              PORTS      NAMES
16e2656686d9   58fbe8ccdeb5   "python3 ./asr_serve…"   About a minute ago   Up About a minute   2700/tcp   amazing_williams

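An aside on the error above (my reading, not stated in the thread): `Errno 111` (connection refused) means nothing was accepting connections on the host's port 2700, and the `PORTS` column reads `2700/tcp` with no host mapping such as `0.0.0.0:2700->2700/tcp`, i.e. the port is exposed inside Docker but not published to the host with `-p 2700:2700`. A quick, generic way to check reachability from the host is a plain TCP probe:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False

# With no listener on 2700, port_open("127.0.0.1", 2700) returns False,
# matching the connection-refused traceback above.
```
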
nshmyrev commented 1 year ago

In the other issue you reported non-deterministic output. Were you able to run the server with a different card, or what changed?

mehadi92 commented 1 year ago

@nshmyrev I'm able to run it using this script https://github.com/alphacep/vosk-api/blob/master/python/example/test_gpu_batch.py

mehadi92 commented 1 year ago

Hi, after using a Dockerfile like

FROM alphacep/kaldi-vosk-server-gpu:latest

ENV MODEL_VERSION 0.22

COPY <my GPU models> /app/model

COPY src ./app

EXPOSE 2700
WORKDIR app
CMD [ "python3", "./asr_server_gpu.py" ]

and a docker-compose file like

version: '2'

services:
    gpu-server:
        build: ./
        container_name: gpu-server
        image: gpu-server
        ports:
            - 2700:2700
        runtime: nvidia
        environment:
            - CUDA_VISIBLE_DEVICES=0
        shm_size: '7gb'
        ulimits:
            memlock: -1
            stack: 67108864
        network_mode: host

Run the compose using

docker compose up

my issue is solved.
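One observation on why this compose file works (my note, not stated in the thread): with `network_mode: host` the container shares the host's network stack, so the server binds directly to host port 2700 and the `ports:` mapping is ignored by Docker. Without host networking, publishing the port explicitly should achieve the same reachability; a minimal sketch:

```yaml
services:
    gpu-server:
        build: ./
        runtime: nvidia
        ports:
            - "2700:2700"   # publish so host clients can reach the server
```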