alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Stuck on "Computing derived variables for iVector extractor" | model loading stuck on GPU #208

Closed mehadi92 closed 1 year ago

mehadi92 commented 1 year ago

Hi, I'm trying to run this example script.

To do this I pulled this Docker image, but it gets stuck like this:

WARNING ([5.5.1027~1-59386]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla T4  free:14993M, used:116M, total:15109M, free/total:0.992323
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.992323
LOG ([5.5.1027~1-59386]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1027~1-59386]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.992323
LOG ([5.5.1027~1-59386]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla T4   free:14627M, used:482M, total:15109M, free/total:0.9681 version 7.5
LOG ([5.5.1027~1-59386]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1027~1-59386]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:52) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:56) Loading words from model/graph/words.txt
LOG ([5.5.1027~1-59386]:BatchModel():batch_model.cc:64) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:Optimize():nnet-optimize.cc:507) Before optimization, max memory use (bytes) = 125192960
LOG ([5.5.1027~1-59386]:Optimize():nnet-optimize.cc:629) After optimization, max memory use (bytes) = 5079040
VLOG[3] ([5.5.1027~1-59386]:PrintDiagnostics():online-ivector-feature.cc:359) Processed no data.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1027~1-59386]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

Is there anything that I'm missing?

nshmyrev commented 1 year ago

It should be waiting for input; try running the client app.
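For reference, a rough sketch of the protocol that client speaks (a config message, raw PCM chunks, then an EOF marker). This is a hypothetical outline modeled on the test.py linked in this thread, not the file itself; it assumes the third-party `websockets` package and a 16-bit mono PCM WAV file, and the URI/chunk size are just the defaults used elsewhere in the thread:

```python
import asyncio
import json

def config_message(sample_rate=8000):
    # First message: tell the server the audio sample rate.
    return json.dumps({"config": {"sample_rate": sample_rate}})

async def run_test(uri, wav_path):
    import websockets  # third-party: pip install websockets
    async with websockets.connect(uri) as websocket:
        await websocket.send(config_message())
        with open(wav_path, "rb") as wf:
            wf.read(44)  # skip the RIFF/WAVE header
            while True:
                chunk = wf.read(8000)
                if not chunk:
                    break
                await websocket.send(chunk)
                print(await websocket.recv())  # partial/final results as JSON
        await websocket.send('{"eof" : 1}')  # tell the server we're done
        print(await websocket.recv())        # last final result

# asyncio.run(run_test("ws://localhost:2700", "test.wav"))
```
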

mehadi92 commented 1 year ago

Hi @nshmyrev, when I run this client app https://github.com/alphacep/vosk-server/blob/master/websocket-gpu-batch/test.py I get this error:

python test.py 
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    asyncio.run(run_test('ws://localhost:2700'))
  File "/opt/conda/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "test.py", line 9, in run_test
    async with websockets.connect(uri) as websocket:
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 642, in __aenter__
    return await self
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 659, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/opt/conda/lib/python3.7/site-packages/websockets/legacy/client.py", line 663, in __await_impl__
    _transport, _protocol = await self._create_connection()
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 971, in create_connection
    ', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 2700, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 2700)

I also ran `docker ps -a`, and it gives the output below, which confirms the container is running and exposing the desired port:

CONTAINER ID   IMAGE          COMMAND                  CREATED              STATUS              PORTS      NAMES
16e2656686d9   58fbe8ccdeb5   "python3 ./asr_serve…"   About a minute ago   Up About a minute   2700/tcp   amazing_williams

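An aside on the error above (my reading, not stated in the thread): `Errno 111` (connection refused) means nothing was accepting connections on the host's port 2700, and the `PORTS` column reads `2700/tcp` with no host mapping such as `0.0.0.0:2700->2700/tcp`, i.e. the port is exposed inside Docker but not published to the host with `-p 2700:2700`. A quick, generic way to check reachability from the host is a plain TCP probe:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False

# With no listener on 2700, port_open("127.0.0.1", 2700) returns False,
# matching the connection-refused traceback above.
```
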
nshmyrev commented 1 year ago

In the other issue you reported non-deterministic output. Were you able to run the server with a different card, or what changed?

mehadi92 commented 1 year ago

@nshmyrev I'm able to run it using this script https://github.com/alphacep/vosk-api/blob/master/python/example/test_gpu_batch.py

mehadi92 commented 1 year ago

Hi, after using a Dockerfile like

FROM alphacep/kaldi-vosk-server-gpu:latest

ENV MODEL_VERSION 0.22

COPY <my GPU models> /app/model

COPY src ./app

EXPOSE 2700
WORKDIR app
CMD [ "python3", "./asr_server_gpu.py" ]

and a docker-compose file like

version: '2'

services:
    gpu-server:
        build: ./
        container_name: gpu-server
        image: gpu-server
        ports:
            - 2700:2700
        runtime: nvidia
        environment:
            - CUDA_VISIBLE_DEVICES=0
        shm_size: '7gb'
        ulimits:
            memlock: -1
            stack: 67108864
        network_mode: host

Run the compose using

docker compose up

my issue is solved.
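One observation on why this compose file works (my note, not stated in the thread): with `network_mode: host` the container shares the host's network stack, so the server binds directly to host port 2700 and the `ports:` mapping is ignored by Docker. Without host networking, publishing the port explicitly should achieve the same reachability; a minimal sketch:

```yaml
services:
    gpu-server:
        build: ./
        runtime: nvidia
        ports:
            - "2700:2700"   # publish so host clients can reach the server
```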