
Kaldi gRPC Server

This is a modern alternative for deploying Speech Recognition models developed using Kaldi.

Features:

- Bidirectional streaming recognition over gRPC
- Dockerized server deployment
- Standalone binary recognizer built with Singularity containers
- Python client library and command-line transcription tool

Getting started

Kaldi model structure

We recommend the following structure for the deployed model:

model
├── conf
│   ├── ivector_extractor.conf
│   ├── mfcc.conf
│   ├── online_cmvn.conf
│   ├── online.conf
│   └── splice.conf
├── final.mdl
├── global_cmvn.stats
├── HCLG.fst
├── ivector_extractor
│   ├── final.dubm
│   ├── final.ie
│   ├── final.mat
│   ├── global_cmvn.stats
│   ├── online_cmvn.conf
│   ├── online_cmvn_iextractor
│   └── splice_opts
└── words.txt

The key files / directories are:

- final.mdl: the trained acoustic model
- HCLG.fst: the decoding graph
- words.txt: the word symbol table that maps output ids to words
- conf/: configuration files for online decoding (features, CMVN, i-vectors)
- ivector_extractor/: the online i-vector extractor used for speaker adaptation
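As a sanity check before building the server image, you can verify that a model directory follows this layout. The snippet below is a minimal sketch (the validate_model_dir helper is hypothetical, not part of this repo), using the example model path from the Docker instructions further down:

# Hypothetical helper: checks a model directory against the recommended layout.
from pathlib import Path

REQUIRED = [
    "final.mdl",         # acoustic model
    "HCLG.fst",          # decoding graph
    "words.txt",         # word symbol table
    "conf/online.conf",  # online decoding configuration
]

def validate_model_dir(model_dir):
    """Return the required files missing from model_dir."""
    root = Path(model_dir)
    return [f for f in REQUIRED if not (root / f).exists()]

missing = validate_model_dir("/models/kaldi/english_model")
if missing:
    print("Model directory is missing:", ", ".join(missing))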

Build binary ASR recognizer (Singularity)

We provide the option to build a (for all intents and purposes) binary file using the Kaldi bindings through Singularity containers. In short, a Singularity container packs a fakeroot filesystem into a single executable file. For more info, check the Singularity documentation.

Instructions:

Note: You can also use the command make build-flex-singularity, so that the Singularity container does not include or expect the model at build time. This produces a more flexible container that can run any local model. Then you can do something like:

./containers/asr.sif --model_dir=$MY_LOCAL_MODEL --wav=$MYTEST.wav
./containers/asr.sif --model_dir=$MY_OTHER_LOCAL_MODEL --wav=$MYTEST.wav

Dockerized server deployment

Once you have created this model structure, you can use the provided Dockerfile to build the server container. Run:

make build-server kaldi_model=$MY_MODEL_DIR image_tag=$CONTAINER_TAG
# example: make build-server kaldi_model=/models/kaldi/english_model image_tag=kaldigrpc:en-latest

Then you can run the container:

# Run your container for a maximum of 3 simultaneous clients on port 1234
make run-server image_tag=kaldigrpc:en-latest max_workers=3 server_port=1234
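To verify that the server came up and is accepting connections, you can run a quick check with the grpcio package (a sketch, assuming the port from the make invocation above):

# Block until the gRPC channel to the server is ready, or time out.
import grpc

channel = grpc.insecure_channel("localhost:1234")
try:
    grpc.channel_ready_future(channel).result(timeout=10)
    print("ASR server is up")
except grpc.FutureTimeoutError:
    print("Could not reach the ASR server on localhost:1234")
finally:
    channel.close()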

Client usage

Install the client library:

pip install kaldigrpc-client

Run the client from the command line:

kaldigrpc-transcribe --streaming --host localhost --port 50051 mytest.wav

For more information, refer to client/README.md.
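To give a feel for what the streaming client does under the hood, here is a sketch of the audio chunking step using only the Python standard library; the actual request and response types of kaldigrpc-client are documented in client/README.md:

# Split a WAV file into ~100 ms raw PCM chunks, as a streaming client would
# before wrapping each chunk in a streaming recognition request.
import wave

def audio_chunks(path, chunk_ms=100):
    with wave.open(path, "rb") as wav:
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data

for i, chunk in enumerate(audio_chunks("mytest.wav")):
    print(f"chunk {i}: {len(chunk)} bytes")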

RNNLM Rescoring

TODO: Write documentation

Roadmap