This is a gRPC-based service which also supports streaming for real-time inferencing. The proto file for the gRPC service is available at proto/speech-recognition-open-api.proto. It has three endpoints:
Endpoint | Purpose |
---|---|
recognize_audio | Streaming endpoint for real-time recognition. |
punctuate | Punctuation endpoint for a given text. |
recognize | Inferencing from an audio URL or bytes. |
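For example, once the server is running (see the Docker instructions below) and the Python stubs have been generated (see the protoc command further down), the punctuate endpoint can be exercised with a minimal client like the sketch below. The generated module names and the request message/field names (PunctuateRequest, text, language) are assumptions inferred from the proto file name and the endpoint table; verify them against proto/speech-recognition-open-api.proto.

# Hypothetical sketch of a punctuate call; message and field names are assumptions.
import sys
sys.path.insert(0, "./stub")  # directory containing the generated stubs

import grpc
import speech_recognition_open_api_pb2 as pb2
import speech_recognition_open_api_pb2_grpc as pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = pb2_grpc.SpeechRecognizerStub(channel)  # service name as used in the grpcurl sample below

# Assumed request message and fields; adjust to the actual proto definitions.
request = pb2.PunctuateRequest(text="मैं भारत देश का निवासी हूँ", language="hi")
print(stub.punctuate(request))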
Download the ASR models and the punctuation models, then update the corresponding ASR model paths in model_dict.json. The Docker image expects all the models and config to be available under the directory /opt/speech_recognition_open_api/deployed_models/.
Sample deployed_models directory structure:
|-- gujarati
| |-- dict.ltr.txt
| `-- gujarati.pt
|-- hindi
| |-- dict.ltr.txt
| |-- hindi.pt
| |-- lexicon.lst
| `-- lm.binary
|-- hinglish
| |-- dict.ltr.txt
| |-- hinglish_CLSRIL.pt
| |-- lexicon.lst
| `-- lm.binary
|-- model_dict.json
|-- model_data
| |-- albert_metadata
| | |-- config.json
| | |-- pytorch_model.bin
| | |-- spiece.model
| | `-- spiece.vocab
| |-- as.json
| |-- as.pt
| |-- as_dict.json
| |-- bn.json
| |-- bn.pt
| |-- bn_dict.json
| |-- denoiser
| | `-- denoiser_dns48.pth  # Denoiser checkpoint; details at https://github.com/Open-Speech-EkStep/denoiser
ASR Models
All the open-source language models are available in the bucket https://console.cloud.google.com/storage/browser/vakyansh-open-models/models/. You can find more details at https://github.com/Open-Speech-EkStep/vakyansh-models#language-models-works-with-finetuned-asr-models.
Punctuation models:
Punctuation models should be placed inside the model_data directory of the deployed_models directory; download all the punctuation models there. Details of the open-sourced punctuation models are available at https://github.com/Open-Speech-EkStep/vakyansh-models#punctuation-models.
gu/:
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/gu/gu.json
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/gu/gu.pt
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/gu/gu_dict.json
hi/:
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/hi/hi.json
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/hi/hi.pt
wget https://storage.googleapis.com/vakyansh-open-models/punctuation_models/hi/hi_dict.json
Once all the required models are in place, the model_dict.json file should be updated with the relative paths to the ASR model artifacts.
For example, if the ASR models are placed in the directory /asr-models/, then model_dict.json would look like:
{
"en": {
"path": "/asr-models/indian_english/final_model.pt",
"enablePunctuation": true,
"enableITN": true
},
"hi": {
"path": "/asr-models/hindi/hindi.pt",
"enablePunctuation": true,
"enableITN": true
}
}
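As a quick sanity check before starting the container, a short script like the hypothetical one below can confirm that every path referenced in model_dict.json resolves to a real file, either as written or relative to your local deployed_models directory (the directory argument and the helper itself are illustrative, not part of the project):

# Hypothetical sanity check for model_dict.json paths.
import json
import os
import sys

deployed_models = sys.argv[1] if len(sys.argv) > 1 else "deployed_models"

with open(os.path.join(deployed_models, "model_dict.json")) as f:
    model_dict = json.load(f)

for language, entry in model_dict.items():
    path = entry["path"]
    # Accept the path as written, or resolved against the deployed_models directory.
    candidates = [path, os.path.join(deployed_models, path.lstrip("/"))]
    status = "OK" if any(os.path.isfile(p) for p in candidates) else "MISSING"
    print(f"[{language}] {status}: {path}")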
Using pre-built docker image:
We have pre-built images hosted at gcr.io/ekstepspeechrecognition/speech_recognition_model_api. You can use these images directly to run on Docker.
Note: we do not publish a latest tag, so you have to specify the exact version.
docker run -itd -p 50051:50051 --env gpu=True --env languages=['en','hi'] --gpus all -v <Location for deployed_models directory>:/opt/speech_recognition_open_api/deployed_models/ gcr.io/ekstepspeechrecognition/speech_recognition_model_api:3.2.33
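Once the container is up, a few lines of Python can confirm that the published gRPC port is reachable (this is just a connectivity check, not part of the official tooling):

# Quick connectivity check against the published gRPC port.
import grpc

channel = grpc.insecure_channel("localhost:50051")
try:
    grpc.channel_ready_future(channel).result(timeout=10)
    print("gRPC server is reachable on localhost:50051")
except grpc.FutureTimeoutError:
    print("Could not reach the gRPC server on localhost:50051")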
We have Python and Java client examples available in this repo which can be used for testing.
The Python secure example can be run from examples/python/speech-recognition:
python main.py
For the insecure example:
python main-insecure.py
This is a gRPC service, so you can call it using any gRPC-supported client. Complete details of the request/response schema can be found in the ULCA Schema.
The proto file for the gRPC service is available at [proto/speech-recognition-open-api.proto](proto/speech-recognition-open-api.proto)
Popular GRPC Clients
The API supports two transcription formats, selected via the transcriptionFormat field in the request config:
Sample request for ASR with audio URL
{
"config": {
"language": {
"sourceLanguage": "hi"
},
"transcriptionFormat": {
"value": "transcript"
},
"audioFormat": "wav",
"punctuation": true,
"enableInverseTextNormalization": true
},
"audio": [
{
"audioUri": "https://storage.googleapis.com/test_public_bucket/srt_test.wav"
}
]
}
Sample request for ASR with Audio bytes
{
"config": {
"language": {
"sourceLanguage": "hi"
},
"transcriptionFormat": {
"value": "transcript"
},
"audioFormat": "wav",
"punctuation": true,
"enableInverseTextNormalization": true
},
"audio": [
{
"audioContent": "<Audio Bytes>"
}
]
}
Sample ASR Response
{
"status": "SUCCESS",
"output": [
{
"source": "मैं भारत देश का निवासी हूँ"
}
]
}
Sample request in grpcurl
grpcurl -import-path <directory to proto file> -proto speech-recognition-open-api.proto -plaintext -d @ localhost:50051 ekstep.speech_recognition.SpeechRecognizer.recognize <<EOM
{
"config": {
"language": {
"sourceLanguage": "hi"
},
"transcriptionFormat": {
"value": "transcript"
},
"audioFormat": "wav",
"punctuation": true,
"enableInverseTextNormalization": true
},
"audio": [
{
"audioUri": "https://storage.googleapis.com/test_public_bucket/srt_test.wav"
}
]
}
EOM
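The same request can be issued from Python via the generated stubs and google.protobuf.json_format, as in the sketch below. The service and method names come from the grpcurl call above; the generated module names and the top-level request message name (SpeechRecognitionRequest) are assumptions derived from the proto file name and should be checked against proto/speech-recognition-open-api.proto.

# Sketch of the grpcurl call above, issued from Python via the generated stubs.
import sys
sys.path.insert(0, "./stub")  # directory produced by the protoc command below

import grpc
from google.protobuf import json_format

import speech_recognition_open_api_pb2 as pb2
import speech_recognition_open_api_pb2_grpc as pb2_grpc

request_dict = {
    "config": {
        "language": {"sourceLanguage": "hi"},
        "transcriptionFormat": {"value": "transcript"},
        "audioFormat": "wav",
        "punctuation": True,
        "enableInverseTextNormalization": True,
    },
    "audio": [
        {"audioUri": "https://storage.googleapis.com/test_public_bucket/srt_test.wav"}
    ],
}

channel = grpc.insecure_channel("localhost:50051")
stub = pb2_grpc.SpeechRecognizerStub(channel)

# "SpeechRecognitionRequest" is an assumed message name; check the proto file.
request = json_format.ParseDict(request_dict, pb2.SpeechRecognitionRequest())
response = stub.recognize(request)
print(json_format.MessageToJson(response))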
Realtime streaming is supported directly over gRPC. If you need something that works in a browser, we have a socket.io-based implementation; refer to the documentation.
As this is a gRPC service, all the endpoints are defined in the proto file. Once you make changes to the proto file, you need to regenerate the stubs.
To generate stub files from the .proto file, use the following command:
python3 -m grpc_tools.protoc \
--include_imports \
--include_source_info \
--proto_path=./proto \
./proto/google/api/http.proto \
./proto/google/api/annotations.proto \
./proto/google/protobuf/descriptor.proto \
-I ./proto \
--descriptor_set_out=./proto/api_descriptor.pb \
--python_out=./stub \
--grpc_python_out=./stub \
./proto/speech-recognition-open-api.proto
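After this command runs, the descriptor set is written to ./proto/api_descriptor.pb and the generated Python modules land in ./stub; with protoc's naming convention the hyphens in the proto file name become underscores, so expect files named speech_recognition_open_api_pb2.py and speech_recognition_open_api_pb2_grpc.py.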
To run the tests:
py.test --grpc-fake-server --ignore=wav2letter --ignore=wav2vec-infer --ignore=kenlm
To run the tests with code coverage:
pip3 install pytest-cov
pytest --cov=src tests/ --grpc-fake-server
We build this app in two steps to speed up iteration on the main source. First we build a dependency image; its Dockerfile is available at dependencies/Dockerfile. Using the dependency image, we then build the main images that are published. The Dockerfile for this step is available here.
You can use these steps to recreate the bundle. We recommend using an environment manager like conda. Follow the steps from the dependency Docker image (dependencies/Dockerfile) and the main Dockerfile. After setting up the directory and installing all the prerequisites, you can run the server.py file to start the gRPC server. The default port of the gRPC server is 50051, which can be changed in server.py.
python server.py
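The following environment variables can be used to configure the service: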
Variable Name | Default Value | Description |
---|---|---|
log_level | DEBUG | Log level for application logs |
gpu | True | True: Load the models on GPU, False: Use CPU |
model_logs_base_path | /opt/speech_recognition_open_api/deployed_models/logs/ | Location for language model folders. |
TRANSFORMERS_CACHE | /opt/speech_recognition_open_api/deployed_models/model_data/transformers_cache/ | Transformers cache location for punctuation |
DENOISER_MODEL_PATH | /opt/speech_recognition_open_api/deployed_models/model_data/denoiser/denoiser_dns48.pth | Denoiser checkpoint. Refer to https://github.com/Open-Speech-EkStep/denoiser |