SEPIA Speech-To-Text (STT) Server is a WebSocket-based, full-duplex Python server for realtime automatic speech recognition (ASR) supporting multiple open-source ASR engines. It can receive a stream of audio chunks via a secure WebSocket connection and return transcribed text almost immediately as partial and final results.
One goal of this project is to offer a standardized, secure, realtime interface for all the great open-source ASR tools out there. The server works on all major platforms including single-board devices like Raspberry Pi (4).
NOTE: This is a complete rewrite (2021) of the original STT Server (2018). Code of the old version has been moved to the LEGACY SERVER folder.
If you are using custom models built for the 2018 version you can easily convert them to new models (please ask for details via the issues section).
If you want to see additional engines please create a new issue. Pull requests are welcome ;-)
The easiest way to get started is to use a Docker container for your platform. To install the server yourself please see the code section README or scripts section.
Simply pull the latest image (or choose an older one from the archive). The smallest English and German Vosk models and an English Coqui model (w/o scorer) are included:
docker pull sepia/stt-server:latest
Supported platforms:
After the download is finished you can start the container like this:
sudo docker run --rm --name=sepia-stt -p 20741:20741 -it sepia/stt-server:latest
To test the server visit http://localhost:20741 (if you are on the same machine) or http://[server-IP]:20741 (if you are in the same network). NOTE: custom recordings via microphone will only work using localhost or an HTTPS URL!
Alternatively you can use the python-client to test your server.
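To get a rough idea of the partial/final result flow the server produces, here is a hedged Python sketch. The exact message fields used below ('type', 'transcript', 'isFinal') are assumptions for illustration only; see the API docs file for the real format:

```python
import json

def collect_transcripts(messages):
    """Sort a stream of JSON result messages into partial and final transcripts.

    The message layout ('type', 'transcript', 'isFinal') is an assumption
    for illustration; check the API docs for the exact field names.
    """
    partials, finals = [], []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") != "result":
            continue
        if msg.get("isFinal"):
            finals.append(msg.get("transcript", ""))
        else:
            partials.append(msg.get("transcript", ""))
    return partials, finals

# Simulated stream: partial results refine until a final result arrives
stream = [
    '{"type": "result", "transcript": "turn", "isFinal": false}',
    '{"type": "result", "transcript": "turn on the", "isFinal": false}',
    '{"type": "result", "transcript": "turn on the light", "isFinal": true}',
]
partials, finals = collect_transcripts(stream)
print(finals)  # ['turn on the light']
```

In a real client these messages would arrive over the WebSocket connection while audio chunks are still being sent.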
Currently the server supports Vosk ASR models, Coqui-STT models and custom models (see "adapt" section below).
To add new ASR models create a shared volume for your container, place your model inside and update the server config file. The "adapt" section below has a more detailed example, but basically you can:

- Mount a shared volume for your models: -v [host-models-folder]:/home/admin/sepia-stt/models/my (Note: use an absolute path!)
- Register the model in your config file, e.g. path3=my/vosk-model-small-es, lang3=es-ES, engine3=vosk and optionally a "task" like task3=smart-home
- Point the server to your custom config file: --env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/models/my/server.conf
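As a sketch of what such a config entry looks like when parsed: the [app] section and the numbered path/lang/engine keys follow the pattern described in this README, but the file contents below are a hypothetical excerpt, not the full server.conf:

```python
import configparser

# Hypothetical excerpt of a custom server.conf; the [app] section and the
# numbered keys (path3, lang3, engine3, task3) follow the README's pattern.
CONF = """
[app]
path3 = my/vosk-model-small-es
lang3 = es-ES
engine3 = vosk
task3 = smart-home
"""

config = configparser.ConfigParser()
config.read_string(CONF)
print(config["app"]["path3"])  # my/vosk-model-small-es
print(config["app"]["lang3"])  # es-ES
```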
Included inside the Docker containers are the server itself, the supported ASR engines and (in newer images) the 'kaldi-adapt-lm' tool.
Most of the settings can be handled easily via the server.conf settings file. Please check out the file to see what's possible.

ENV variables:
- SEPIA_STT_SETTINGS: overwrites the default path to the settings file

Command line options:
- python -m launch -h : shows all command line options
- python -m launch -s [path-to-file] : uses a custom settings file

NOTE: Command line options always overrule the settings file, but in most scenarios it makes sense to simply create a new settings file and use the -s flag.
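The precedence rule (command line beats settings file) can be sketched like this; the function and the keys are hypothetical illustrations, not part of the actual server code:

```python
def effective_settings(file_settings, cli_args):
    """Merge settings: command-line options overrule the settings file.

    Hypothetical sketch -- the real server has its own merge logic.
    """
    merged = dict(file_settings)
    # Only CLI options that were actually given (not None) override the file
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

file_settings = {"port": "20741", "engine": "vosk"}
cli_args = {"port": "20742", "engine": None}  # only --port given on the CLI
result = effective_settings(file_settings, cli_args)
print(result)  # {'port': '20742', 'engine': 'vosk'}
```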
As soon as the server is running you can check the current setup via the HTTP REST interface (http://localhost:20741/settings) or the test page (see quick-start above).
Individual settings for the active engine can be changed on-the-fly during the WebSocket 'welcome' event. See the API docs file for more info or check out the 'Engine Settings' section of the test page.
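As a sketch, such an on-the-fly settings exchange could be a JSON payload like the one built below. The field names here ('type', 'data', 'samplerate', 'language', 'model') are illustrative assumptions; see the API docs file for the real schema:

```python
import json

# Hypothetical 'welcome' message adjusting engine settings on-the-fly;
# the exact field names are assumptions -- check the API docs for the schema.
welcome = {
    "type": "welcome",
    "data": {
        "samplerate": 16000,
        "language": "en-US",
        "model": "vosk-model-small-en-us",
    },
}
payload = json.dumps(welcome)   # this string would be sent over the WebSocket
decoded = json.loads(payload)
print(decoded["data"]["language"])  # en-US
```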
The SEPIA Client will support the new STT server out-of-the-box from version 0.24.0 on. Simply open the client's settings, look for 'ASR engine (STT)' and select SEPIA. The server address will be set automatically relative to your SEPIA Server host.

If your SEPIA server proxy has not been updated yet to forward requests to the SEPIA STT-Server, you can enter the direct URL via the STT settings page, e.g. http://localhost:20741 or http://localhost:20726/sepia/stt.
The settings will allow you to select a specific ASR model for each client language as well (if you don't want to use the language defaults set by your STT server config).
NOTE: Keep in mind that the client's microphone will only work in a secure environment (that is localhost or HTTPS) and thus the link to your server must be secure as well (e.g. use a real domain and SSL certificate, self-signed SSL or a proxy running on localhost).
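The 'secure environment' rule above can be sketched as a small check; this is a simplification of the browser's actual secure-context rules, and the function name is ours:

```python
from urllib.parse import urlparse

def is_secure_origin(url):
    """Rough sketch of the browser rule: microphone access requires
    localhost or HTTPS (simplified -- real secure-context rules differ)."""
    parts = urlparse(url)
    return parts.scheme == "https" or parts.hostname in ("localhost", "127.0.0.1")

print(is_secure_origin("http://localhost:20741"))         # True
print(is_secure_origin("https://my-domain.example/stt"))  # True
print(is_secure_origin("http://192.168.0.10:20741"))      # False
```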
See the separate API docs file or check out the JavaScript client class, the HTML test page or the python-client source code.
Demo clients: http://localhost:20741 (with microphone) or http://[server-IP]:20741 (no microphone due to "insecure" origin).

Open-source ASR has improved a lot in recent years, but sometimes it makes sense to adapt the models to your own, specific use-case/domain and vocabulary to improve accuracy. Language model adaptation via a web GUI is planned for the near future. Until then please check out the steps below:
Before you continue please read the basics about custom model creation on kaldi-adapt-lm if you haven't already. You should at least understand what the 'lm_corpus' folder does and have a 'sentences_[lang].txt' ([lang] e.g.: en, de) ready in your language ;-).
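To illustrate what a minimal sentence corpus might contain, here is a hypothetical sketch with a trivial sanity check; the sentence content and the checks are examples only, not requirements of kaldi-adapt-lm:

```python
# Hypothetical excerpt of a 'sentences_en.txt' LM corpus: one plain-text
# sentence per line (see the kaldi-adapt-lm docs for the exact rules).
corpus = """turn on the light
set a timer for five minutes
what is the weather like today
"""

# Trivial sanity check: drop empty lines, count sentences and unique words
sentences = [line.strip() for line in corpus.splitlines() if line.strip()]
vocab = sorted({word for s in sentences for word in s.split()})
print(len(sentences), len(vocab))  # 3 15
```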
If you use one of the newer Docker images (>=August 2021) 'kaldi-adapt-lm' is already integrated and ready for action. You just need to adjust your Docker start command a bit:
- -v [host-models-folder]:/home/admin/sepia-stt/models/my ([host-models-folder] e.g.: /home/pi/stt/models)
- -v [host-share-folder]:/home/admin/sepia-stt/share ([host-share-folder] e.g.: /home/pi/stt/share)
- --env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/share/my.conf
- /bin/bash at the end to enter the terminal and access 'kaldi-adapt-lm' instead of starting the STT server right away

The result should look like this:
sudo docker run --rm --name=sepia-stt -p 20741:20741 -it \
-v [host-models-folder]:/home/admin/sepia-stt/models/my \
-v [host-share-folder]:/home/admin/sepia-stt/share \
--env SEPIA_STT_SETTINGS=/home/admin/sepia-stt/share/my.conf \
sepia/stt-server:latest \
/bin/bash
Don't start the container yet! First copy your own LM corpus (e.g.: sentences_en.txt) and optionally LM dictionary (e.g.: my_dict_en.txt) to your shared folder on the host machine ([host-share-folder]).
When you are ready do the following:
- Copy your files into the tool's folders: cp /home/admin/sepia-stt/share/sentences_*.txt /home/admin/kaldi-adapt-lm/lm_corpus/ and cp /home/admin/sepia-stt/share/my_dict_*.txt /home/admin/kaldi-adapt-lm/lm_dictionary/
- cd /home/admin/kaldi-adapt-lm and run the adaptation process (more info: kaldi-adapt-lm):
- bash 2-download-model.sh [lang]
- bash 3-adapt.sh [lang] checkVocab optimizeLm. If there is missing vocabulary in your LM you will get a note right away, if not prepare to wait for a while ^^
- bash 4a-build-vosk-model.sh and finally bash 5-clean-up.sh
- You should now have an adapted_model.zip containing your custom model. Unpack it to your models folder: unzip -d /home/admin/sepia-stt/models/my/custom-v1-en/ adapted_model.zip
Finally we need to tell the server where to find the new model:
- cp /home/admin/sepia-stt/server/server.conf /home/admin/sepia-stt/share/my.conf
- nano /home/admin/sepia-stt/share/my.conf and add the fields path3=my/custom-v1-en and lang3=en-US in the [app] section (adjust path and language as required)
- cd /home/admin and bash on-docker.sh to start the server and test your new model
- exit to leave the container when you are done

To use the new model in "production" don't forget to start your Docker container with the -v and --env modifications from now on (drop the '/bin/bash' if you just run the server).