MycroftAI / mimic-recording-studio

Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic 2.
Apache License 2.0

Mimic Recording Studio


The Mycroft open source Mimic technologies are Text-to-Speech engines which take a piece of written text and convert it into spoken audio. The latest generation of this technology, Mimic 2, uses machine learning techniques to create a model which can speak a specific language, sounding like the voice on which it was trained.

The Mimic Recording Studio simplifies collecting training data from individual speakers. Each speaker's recordings can be used to produce a distinct voice for Mimic.

Software Quick Start

Windows self-hosted Quick Start

Linux/Mac self-hosted Quick Start

Install Dependencies

Why Docker? It makes the application easy to set up and run across platforms.

Build and Run

Note: The first run of docker-compose up will take a while, because this command also builds the Docker containers. Subsequent runs of docker-compose up should start faster.

Manual Install, Build and Start

Backend

Dependencies
Build & Run

Frontend

Dependencies
Build & Run

Coming soon!

An online version hosted at http://mimic.mycroft.ai, requiring zero setup.

Data

Audio Recordings

WAV files

Audio is saved as WAV files in the backend/audio_file/{uuid}/ directory. The backend automatically trims leading and trailing silence from every WAV file using ffmpeg.
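To illustrate what trimming means, here is a minimal standard-library sketch that swaps in a simpler technique than the backend's ffmpeg step: it drops leading and trailing frames whose peak amplitude is below a fixed threshold. It is not the backend's actual filter. The function name trim_silence and the threshold of 500 (out of 32767 for 16-bit audio) are assumptions, and the sketch only handles 16-bit PCM WAV files.

```python
import array
import sys
import wave


def trim_silence(in_path: str, out_path: str, threshold: int = 500) -> int:
    """Copy a 16-bit PCM WAV, dropping leading and trailing frames whose
    peak absolute sample value is below `threshold` (hypothetical default).
    Returns the number of frames kept."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = src.readframes(params.nframes)

    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    channels = params.nchannels
    nframes = len(samples) // channels

    def loud(frame: int) -> bool:
        # A frame counts as sound if any channel reaches the threshold.
        chunk = samples[frame * channels:(frame + 1) * channels]
        return max(abs(s) for s in chunk) >= threshold

    start = 0
    while start < nframes and not loud(start):
        start += 1
    end = nframes
    while end > start and not loud(end - 1):
        end -= 1

    kept = samples[start * channels:end * channels]
    if sys.byteorder == "big":
        kept.byteswap()  # back to little-endian for writing
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is patched on write
        dst.writeframes(kept.tobytes())
    return end - start
```

A fixed amplitude threshold is crude compared with ffmpeg's filters (it ignores background noise levels and short pauses), but it shows the idea: only the silent edges are removed, never silence inside the phrase.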

{uuid}-metadata.txt

Metadata is also saved to backend/audio_file/{uuid}/. This file maps each WAV file name to the phrase spoken. This file, together with the WAV files, is what you need to start training Mimic 2.
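Before training, you might load the metadata into a dictionary to check it. This sketch assumes each line has the form file_name|phrase, possibly followed by further |-separated fields that are ignored; the exact layout is not documented here, so check your own file first. read_metadata is a hypothetical helper.

```python
from pathlib import Path


def read_metadata(path: str) -> dict[str, str]:
    """Map WAV file name -> spoken phrase.

    ASSUMPTION: each non-empty line looks like 'file_name|phrase'; any extra
    '|'-separated fields are ignored.
    """
    mapping: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("|")
        if len(fields) < 2:
            raise ValueError(f"unexpected metadata line: {line!r}")
        mapping[fields[0]] = fields[1]
    return mapping
```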

Corpus

For now, an English corpus, english_corpus.csv, is available in backend/prompt/. To use your own corpus, follow these steps:

  1. Create a CSV file in the same format as english_corpus.csv, using tabs (\t) as the delimiter.
  2. Make sure there are no empty lines in the corpus.
  3. Add your corpus to the backend/prompt directory.
  4. Change the CORPUS environment variable in docker-compose.yml to the file name of your corpus.
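Steps 1 and 2 can be checked automatically before you point CORPUS at the new file. This minimal sketch relies only on what the steps state (tab-delimited rows, no empty lines); as an extra consistency check, which is an assumption and not a documented requirement, it also expects every row to have as many tab-separated fields as the first row. validate_corpus is a hypothetical helper.

```python
from pathlib import Path


def validate_corpus(path: str) -> int:
    """Check a tab-delimited corpus file and return its number of rows.

    Raises ValueError on an empty file, an empty line, or a row whose
    field count differs from the first row (an assumed consistency rule).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError("corpus is empty")
    expected = len(lines[0].split("\t"))
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            raise ValueError(f"line {number} is empty")
        fields = line.split("\t")
        if len(fields) != expected:
            raise ValueError(
                f"line {number} has {len(fields)} fields, expected {expected}"
            )
    return len(lines)
```

Splitting on "\t" directly, rather than using a CSV parser, keeps one physical line equal to one row, so reported line numbers match what you see in an editor.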

Corpora in other languages

Mimic Recording Studio can also be used to produce voice recordings for TTS voices in languages other than English. If you are building a corpus in a language other than English, we encourage you to choose phrases which:

IMPORTANT: For now, you must reset the SQLite database to use a new corpus. If you've recorded with another corpus and would like to keep that data, simply rename the SQLite database file found in backend/db/. The backend will detect that mimicstudio.db is missing and create a new one for you. You can then continue recording data for your new corpus.
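The rename can also be scripted. This sketch assumes it runs from the repository root, so the database is at backend/db/mimicstudio.db, and that the backend has been stopped first, which is probably the safest way to make sure nothing is writing to the file. The timestamped backup name and the helper archive_database are arbitrary, hypothetical choices.

```python
import time
from pathlib import Path


def archive_database(db_dir: str = "backend/db") -> Path:
    """Rename mimicstudio.db to a timestamped backup so the backend
    creates a fresh database on its next start. Returns the backup path."""
    db = Path(db_dir) / "mimicstudio.db"
    if not db.exists():
        raise FileNotFoundError(db)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = db.with_name(f"mimicstudio-{stamp}.db")
    if target.exists():
        raise FileExistsError(target)  # never overwrite an earlier backup
    db.rename(target)
    return target
```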

Technologies

Frontend

The web UI is built with JavaScript and React, using create-react-app as the scaffolding tool. Refer to CRA.md to find out more about using create-react-app.

Functions

Backend

The web service is built with Python, using Flask as the backend framework, gunicorn as the HTTP server, and SQLite as the database.

Functions

Docker

Docker is used to containerize both applications. By default, the frontend uses port 3000 and the backend uses port 5000. You can change these in the docker-compose.yml file.

NOTE: If you are running docker-registry, it also uses port 5000 by default, so you will need to change the port used by one of them.
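To spot such a conflict before running docker-compose up, you can test whether anything is already listening on a port. This is a minimal standard-library sketch; port_in_use is a hypothetical helper, and it only checks 127.0.0.1, so a service bound solely to another interface would not be detected.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)  # don't hang on filtered ports
        return sock.connect_ex((host, port)) == 0
```

For example, if port_in_use(5000) returns True before you start Mimic Recording Studio, another service such as docker-registry is probably holding that port.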

Recording Tips

Creating a voice requires a significant but achievable effort. An individual will need to record 15,000 to 20,000 phrases. To get the best possible Mimic voice, the recordings need to be clean and consistent. To that end, follow these recommendations:

Advanced

Query database structure

Mimic Recording Studio keeps track of all recordings in a SQLite database file located in backend/db/. This file can be opened with database tools such as DBeaver.

The database includes two tables.


Table "audiomodel"

All recordings are persisted in this table with

The database can be used to query your recordings.

Here are some example queries:

-- List all recordings
SELECT * FROM audiomodel;

-- List recordings from January 2020, ordered by phrase
SELECT * FROM audiomodel WHERE created_date >= '2020-01-01' AND created_date < '2020-02-01' ORDER BY prompt;

-- List the number of recordings per day
SELECT DATE(created_date), COUNT(*) AS RecordingsPerDay
FROM audiomodel
GROUP BY DATE(created_date)
ORDER BY DATE(created_date);

-- Show the average text length of recordings
SELECT AVG(LENGTH(prompt)) AS avgLength FROM audiomodel;

Querying the SQLite database can be useful in many ways. For example, listing the recordings from a specific time range can help you find and remove recordings made in a bad environment.
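When filtering by time from a script, a half-open range (start inclusive, end exclusive) is safer than BETWEEN with a bare date. This sketch assumes created_date is stored as text such as '2020-01-31 14:05:00', an assumption consistent with the DATE(created_date) queries above. Compared as text, that value sorts after '2020-01-31', so BETWEEN '2020-01-01' AND '2020-01-31' would silently drop it. recordings_between is a hypothetical helper.

```python
import sqlite3


def recordings_between(conn: sqlite3.Connection, start: str, end: str) -> list[tuple]:
    """Return (created_date, prompt) rows with start <= created_date < end.

    ASSUMPTION: created_date holds ISO-style text timestamps, so plain text
    comparison orders them chronologically.
    """
    return conn.execute(
        "SELECT created_date, prompt FROM audiomodel "
        "WHERE created_date >= ? AND created_date < ? "
        "ORDER BY created_date",
        (start, end),
    ).fetchall()
```

Usage, from the repository root: open the database with sqlite3.connect("backend/db/mimicstudio.db") and call recordings_between(conn, "2020-01-01", "2020-02-01") to get every recording made in January 2020.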

Table "usermodel"

Mimic Recording Studio can be used by more than one speaker sharing the same SQLite database file.

This table provides the following information per speaker:

These values are used to calculate metrics. For example, the speaking pace may show whether a phrase was recorded too fast or too slow compared with previous recordings.
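The per-speaker fields are not listed here, so the following is only an illustration of how a pace check could work, not the app's own formula. Pace is measured in characters per second, and a recording is flagged when it deviates from the speaker's average pace by more than a tolerance. Both helper names and the 25% tolerance are hypothetical.

```python
from typing import Optional


def speaking_pace(prompt: str, audio_seconds: float) -> float:
    """Characters spoken per second for one recording (hypothetical metric)."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return len(prompt) / audio_seconds


def pace_warning(pace: float, average_pace: float, tolerance: float = 0.25) -> Optional[str]:
    """Return 'too fast' or 'too slow' if `pace` differs from the speaker's
    average by more than `tolerance` (a fraction), otherwise None."""
    if pace > average_pace * (1 + tolerance):
        return "too fast"
    if pace < average_pace * (1 - tolerance):
        return "too slow"
    return None
```

For example, a 30-character phrase spoken in 2 seconds has a pace of 15 characters per second; against an average of 15, a later take at 20 characters per second would be flagged as too fast.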

Query the "usermodel" table to get a list of speakers, including their uuid and some recording statistics. For example, to list each speaker's name and uuid:

SELECT user_name AS [name], uuid FROM usermodel;
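The same query can be run from Python with the standard sqlite3 module. This sketch opens the database read-only through SQLite's mode=ro URI parameter, so it cannot accidentally modify data while the app is running. The default path assumes you run it from the repository root, and list_speakers is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def list_speakers(db_path: str = "backend/db/mimicstudio.db") -> list[tuple[str, str]]:
    """Return (user_name, uuid) pairs sorted by name, opening the DB read-only."""
    # as_uri() percent-encodes the path; mode=ro rejects any write attempt.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT user_name, uuid FROM usermodel ORDER BY user_name"
        ).fetchall()
    finally:
        conn.close()
```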


Modify recorder uuid

The browser you record in stores the user's uuid and name in its localStorage to keep the session in sync with the SQLite database and the file system.

If a problem occurs and your browser loses or changes the uuid mapping for Mimic Recording Studio, you may have difficulty continuing a previous recording session. In that case, update two attributes, name and uuid, in your browser's localStorage.

Open Mimic Recording Studio in your browser, open the web developer tools, navigate to localStorage, and set name and uuid to their original values.


After that you should be able to continue your previous recording session without further problems.
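If you no longer know the original values, the file system and database can help: recordings are stored in backend/audio_file/{uuid}/, and the usermodel table maps each uuid to a user_name. The sketch below pairs each uuid directory with its name and WAV file count, so the session holding your recordings is easy to spot. It assumes it runs from the repository root, and recover_identities is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def recover_identities(root: str = "backend") -> list[tuple[str, str, int]]:
    """List (uuid, user_name, wav_count) for each speaker directory under
    <root>/audio_file/, looking up names in <root>/db/mimicstudio.db."""
    db_uri = (Path(root) / "db" / "mimicstudio.db").resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(db_uri, uri=True)
    try:
        names = dict(conn.execute("SELECT uuid, user_name FROM usermodel"))
    finally:
        conn.close()
    result = []
    for folder in sorted((Path(root) / "audio_file").iterdir()):
        if folder.is_dir():
            wav_count = len(list(folder.glob("*.wav")))
            result.append((folder.name, names.get(folder.name, "<unknown>"), wav_count))
    return result
```

The uuid with the most WAV files under your name is usually the session to restore in localStorage.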

Providing your recording to Mycroft for training

We welcome your voice donations to Mycroft for use in Text-to-Speech applications. If you would like to provide your voice recordings, you must license them to us under the Creative Commons CC0 Public Domain license so that we can utilise them in TTS voices, which are derivative works. If you're ready to donate your voice recordings, email us at hello@mycroft.ai.

Contributions

PRs are gladly accepted!

Where to get support and assistance

You can get help and support with Mimic Recording Studio at: