TTS0010: piper tts training

gangagyatso4364 commented 2 weeks ago

Dataset Format

The pre-processing script expects data to be a directory with:

metadata.csv - CSV file with text, audio filenames, and speaker names
wav/ - directory with audio files

The metadata.csv file uses | as a delimiter and has 2 or 3 columns, depending on whether the dataset has a single speaker or multiple speakers. There is no header row.

For single speaker datasets:

id|text

where id is the name of the WAV file in the wav directory. For example, an id of 1234 means that wav/1234.wav should exist.

For multi-speaker datasets:

id|speaker|text

where speaker is the name of the utterance's speaker. Speaker ids will automatically be assigned based on the number of utterances per speaker (speaker id 0 has the most utterances).
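The format above can be checked before running the pre-processing script. The following is a minimal sketch, not Piper's own code: the function names `read_metadata`, `missing_wavs`, and `assign_speaker_ids` are hypothetical, and the tie-breaking order between speakers with equal utterance counts is an assumption (first seen wins here).

```python
import csv
from collections import Counter
from pathlib import Path


def read_metadata(lines, multi_speaker=False):
    """Parse pipe-delimited rows: id|text, or id|speaker|text if multi_speaker."""
    expected = 3 if multi_speaker else 2
    rows = []
    # QUOTE_NONE keeps quotation marks in the transcript text untouched.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} columns, got {len(row)}"
            )
        rows.append(tuple(field.strip() for field in row))
    return rows


def missing_wavs(rows, dataset_dir):
    """Return the ids whose wav/<id>.wav file does not exist."""
    wav_dir = Path(dataset_dir) / "wav"
    return [row[0] for row in rows if not (wav_dir / f"{row[0]}.wav").is_file()]


def assign_speaker_ids(rows):
    """Map speaker name to id; the speaker with the most utterances gets 0."""
    counts = Counter(speaker for _, speaker, _ in rows)
    # Assumption: ties keep first-seen order (Counter.most_common is stable).
    return {name: idx for idx, (name, _) in enumerate(counts.most_common())}
```

For a real dataset, `read_metadata` can be given an open file handle, for example `open("metadata.csv", encoding="utf-8", newline="")`, and `missing_wavs` the dataset directory.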

Link to the list of supported languages for espeak-ng, which is used by default - https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md

Sub Tasks:

[x] Install Piper TTS and its dependencies
[x] Data processing - conversion of data to the required format
[x] Use a preprocessing script provided by Piper or create one that reads the CSV and prepares the dataset for training
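The conversion sub-task could look roughly like the sketch below for a single-speaker dataset. It assumes each source record is a dict with `id` and `text` keys; those field names are hypothetical and are not taken from the actual dataset schema. The helper names `format_rows` and `write_metadata` are also illustrative.

```python
from pathlib import Path


def format_rows(records):
    """Turn records into id|text lines for a single-speaker metadata.csv."""
    lines = []
    for rec in records:
        # Collapse newlines and repeated whitespace so each row stays on one line.
        text = " ".join(str(rec["text"]).split())
        utt_id = str(rec["id"]).strip()
        if "|" in text or "|" in utt_id:
            raise ValueError(f"record {utt_id!r} contains the | delimiter")
        lines.append(f"{utt_id}|{text}")
    return lines


def write_metadata(records, dataset_dir):
    """Write metadata.csv (no header row) and make sure wav/ exists."""
    dataset_dir = Path(dataset_dir)
    (dataset_dir / "wav").mkdir(parents=True, exist_ok=True)
    out_path = dataset_dir / "metadata.csv"
    out_path.write_text("\n".join(format_rows(records)) + "\n", encoding="utf-8")
    return out_path
```

The audio for each record still has to be saved separately as `wav/<id>.wav`; this sketch only writes the text side of the dataset.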

tenzinchoedon commented 2 weeks ago

Installation fails on macOS: piper-phonemize==1.1.0 is not available for macOS, so the install stops with an error stating that no matching distribution was found.

Current Situation

Dependency issue: The piper-tts library requires piper-phonemize==1.1.0, but this version does not have a compatible wheel for macOS.
Version availability: 1.1.0 is the latest version of piper-phonemize available on PyPI, but it lacks macOS-compatible distributions, which causes the installation failure.

Solution

Go ahead with piper-phonemize-cross instead, which has a wheel for macOS.

ta4tsering commented 2 weeks ago

Talk with @10zinten about the data distribution he used when he trained the TTS model.

tenzinchoedon commented 2 weeks ago

Link to the dataset - https://huggingface.co/datasets/openpecha/tts-sherab
