devidw / dswav

The Unlicense
15 stars 0 forks source link

dswav

Tool to build dataset for audio model training

Includes a series of helpers for dataset work, such as:

Mostly focused around tooling for StyleTTS2 datasets, but can also be used for other kinds of models / libraries such as coqui

Usage

docker run \
  -p 7860:7860 \
  -v ./projects:/app/projects \
  ghcr.io/devidw/dswav:main

TTS, LJSpeech

https://tts.readthedocs.io/en/latest/formatting_your_dataset.html

Supports output in LJSpeech dataset format (metadata.csv, wavs/) that can be used in the TTS py pkg to train models such as xtts2

StyleTTS2

https://github.com/yl4579/StyleTTS2

Also supports output format for StyleTTS2

Data sources

In order to import other data sources they must follow this structure:

{
    id: string // unique identifier for each sample, should match file name in `./wavs/[id].wav` folder
    content: string // the transcript
    speaker_id?: string // optional when building for multi-speaker, unique on a per voice speaker basis
}[]

Development

git clone https://github.com/devidw/dswav
cd dswav

poetry install

make dev

notes