coqui-ai / STT-models

Open models for Coqui STT
https://coqui.ai

Add model card, license, and alphabet for xty #14

Closed JEMeyer closed 2 years ago

JEMeyer commented 2 years ago

Overview

Added new model card/alphabet file/license for the Yoloxóchitl Mixtec language (xty).

The dataset was derived from https://www.openslr.org/89/ by removing chunks of records where sox would crash when trying to determine the number of audio channels. The data was then processed with commonvoice-utils (https://github.com/ftyers/commonvoice-utils) to convert it to mono audio for training with Coqui.
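The filtering script itself is not part of this PR. As a rough sketch only, the "drop the records sox cannot read" step could be automated along the following lines, assuming the SoX `soxi` tool is on the PATH and that any failed probe counts as an unreadable file. The names `probe_channels` and `split_readable` are hypothetical and not taken from the PR or from commonvoice-utils.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple


def probe_channels(path: Path) -> Optional[int]:
    """Return the channel count printed by `soxi -c`, or None if the probe fails."""
    try:
        result = subprocess.run(
            ["soxi", "-c", str(path)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # soxi is not installed, cannot be executed, or hung on this file.
        return None
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        # soxi exited cleanly but did not print a number.
        return None


def split_readable(
    paths: Iterable[Path],
    probe: Callable[[Path], Optional[int]] = probe_channels,
) -> Tuple[List[Path], List[Path]]:
    """Split paths into (readable, unreadable) according to the probe."""
    good: List[Path] = []
    bad: List[Path] = []
    for path in paths:
        (good if probe(path) is not None else bad).append(path)
    return good, bad
```

Calling `split_readable(sorted(Path("clips").glob("*.wav")))` on a hypothetical `clips` directory would return the files that can be kept and the ones to drop before the mono conversion. The `probe` argument is injectable so the split can be exercised without SoX installed.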

The model is being added separately: STT-SLR89-XTY-0.1

serapio commented 2 years ago

Did you find a way to filter out those chunks, or was it a matter of letting sox fail and then manually removing them? Do you have a script for (most of) this process?