This repository contains models pretrained on the VoxLingua107 dataset for spoken (audio-based) language classification. The dataset, and therefore the models, can distinguish between 107 languages. Four pretrained models are provided (see below).
git clone https://github.com/RicherMans/SpokenLanguageClassifiers
cd SpokenLanguageClassifiers
pip install -r requirements.txt
python3 predict.py AUDIOFILE
The model used for prediction can also be changed. Four models have been pretrained, and each can be selected with the `--model MODELNAME` parameter.
By default, the top N results are printed (N=5; this can be changed with `--N NUMBER`).
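To classify several files in one go, the command line above can be wrapped in a small helper. This is only a sketch: it relies on the `--model` and `--N` flags described above, assumes it is run from the repository root, and assumes the flags may follow the audio file on the command line. The helper names (`build_command`, `predict_folder`), the `*.wav` pattern and the use of `CNN10` as a `--model` value are illustrative assumptions, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_file, model=None, top_n=None, script="predict.py"):
    """Assemble the argument list for a single predict.py call.

    Only the flags mentioned in this README (--model, --N) are used;
    anything left as None falls back to the script's own default.
    """
    cmd = [sys.executable, script, str(audio_file)]
    if model is not None:
        cmd += ["--model", model]
    if top_n is not None:
        cmd += ["--N", str(top_n)]
    return cmd


def predict_folder(folder, pattern="*.wav", **kwargs):
    """Run predict.py once per matching file and collect what it prints."""
    outputs = {}
    for audio_file in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(
            build_command(audio_file, **kwargs),
            capture_output=True,  # keep stdout instead of printing it
            text=True,            # decode bytes to str
            check=True,           # raise if predict.py exits with an error
        )
        outputs[audio_file.name] = result.stdout
    return outputs
```

For example, `predict_folder("recordings", model="CNN10", top_n=3)` would return a dictionary mapping each file name in a (hypothetical) `recordings` folder to the text printed by `predict.py`.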
Four models were pretrained and can be chosen as the back-end: CNN6, CNN10, MobileNetV2, and CNNVAD.
Since I don't have access to other datasets for cross-dataset evaluation, I provide the current performance on my held-out cross-validation dataset:
Model | Precision | Recall | Accuracy |
---|---|---|---|
CNN6 | 81.7 | 84.4 | 83.6 |
CNN10 | 89.9 | 90.9 | 90.8 |
MobileNetV2 | 80.0 | 80.1 | 79.3 |
CNNVAD | 81.0 | 82.4 | 82.9 |
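
For readers who want to reproduce such numbers on their own data, here is a minimal sketch of how the three metrics could be computed from true and predicted language labels. It assumes that precision and recall are macro-averaged over the classes and that all values are percentages; neither is stated above, so treat both as assumptions. The function name `macro_metrics` and the example labels are illustrative only.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (macro precision, macro recall, accuracy) in percent.

    Assumption: per-class precision/recall are averaged with equal weight
    per class (macro averaging). Classes with an empty denominator count as 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(num, den):
        return num / den if den else 0.0

    precision = sum(ratio(tp[l], tp[l] + fp[l]) for l in labels) / len(labels)
    recall = sum(ratio(tp[l], tp[l] + fn[l]) for l in labels) / len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return 100 * precision, 100 * recall, 100 * accuracy


# Toy example with made-up labels: one "en" clip is misclassified as "de".
p, r, a = macro_metrics(["en", "en", "de", "fr"], ["en", "de", "de", "fr"])
print(f"precision={p:.1f} recall={r:.1f} accuracy={a:.1f}")
```

With macro averaging, every language contributes equally regardless of how many clips it has, which is one possible reason why recall and accuracy can differ in a table like the one above.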