
MIT License

ALFFA_PUBLIC

(prepared by Laurent Besacier and Elodie Gauthier - Laurent.Besacier@imag.fr - Elodie.Gauthier@imag.fr)

This repository is a result of the ALFFA project http://alffa.imag.fr

We distribute ready-to-use (or ready-to-train) Kaldi ASR systems and, when possible, the associated corpora.

A summary of these resources and of the ASR performance, together with a description of the ALFFA project, has been published in the following paper:

Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof. Elodie Gauthier, Laurent Besacier, Sylvie Voisin, Michael Melese and Uriel Pascal Elingui. To appear at LREC 2016

So far, the ASR directory contains Kaldi recipes for four languages: Amharic, Swahili, Hausa and Wolof.

================AMHARIC=======================================

In ASR/AMHARIC/ you will find Kaldi recipes and resources. See the README file in that directory for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:

@article{tachbelie2014,
  Author = {Martha Tachbelie and Solomon Teferra Abate and Laurent Besacier},
  Date-Added = {2015-04-14 08:08:31 +0000},
  Date-Modified = {2015-04-14 10:56:28 +0000},
  Journal = {Speech Communication},
  Publisher = {Elsevier},
  Title = {Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic},
  Volume = {56},
  Year = {2014}}

===============HAUSA==================================================

In ASR/HAUSA/ you will find Kaldi recipes, but YOU NEED TO BUY THE RESOURCES FROM ELDA. See the README file in that directory for more details and for ASR performance results that you should be able to reproduce.

===============SWAHILI====================================================

(This Swahili ASR system is now available by default in the Kaldi trunk when you install Kaldi on your machine.)

In ASR/SWAHILI/ you will find Kaldi recipes and resources. See the README file in that directory for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:

@InProceedings{gelas:hal-00954048,
  author = {Gelas, Hadrien and Besacier, Laurent and Pellegrino, Francois},
  title = {{D}evelopments of {S}wahili resources for an automatic speech recognition system},
  booktitle = {{SLTU} - {W}orkshop on {S}poken {L}anguage {T}echnologies for {U}nder-{R}esourced {L}anguages},
  year = {2012},
  address = {Cape-Town, Afrique Du Sud},
  abstract = {no abstract},
  x-international-audience = {yes},
  url = {http://hal.inria.fr/hal-00954048},
}

===============WOLOF====================================================

In ASR/WOLOF/ you will find Kaldi recipes and resources. See the README file in that directory for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:

@article{gauthier2016collect,
  Author = {Gauthier, Elodie and Besacier, Laurent and Voisin, Sylvie and Melese, Michael and Elingui, Uriel Pascal},
  Journal = {LREC},
  Title = {Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof},
  Year = {2016}
}

============================================================================

In the CORPUS.old directory you can find the corpus collected for Swahili, but this directory is obsolete: everything it contains is now available in the corresponding ASR sub-directory.
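
The directory layout described above (one ASR/<LANGUAGE>/ directory per recipe, each with its own README) can be checked on a local copy with a small script. This is only a minimal sketch and not part of the distributed recipes: the function name `find_recipe_readmes` is hypothetical, the script assumes it is run from the root of a local checkout, and it assumes the README files have names starting with `README`.

```python
from pathlib import Path

# Directory names taken from this README; everything else is an assumption.
LANGUAGES = ("AMHARIC", "HAUSA", "SWAHILI", "WOLOF")


def find_recipe_readmes(repo_root):
    """Map each language to the README files found directly in ASR/<LANGUAGE>/.

    A language maps to None when its directory is missing from the checkout.
    """
    found = {}
    for lang in LANGUAGES:
        lang_dir = Path(repo_root) / "ASR" / lang
        if lang_dir.is_dir():
            # README file names may differ per recipe, so match any README*.
            found[lang] = sorted(
                p.name for p in lang_dir.glob("README*") if p.is_file()
            )
        else:
            found[lang] = None
    return found


if __name__ == "__main__":
    # "." assumes the current directory is the root of a local checkout.
    for lang, readmes in find_recipe_readmes(".").items():
        if readmes is None:
            print(f"{lang}: ASR/{lang}/ not found")
        else:
            print(f"{lang}: {', '.join(readmes) or 'no README found'}")
```

Reading the README reported for a language is then the first step before running its recipe, since that file gives the expected ASR results and, for Hausa, the note about obtaining the data from ELDA.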