Bud500 is a diverse Vietnamese speech corpus designed to support the ASR research community. With approximately 500 hours of audio, it covers a broad range of topics such as podcasts, travel, books, and food, and spans accents from Vietnam's North, South, and Central regions. Derived from free public audio resources, this publicly accessible dataset is intended to support developers and researchers working on speech recognition.
Dataloader name: bud500/bud500.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?bud500