Which Vietnamese Accents are included in the dataset?

VinAIResearch / PhoWhisper

PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)

BSD 3-Clause "New" or "Revised" License


Which Vietnamese Accents are included in the dataset? #6

chungvle closed this issue 7 months ago

chungvle commented 8 months ago

Thank you for releasing PhoWhisper for Vietnamese! Your paper describes a large-scale ASR dataset that includes a diverse array of Vietnamese accents from different regions in Vietnam. Could you kindly elaborate on that, e.g.:

Which regions are included, e.g., Hue, Saigon, etc.?
Is the dataset open source, or is it proprietary?
Is it possible to collaborate, for example by adding a missing dialect or enhancing the dataset?

Thank you, kindly.

datquocnguyen commented 7 months ago

We do not plan to open-source the dataset at the moment. Also, due to our GenAI policy, we cannot share much information about this dataset.