NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

symbols.py for Arabic letters #603

Open MohamedAlmahmood opened 1 year ago

MohamedAlmahmood commented 1 year ago

I built my own dataset (~9.5 hours of audio) for the Bahraini dialect of Arabic. My validation loss is around 1.5. I think this is partly due to how I defined the Arabic symbols. Is my implementation correct? Could someone please help?

_pad = ''
_punctuation = '.!,؟*: '
_special = '-'

# Phonemes

_vowels = 'واي'
_non_pulmonic_consonants = ''
_pulmonic_consonants = 'لإإلأابتثجحخدذرزسشصضطظعغفقكلمنهويءؤآ'
_suprasegmentals = 'ˈˌːˑ'
_other_symbols = ''
_diacrilics = 'ّ'
_extra_phons = []  # some extra symbols that I found in Wiktionary IPA annotations

_extra_phons = ['g', 'ɝ', '̃', '̍', '̥', '̩', '̯', '͡']  # some extra symbols that I found in Wiktionary IPA annotations

phonemes = list( _pad + _punctuation + _special + _vowels + _non_pulmonic_consonants

phonemes_set = set(phonemes)
silent_phonemes_indices = [i for i, p in enumerate(phonemes) if p in _pad + _punctuation]
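One thing that can be checked independently of training is whether any symbol is listed more than once across the groups. If the symbol-to-ID table is built as a dict over the list, a repeated character maps to only one of its indices and the others go unused. The following is a minimal sketch of such a check, not the repository's code. It assumes the truncated `phonemes = list(...)` expression simply concatenates the groups defined above. The helper names `find_duplicates` and `build_symbols` are hypothetical, and `_diacrilics` and `_extra_phons` are left out here only to keep combining characters out of the example.

```python
from collections import Counter

# Values copied from the post above; `_pad` appears there as an empty string.
_pad = ''
_punctuation = '.!,؟*: '
_special = '-'
_vowels = 'واي'
_non_pulmonic_consonants = ''
_pulmonic_consonants = 'لإإلأابتثجحخدذرزسشصضطظعغفقكلمنهويءؤآ'
_suprasegmentals = 'ˈˌːˑ'
_other_symbols = ''


def find_duplicates(*groups):
    """Return the symbols that occur more than once across all groups (hypothetical helper)."""
    counts = Counter(ch for group in groups for ch in group)
    return sorted(ch for ch, n in counts.items() if n > 1)


def build_symbols(*groups):
    """Concatenate the groups, keeping only the first occurrence of each symbol (hypothetical helper)."""
    return list(dict.fromkeys(ch for group in groups for ch in group))


groups = (
    _pad, _punctuation, _special, _vowels, _non_pulmonic_consonants,
    _pulmonic_consonants, _suprasegmentals, _other_symbols,
)

duplicates = find_duplicates(*groups)
symbols = build_symbols(*groups)

# One ID per distinct symbol, in first-seen order.
symbol_to_id = {s: i for i, s in enumerate(symbols)}

print('Symbols listed more than once:', duplicates)
print('Distinct symbols:', len(symbols))
```

If the first line of output is not empty, the listed characters appear in more than one group (or twice within a group) and are worth reviewing before retraining. Note that changing the symbol list changes the IDs, so a checkpoint trained with the old list would not match the new one.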