I built my own dataset (~9.5 hours of audio) for the Bahraini dialect of Arabic.
My validation loss is around 1.5.
I think this is partly due to how I defined the Arabic symbols.
Is my implementation correct?
Could someone please help?
_pad = ''
_punctuation = '.!,؟*: '
_special = '-'
# Phonemes
_vowels = 'واي'
_non_pulmonic_consonants = ''
_pulmonic_consonants = 'لإإلأابتثجحخدذرزسشصضطظعغفقكلمنهويءؤآ'
_suprasegmentals = 'ˈˌːˑ'
_other_symbols = ''
_diacrilics = 'ّ'
_extra_phons = []
_extra_phons = ['g', 'ɝ', '̃', '̍', '̥', '̩', '̯', '͡']  # some extra symbols that I found in Wiktionary IPA annotations
phonemes = list(
    _pad + _punctuation + _special + _vowels + _non_pulmonic_consonants
    + _pulmonic_consonants + _suprasegmentals + _other_symbols + _diacrilics) + _extra_phons
phonemes_set = set(phonemes)
silent_phonemes_indices = [i for i, p in enumerate(phonemes) if p in _pad + _punctuation]
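One sanity check that might help narrow this down is to look for characters that appear more than once across the symbol groups. A repeated character takes up more than one index in `phonemes`, so the list and `phonemes_set` end up with different sizes. The sketch below is only a rough check, not a diagnosis of the loss: `find_duplicates` is a hypothetical helper name, and only two of the groups above are included to keep it short.

```python
from collections import Counter


def find_duplicates(symbols):
    """Return the symbols that occur more than once, sorted by code point."""
    counts = Counter(symbols)
    return sorted(sym for sym, n in counts.items() if n > 1)


# Two of the groups from the post, copied as-is.
_vowels = 'واي'
_pulmonic_consonants = 'لإإلأابتثجحخدذرزسشصضطظعغفقكلمنهويءؤآ'

# Any character listed here occupies more than one slot in the symbol list.
duplicates = find_duplicates(list(_vowels + _pulmonic_consonants))
print(duplicates)
```

If the printed list is not empty, it may be worth deciding which group each repeated character belongs to before retraining.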