Open AhmedElsagher opened 4 years ago
The closest I've come to learning this so far is this paper
The transcripts are available in the original orthographic script, but were additionally mapped into a romanized form. For Arabic only the romanized version is available; Tamil is not processed yet. For romanization, several tools were developed which vary from simple context-free mapping tools to more elaborated algorithms, like for the segmentation and pinyinzation of Chinese Hanzi. The romanized version of all transcripts is coded in ASCII-7.
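For anyone unsure what a "simple context-free mapping tool" would look like in practice, here is a minimal sketch in Python. It assumes a one-to-one character table and uses a small subset of the Buckwalter transliteration purely as a stand-in: the actual GlobalPhone table is not published in this thread, so this is not the scheme the MFA model expects. The names `BUCKWALTER_SUBSET` and `romanize` are made up for illustration.

```python
# Partial Buckwalter table, for illustration only.
# ASSUMPTION: this is NOT the GlobalPhone romanization, which is not public here.
BUCKWALTER_SUBSET = {
    "\u0627": "A",  # alif
    "\u0628": "b",  # ba
    "\u062a": "t",  # ta
    "\u062f": "d",  # dal
    "\u0631": "r",  # ra
    "\u0633": "s",  # sin
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # mim
    "\u0646": "n",  # nun
}


def romanize(text, table=BUCKWALTER_SUBSET):
    """Map each character independently of its neighbours (context-free).

    Characters missing from the table are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


print(romanize("\u0643\u062a\u0628"))  # kaf, ta, ba -> prints "ktb"
```

If the real table ever turns up, swapping it in for the dictionary should be all that is needed for the context-free part; anything context-dependent (the paper mentions "more elaborated algorithms" for some languages) would need more than a lookup.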
Are there any updates regarding the romanization system used in MFA? Nothing is available online, so any advice would be very helpful.
Hi, I was interested in trying the pre-trained Arabic G2P model, but it only accepts romanized Arabic characters, and, as you mentioned in the documentation, the encoding is done by GlobalPhone. I searched for GlobalPhone's Arabic documentation for several hours, but found that I would need to buy the whole dataset. Is there a link to the encoding/romanization scheme, or something like that? I'm asking because there are several romanization standards for Arabic characters, so which one should I use? Can anyone help with that?