Transliteration of the sentences from devanagari script to ILSL12 convention

KunalDhawan / ASR-System-for-Hindi-Language

This repository contains all the code for my project, an Automatic Speech Recognition System for the Hindi language (project description: https://kunal-dhawan.weebly.com/asr-system-for-hindi-language-from-scratch.html). It contains the code for the following systems: 1) a monophone-HMM system built using the HTK toolkit, 2) a monophone-HMM system built using the Kaldi toolkit, 3) a triphone-HMM system built using the Kaldi toolkit, and 4) a DNN-HMM system built using the Kaldi toolkit.


Transliteration of the sentences from devanagari script to ILSL12 convention #1

Open bharat-patidar opened 4 years ago

bharat-patidar commented 4 years ago

Hi Kunal, your work is wonderful. I just wanted to know how I can transliterate sentences from Devanagari script to the ILSL12 convention. It would be great if you could guide me.

Thanks, Bharat

kkokdari commented 4 years ago

@KunalDhawan Hi Kunal, I'm curious too! I found many standards for transliterating Devanagari script into Roman script, but I'm not sure which one you used.

bharat-patidar commented 4 years ago

Hi @kkokdari, you can use this parser: https://www.iitm.ac.in/donlab/tts/unified.php

kkokdari commented 4 years ago

@bharat-patidar Much appreciated, thanks for replying! Do you have the Devanagari script of the 150 sentences * 7 speakers mentioned in this repo? Or have you tested whether this unified parser produces output identical to the input used in this repo?

Thanks, Bharat Patidar!

bharat-patidar commented 4 years ago

Yes, I have used this parser and the results are the same. Kunal has also used the same parser.

kkokdari commented 4 years ago

@bharat-patidar I've read the paper describing this parser and have a few questions about it. (Sorry to bother you! I'm interested in Hindi speech synthesis and recognition, but after reading some papers I still cannot figure out the answers.)

1. If we are only building a Hindi speech recognition/synthesis system, why do we still need to transliterate the original text into Roman script?
2. The input text of this repo looks like 'aadivaasii'. Is it transliterated with the Common Phone Set? The letter 'v' doesn't exist in the Common Phone Set.
3. In lexicon.txt, for example 'aashiirvaada aa sh ii r w aa d', is the word 'aashiirvaada' transliterated with the Common Phone Set without rules, while the phones 'aa sh ii r w aa d' are transliterated with the Common Phone Set using the rules described in the paper "A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages"?

Thanks in advance for your reply!
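For readers following question 3, here is a minimal Python sketch of how one lexicon.txt entry splits into its two columns. It assumes only what the quoted example shows: a word followed by whitespace-separated phones. The function name `parse_lexicon_line` is hypothetical and not part of the repo. In the quoted entry the word column contains 'v' and a final 'a', while the phone column has 'w' and no final vowel, which suggests the two columns follow different conventions.

```python
from typing import List, Tuple


def parse_lexicon_line(line: str) -> Tuple[str, List[str]]:
    """Split one lexicon entry into (word, phones).

    Assumes the layout seen in the thread: the first field is the
    romanized word, every remaining field is one phone.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a word followed by phones, got: {line!r}")
    return fields[0], fields[1:]


# Entry quoted in the question above.
word, phones = parse_lexicon_line("aashiirvaada aa sh ii r w aa d")
print(word)    # the orthographic (word) column
print(phones)  # the pronunciation (phone) column
```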

kkokdari commented 4 years ago

> Yes, I have used this parser and the results are the same. Kunal has also used the same parser.

Sorry to bother you again! @bharat-patidar I built the unified parser, but it fails with a segmentation fault when I run it like this: ./unified-parser 'अंगार' 1 0 0 0

Do you know how to use it successfully?
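For anyone calling the parser from a script, a rough Python sketch of how such a crash can be detected is shown here. It assumes a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code. The command line is copied from this thread; what the four trailing flags mean is not explained here, and the helper names `describe_returncode` and `run_parser` are hypothetical.

```python
import signal
import subprocess
from typing import List


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_parser(argv: List[str]) -> str:
    """Run the given command and summarize how it ended."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # For example, the binary has not been built or is not executable.
        return f"could not start: {exc}"
    return describe_returncode(result.returncode)


if __name__ == "__main__":
    # Command as quoted in the thread; the flag semantics are not documented here.
    print(run_parser(["./unified-parser", "अंगार", "1", "0", "0", "0"]))
```

A "killed by SIGSEGV" status would confirm that the binary itself is crashing, rather than rejecting its arguments with an ordinary error exit.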

bharat-patidar commented 4 years ago

> > Yes, I have used this parser and the results are the same. Kunal has also used the same parser.
>
> Sorry to bother you again! @bharat-patidar I built the unified parser, but it fails with a segmentation fault when I run it like this: ./unified-parser 'अंगार' 1 0 0 0
>
> Do you know how to use it successfully?

Yes, I faced a similar error as well.

My issue was caused by the flex and bison dependencies. You have to install these libraries, as mentioned in one of the parser's files.

Also, I didn't get this segmentation fault when I ran it on an Amazon EC2 instance.

Hope this helps!
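As a quick pre-build check, the following Python sketch reports whether flex and bison are available on the PATH. It only tests for the two tools named above; it does not verify versions, and whether their absence is the actual cause of the segmentation fault is an assumption based on this comment. The function name `missing_tools` is hypothetical.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# The two build dependencies mentioned in the comment above.
missing = missing_tools(["flex", "bison"])
if missing:
    print("Install before building the parser:", ", ".join(missing))
else:
    print("flex and bison were found on PATH")
```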

bharat-patidar commented 4 years ago

> @bharat-patidar I've read the paper describing this parser and have a few questions about it. (Sorry to bother you! I'm interested in Hindi speech synthesis and recognition, but after reading some papers I still cannot figure out the answers.)
>
> 1. If we are only building a Hindi speech recognition/synthesis system, why do we still need to transliterate the original text into Roman script?
> 2. The input text of this repo looks like 'aadivaasii'. Is it transliterated with the Common Phone Set? The letter 'v' doesn't exist in the Common Phone Set.
> 3. In lexicon.txt, for example 'aashiirvaada aa sh ii r w aa d', is the word 'aashiirvaada' transliterated with the Common Phone Set without rules, while the phones 'aa sh ii r w aa d' are transliterated with the Common Phone Set using the rules described in the paper "A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages"?
>
> Thanks in advance for your reply!

I didn't understand your question. If you still need help with this, we can discuss it offline.

Thanks!

Chaitanya-Jadhav commented 4 years ago

> > Yes, I have used this parser and the results are the same. Kunal has also used the same parser.
>
> Sorry to bother you again! @bharat-patidar I built the unified parser, but it fails with a segmentation fault when I run it like this: ./unified-parser 'अंगार' 1 0 0 0
>
> Do you know how to use it successfully?

Hello, have you solved this issue? I am facing the same problem! I have also raised it with the IITM TTS group.