dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License

attempt to transliterate english not working, despite flite being installed properly #35

Closed brisvag closed 5 years ago

brisvag commented 5 years ago

I've been trying to use the english transliteration, without success.

I followed the installation instructions for flite (and also copied the relevant binaries into /usr/local/bin), and the process seems to have worked, since I no longer get the "lex_lookup not installed" kind of error.

However, I'm still stuck on a rather cryptic KeyError. When I run the following (in Python 3):

import epitran
epi = epitran.Epitran('eng-Latn')
epi.transliterate('Berkeley')

this is what I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-e30894fd177f> in <module>
----> 1 epi.transliterate('Berkeley')

~/.local/lib/python3.7/site-packages/epitran/_epitran.py in transliterate(self, word, normpunc, ligatures)
     60             unicode: IPA string
     61         """
---> 62         return self.epi.transliterate(word, normpunc, ligatures)
     63 
     64     def reverse_transliterate(self, ipa):

~/.local/lib/python3.7/site-packages/epitran/flite.py in transliterate(self, text, normpunc, ligatures)
     89         for chunk in self.chunk_re.findall(text):
     90             if self.letter_re.match(chunk):
---> 91                 acc.append(self.english_g2p(chunk))
     92             else:
     93                 acc.append(chunk)

~/.local/lib/python3.7/site-packages/epitran/flite.py in english_g2p(self, text)
    205             logging.warning('Non-zero exit status from lex_lookup.')
    206             arpa_text = ''
--> 207         return self.arpa_to_ipa(arpa_text)

~/.local/lib/python3.7/site-packages/epitran/flite.py in arpa_to_ipa(self, arpa_text, ligatures)
     73         arpa_list = map(lambda d: re.sub('\d', '', d), arpa_list)
     74         ipa_list = map(lambda d: self.arpa_map[d], arpa_list)
---> 75         text = ''.join(ipa_list)
     76         return text
     77 

~/.local/lib/python3.7/site-packages/epitran/flite.py in <lambda>(d)
     72         arpa_list = self.arpa_text_to_list(arpa_text)
     73         arpa_list = map(lambda d: re.sub('\d', '', d), arpa_list)
---> 74         ipa_list = map(lambda d: self.arpa_map[d], arpa_list)
     75         text = ''.join(ipa_list)
     76         return text

KeyError: 'iy)\n(b'

No matter my query, it seems self.arpa_map does not have it. What am I doing wrong?

dmort27 commented 5 years ago

This is perplexing. For some reason, lex_lookup is producing output with newlines. Can you try entering the following at the command line and reporting the output?

lex_lookup 'berkeley'

I want to rule out the possibility that this has something to do with the lex_lookup binary on your system. It should output (b er1 k l iy0)

brisvag commented 5 years ago

Interesting. I get a repetition:

$ lex_lookup 'berkeley'
(b er1 k l iy0)
(b er1 k l iy0)

I suppose this means somehow my lex_lookup binary is misbehaving.
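
The KeyError above is consistent with this doubled output. Here is a minimal sketch of why, assuming the parsing step strips the outer parentheses, splits on spaces, and then removes stress digits. That behavior is inferred from the traceback rather than copied from Epitran's source, and `naive_arpa_tokens` is a hypothetical helper, not an Epitran function:

```python
import re

def naive_arpa_tokens(arpa_text):
    """Hypothetical stand-in for the parsing step: drop the outer
    parentheses, split on spaces, then strip stress digits."""
    tokens = arpa_text.strip()[1:-1].split(' ')
    return [re.sub(r'\d', '', t) for t in tokens]

# Output as expected from lex_lookup: one transcription.
single = "(b er1 k l iy0)\n"
# Output as reported above: the same transcription printed twice.
doubled = "(b er1 k l iy0)\n(b er1 k l iy0)\n"

print(naive_arpa_tokens(single))
print(naive_arpa_tokens(doubled))
```

With the doubled input, the end of the first line and the start of the second fuse into a single token, `'iy)\n(b'`, which is exactly the key reported as missing from `self.arpa_map`.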

dmort27 commented 5 years ago

Indeed, your lex_lookup is not behaving as expected. What platform and what OS version are you on?

There is an easy, if kludgy, fix for this (within Epitran): split the output of lex_lookup on '\n' and take the first element in the resulting list. I'll implement this and make a new release today.
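
The workaround described here can be sketched as follows. This is an illustration of the idea only, not the code that shipped in the release, and `first_transcription` is a hypothetical name:

```python
def first_transcription(lex_lookup_output):
    """Keep only the first line of lex_lookup output.

    Returns '' when the output is empty, because ''.split('\n')
    yields [''].
    """
    lines = lex_lookup_output.strip().split('\n')
    return lines[0]

# Doubled output, as reported earlier in this thread.
raw = "(b er1 k l iy0)\n(b er1 k l iy0)\n"
print(first_transcription(raw))
```

Taking the first element leaves well-behaved single-line output unchanged, so the same code path works for both versions of the binary.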

dmort27 commented 5 years ago

Try the new release (0.60) and see if it fixes the problem.

brisvag commented 5 years ago

> Indeed, your lex_lookup is not behaving as expected. What platform and what OS version are you on?

I'm on Arch Linux, and I built flite from the source on GitHub. This might be some problem/change with the latest upstream... I really doubt I did anything strange, since the installation process is quite easy and streamlined.

Anyways, your quick-and-dirty fix works just fine, thank you!

dmort27 commented 5 years ago

You built flite from the GitHub source, then? This probably explains the problem. I think Alan Black, author of flite, may have changed the behavior of lex_lookup since the non-release release that is recommended in the README for Epitran. I'll talk to @awbcmu and @saikrishnarallabandi to see if there are any other changes to the lex_lookup interface that I should accommodate. Then perhaps I can start directing people to the GitHub source rather than the package linked from the README.

dmort27 commented 5 years ago

I just talked to @awbcmu and apparently there was a change in behavior in lex_lookup. He said epitran should take the first transcription, so the current behavior is right.

brisvag commented 5 years ago

Good, glad this helped to solve something! I'd say this can be closed for good now.

WhiteFu commented 5 years ago

@brisvag Have you solved this problem? Is there a viable solution?

brisvag commented 5 years ago

Yes. As I said in my last message, dmort's upstream fix was enough.

WhiteFu commented 5 years ago

Thank you very much:)

Subbu-ssr commented 3 months ago

from langdetect import detect
import epitran

# Function to detect language and generate phonetics
def get_phonetics(text):
    try:
        # Detect the language
        language = detect(text)

        # Language code mapping for epitran
        language_map = {
            'en': 'eng-Latn',    # English
            'hi': 'hin-Deva',    # Hindi
            # Add more mappings as needed
        }

        # Get the corresponding epitran code
        epi_code = language_map.get(language)

        if not epi_code:
            return f"Language '{language}' not supported for phonetic transcription."

        # Initialize Epitran
        epi = epitran.Epitran(epi_code)

        # Generate phonetics
        phonetics = epi.transliterate(text)

        return phonetics
    except Exception as e:
        return f"An error occurred: {e}"

# Test the function
text_hindi = "नमस्ते, आप कैसे हैं?"
text_english = "Hello, how are you?"

print(get_phonetics(text_hindi))
print(get_phonetics(text_english))

I am getting this error when running this code:

nəmste, aːpə kæːse ɦæːn?
WARNING:epitran:lex_lookup (from flite) is not installed.
An error occurred: list index out of range

Can anyone help me solve this issue?
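
The warning in that output says Epitran could not find flite's lex_lookup, which is the dependency discussed at the top of this thread for 'eng-Latn'. Here is a minimal sketch for checking whether the binary is on the PATH before constructing the English transliterator. `lex_lookup_available` is a hypothetical helper, and the thread does not confirm that installing flite also resolves the "list index out of range" message:

```python
import shutil

def lex_lookup_available(binary_name="lex_lookup"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(binary_name) is not None

if not lex_lookup_available():
    # English ('eng-Latn') relies on flite's lex_lookup binary.
    print("lex_lookup not found on PATH; install flite before using 'eng-Latn'.")
```

The Hindi line works in the output above because 'hin-Deva' does not go through lex_lookup; only the English call hits the missing binary.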