dmort27 / epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
MIT License
630 stars, 121 forks

Errors when instantiating using certain codes #98

Status: closed (cdleong closed this issue 2 years ago)

cdleong commented 2 years ago

I looped over all the codes in https://github.com/dmort27/epitran/tree/master/epitran/data/map, i.e. aar-Latn ... zul-Latn.

Some of these codes raise errors when I try to instantiate them with epi = epitran.Epitran(code).

The following mappings seem to have problems:

problem_mappings = ['generic-Latn',
 'tur-Latn-bab',
 'ood-Latn-sax',
 'vie-Latn-so',
 'vie-Latn-ce',
 'vie-Latn-no',
 'kaz-Cyrl-bab']

Test code:

from pathlib import Path
import epitran
import logging

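def get_valid_epitran_mappings_list():
  # Each map file (e.g. aar-Latn.csv) is named after a language-script code
  map_path = Path(epitran.__path__[0]) / "data" / "map"
  map_files = map_path.glob("*.*")
  # Drop the file extension to recover the code passed to epitran.Epitran()
  valid_mappings = sorted(map_file.stem for map_file in map_files)
  return valid_mappings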

valid_epitran_mappings = get_valid_epitran_mappings_list()  
print(valid_epitran_mappings)

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

problem_mappings = []
for valid_epitran_mapping in valid_epitran_mappings:
  try:
    # Instantiate, then transliterate once so lazily loaded rules are exercised too
    epi = epitran.Epitran(valid_epitran_mapping)
    epi.transliterate("My Hovercraft is full of eels")
  except Exception as e:
    logger.error("**************")
    logger.error(f"Error encountered with {valid_epitran_mapping}")
    logger.exception(e)
    # Record the code that failed
    problem_mappings.append(valid_epitran_mapping)

print(problem_mappings)
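The same loop can be factored into a small helper that records which exception class each code raised, which makes the failures easier to group (the output below mixes DatafileError, RuleFileError, and MappingError). This is a sketch only: find_failing_codes and its factory parameter are hypothetical names, not part of epitran. Passing the constructor in means the helper can be tried without epitran installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_failing_codes(
    codes: Iterable[str],
    factory: Callable[[str], object],
    sample: str = "My Hovercraft is full of eels",
) -> Dict[str, List[str]]:
    """Group codes by the name of the exception class they raised.

    Hypothetical helper: `factory` builds a transliterator from a code,
    e.g. epitran.Epitran, and must return an object with .transliterate().
    """
    failures: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        try:
            # Build the object and exercise it once, as in the loop above
            factory(code).transliterate(sample)
        except Exception as e:
            failures[type(e).__name__].append(code)
    return dict(failures)
```

With epitran installed, find_failing_codes(get_valid_epitran_mappings_list(), epitran.Epitran) would return a dict keyed by exception class name, each value listing the codes that raised it.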
cdleong commented 2 years ago

Full output:

ERROR:__main__:**************
ERROR:__main__:Error encountered with generic-Latn
ERROR:__main__:Map file is not well formed at line 29.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 103, in _load_g2p_map
    graph, phon = fields
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 105, in _load_g2p_map
    raise DatafileError('Map file is not well formed at line {}.'.format(i + 2))
epitran.exceptions.DatafileError: Map file is not well formed at line 29.
ERROR:__main__:**************
ERROR:__main__:Error encountered with tur-Latn-bab
ERROR:__main__:Line 4: "% Allophonic variants" cannot be parsed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 67, in _read_rule
    a, b, X, Y = r.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 49, in __init__
    self.postprocessor = PrePostProcessor(code, 'post', False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/ppprocessor.py", line 29, in __init__
    self.rules = self._read_rules(code, fix, rev)
  File "/usr/local/lib/python3.7/dist-packages/epitran/ppprocessor.py", line 40, in _read_rules
    return Rules([abs_fn])
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 33, in __init__
    rules = self._read_rule_file(rule_file)
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 44, in _read_rule_file
    rules.append(self._read_rule(i, line))
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 69, in _read_rule
    raise DatafileError('Line {}: "{}" cannot be parsed.'.format(i + 1, line))
epitran.exceptions.DatafileError: Line 4: "% Allophonic variants" cannot be parsed.
ERROR:__main__:**************
ERROR:__main__:Error encountered with ood-Latn-sax
ERROR:__main__:Undefined symbol: ::vowels::
Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 48, in __init__
    self.preprocessor = PrePostProcessor(code, 'pre', False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/ppprocessor.py", line 29, in __init__
    self.rules = self._read_rules(code, fix, rev)
  File "/usr/local/lib/python3.7/dist-packages/epitran/ppprocessor.py", line 40, in _read_rules
    return Rules([abs_fn])
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 33, in __init__
    rules = self._read_rule_file(rule_file)
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 44, in _read_rule_file
    rules.append(self._read_rule(i, line))
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 64, in _read_rule
    line = self._sub_symbols(line)
  File "/usr/local/lib/python3.7/dist-packages/epitran/rules.py", line 53, in _sub_symbols
    raise RuleFileError('Undefined symbol: {}'.format(s))
epitran.rules.RuleFileError: Undefined symbol: ::vowels::
ERROR:__main__:**************
ERROR:__main__:Error encountered with vie-Latn-so
ERROR:__main__:b'One-to-many G2P mapping for "uo\xcc\x9b" on lines 63, 255'
Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 115, in _load_g2p_map
    raise MappingError('One-to-many G2P mapping for "{}" on lines {}'.format(graph, ', '.join(map(str, lines))).encode('utf-8'))
epitran.exceptions.MappingError: b'One-to-many G2P mapping for "uo\xcc\x9b" on lines 63, 255'
ERROR:__main__:**************
ERROR:__main__:Error encountered with vie-Latn-ce
ERROR:__main__:b'One-to-many G2P mapping for "gi" on lines 8, 29'
Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 115, in _load_g2p_map
    raise MappingError('One-to-many G2P mapping for "{}" on lines {}'.format(graph, ', '.join(map(str, lines))).encode('utf-8'))
epitran.exceptions.MappingError: b'One-to-many G2P mapping for "gi" on lines 8, 29'
ERROR:__main__:**************
ERROR:__main__:Error encountered with vie-Latn-no
ERROR:__main__:b'One-to-many G2P mapping for "gi" on lines 8, 29'
Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 115, in _load_g2p_map
    raise MappingError('One-to-many G2P mapping for "{}" on lines {}'.format(graph, ', '.join(map(str, lines))).encode('utf-8'))
epitran.exceptions.MappingError: b'One-to-many G2P mapping for "gi" on lines 8, 29'
ERROR:__main__:**************
ERROR:__main__:Error encountered with kaz-Cyrl-bab
ERROR:__main__:b'One-to-many G2P mapping for "\xd1\x83" on lines 7, 30, 34, 49'
Traceback (most recent call last):
  File "<ipython-input-32-bf398c1e3ca9>", line 9, in <module>
    epi = epitran.Epitran(valid_epitran_mapping)
  File "/usr/local/lib/python3.7/dist-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/usr/local/lib/python3.7/dist-packages/epitran/simple.py", line 115, in _load_g2p_map
    raise MappingError('One-to-many G2P mapping for "{}" on lines {}'.format(graph, ', '.join(map(str, lines))).encode('utf-8'))
epitran.exceptions.MappingError: b'One-to-many G2P mapping for "\xd1\x83" on lines 7, 30, 34, 49'
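The MappingError messages print as bytes literals because, as the traceback shows, the message string is passed through .encode('utf-8') before being raised. Decoding the bytes shows which graphemes collide. A minimal sketch using only the standard library (describe_grapheme is a hypothetical helper name):

```python
import unicodedata


def describe_grapheme(raw: bytes) -> str:
    """Decode UTF-8 bytes and list each code point with its Unicode name."""
    text = raw.decode("utf-8")
    parts = [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]
    return f"{text!r}: " + ", ".join(parts)


# Byte strings copied from the vie-Latn-so and kaz-Cyrl-bab messages above
print(describe_grapheme(b"uo\xcc\x9b"))
print(describe_grapheme(b"\xd1\x83"))
```

The first is u + o + U+031B COMBINING HORN, i.e. Vietnamese "uơ" in decomposed form (NFC composes o + horn into U+01A1); the second is U+0443 CYRILLIC SMALL LETTER U.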
cdleong commented 2 years ago

Tested on Google Colab!

dmort27 commented 2 years ago

Thanks for doing this. I'll try to fix these in the next couple of days.
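The generic-Latn, vie-Latn-*, and kaz-Cyrl-bab failures come from the map CSV itself: a row that does not split into exactly two fields, or one grapheme listed on several lines. The loader stops at the first such problem; a standalone check along these lines could list all of them at once. It is a sketch, not epitran's loader: it assumes a one-line header followed by grapheme,phoneme rows with no quoted multi-line fields, and it reports data row i as line i + 2, matching the i + 2 in the DatafileError traceback. lint_map is a hypothetical name, and the rows in the usage example are toy data.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def lint_map(text: str) -> Tuple[List[int], Dict[str, List[int]]]:
    """Return (malformed line numbers, graphemes that appear on several lines).

    Assumes a header row followed by two-column grapheme,phoneme rows.
    Line numbers are 1-based file lines, so data row i is line i + 2.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    malformed: List[int] = []
    lines_by_graph: Dict[str, List[int]] = defaultdict(list)
    for i, fields in enumerate(reader):
        if len(fields) != 2:
            # Same condition that makes `graph, phon = fields` fail
            malformed.append(i + 2)
            continue
        lines_by_graph[fields[0]].append(i + 2)
    # A grapheme on more than one line is a one-to-many mapping
    one_to_many = {g: ls for g, ls in lines_by_graph.items() if len(ls) > 1}
    return malformed, one_to_many


sample = "Orth,Phon\na,a\ngi,z\nbroken\ngi,j\n"
print(lint_map(sample))  # → ([4], {'gi': [3, 5]})
```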

dmort27 commented 2 years ago

Well, it was more than a couple of days, but I finally fixed these. Thanks for your contribution!
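The tur-Latn-bab and ood-Latn-sax failures come from the pre/post-processor rule files rather than the map. The traceback shows symbols such as ::vowels:: being substituted and each rule line then being split into four parts (a, b, X, Y = r.groups()). A regression check over the rule files could flag undefined symbols and unparseable lines before release. The sketch below only approximates epitran's parser: it assumes rules of the form a -> b / X _ Y, definitions of the form ::name:: = value, and % comment lines (the "% Allophonic variants" line in tur-Latn-bab appears to be meant as a comment). The regexes and the name check_rules are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical patterns approximating the rule-file syntax
SYMBOL_DEF = re.compile(r"^(::\w+::)\s*=\s*(.+)$")
SYMBOL_REF = re.compile(r"::\w+::")
RULE = re.compile(r"^(\S+)\s*->\s*(\S+)\s*/\s*(\S*)\s*_\s*(\S*)$")


def check_rules(text: str) -> List[str]:
    """Return one message per rule-file line that would fail to load."""
    symbols: Dict[str, str] = {}
    problems: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment (assumed syntax)
        m = SYMBOL_DEF.match(line)
        if m:
            symbols[m.group(1)] = m.group(2)
            continue
        # Symbols must be defined on an earlier line before they are used
        undefined = [s for s in SYMBOL_REF.findall(line) if s not in symbols]
        if undefined:
            problems.append(f"Line {lineno}: undefined symbol {undefined[0]}")
            continue
        for name, value in symbols.items():
            line = line.replace(name, value)
        if not RULE.match(line):
            problems.append(f'Line {lineno}: "{line}" cannot be parsed')
    return problems
```

Because symbols are recorded as the file is read, a definition placed after its first use is reported as undefined, which is one plausible way to end up with an "Undefined symbol: ::vowels::" error.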