blagae / whitakers_words

More broken words #1

Closed CodyTeague closed 3 years ago

CodyTeague commented 4 years ago

Steps to reproduce:

from whitakers_words.parse import Parser
parser = Parser()
parser.parse("deserto")

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'

Full list of words for which the parser failed to produce valid output:

deserto
peccatorum
peccata
dicens
factum
de
descendentem
facta
desertum
credite
secus
piscatores
faciam
secuti
componentes
mirati
dicentes
nova
continuo
facto
desertis

CodyTeague commented 4 years ago

Looks like they all have the same or similar issue.

Unable to parse deserto
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse peccatorum
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse peccata
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse dicens
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 156, in check_match
    return stem['orth'] == wrd['parts'][0]
KeyError: 'parts'
Unable to parse factum
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse de
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse descendentem
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 156, in check_match
    return stem['orth'] == wrd['parts'][0]
KeyError: 'parts'
Unable to parse facta
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse desertum
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse credite
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse secus
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse piscatores)
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 56, in parse
    raise WordsException("Text to be parsed must be a single Latin word")
open_words.exceptions.WordsException: Text to be parsed must be a single Latin word
Unable to parse faciam
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse secuti
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse componentes
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 156, in check_match
    return stem['orth'] == wrd['parts'][0]
KeyError: 'parts'
Unable to parse mirati
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse dicentes
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 156, in check_match
    return stem['orth'] == wrd['parts'][0]
KeyError: 'parts'
Unable to parse nova
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse continuo
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse facto
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
Unable to parse desertis
Traceback (most recent call last):
  File "get_word_data.py", line 17, in get_word
    res = word_parser.parse(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 71, in parse
    parse_result = self.analyze_forms(word)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 94, in analyze_forms
    match_stems = self.match_stems_inflections(form, viable_inflections)
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 127, in match_stems_inflections
    if self.check_match(stem_cand, infl_cand):
  File "/usr/local/lib/python3.7/site-packages/open_words/parse.py", line 154, in check_match
    return stem['orth'] == wrd['parts'][-1]
KeyError: 'parts'
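
To see at a glance whether a batch of words fails for the same reason, the failures can be grouped by error message. This is only a minimal sketch: group_failures and fake_parse are hypothetical names, and fake_parse is a stand-in that imitates the two messages in the log above (using ValueError in place of the library's WordsException) so the example runs without the library installed. In practice one would pass the real parser's parse method instead.

```python
from collections import defaultdict


def group_failures(words, parse):
    """Call parse(word) for each word and group the failing words by error."""
    failures = defaultdict(list)
    for word in words:
        try:
            parse(word)
        except Exception as exc:  # broad on purpose: we are surveying failures
            failures[f"{type(exc).__name__}: {exc}"].append(word)
    return dict(failures)


# Hypothetical stand-in for the real parser, only for demonstration.
def fake_parse(word):
    if not word.isalpha():
        raise ValueError("Text to be parsed must be a single Latin word")
    raise KeyError("parts")


summary = group_failures(["deserto", "piscatores)", "facta"], fake_parse)
for error, failed in summary.items():
    print(error, failed)
```

Grouping this way separates input problems (such as a stray character in the word) from failures inside the parser itself.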

blagae commented 4 years ago

Thanks for reporting this. I found that I had changed the dictionary structure at some point and never checked stems that are listed twice because of their large number of translations (the second line's translations start with a | character):

pecc               pecc               peccav             peccat             V      1 1 INTRANS      X X X A O sin; do wrong, commit moral offense; blunder, stumble; be wrong;
pecc               pecc               peccav             peccat             V      1 1 INTRANS      X X X A O |make mistake; make slip in speaking; act incorrectly; go wrong, be faulty;

I have fixed it (will push later today) and added a few very basic tests to prevent regression.
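
For readers unfamiliar with the dictionary layout, here is a minimal sketch of how continuation lines like the one quoted above could be folded into the preceding entry. It is not the fix that was pushed. The function name merge_continuations, the regular expression, and the assumption that every line ends with five single-letter flags followed by the translation text are illustrative only, based on the two sample lines.

```python
import re

# Assumption: a line is "<stems and grammar> <five one-letter flags> <translations>",
# as in the two sample lines quoted in this thread.
ENTRY = re.compile(
    r"^(?P<head>.*?)\s+(?P<flags>(?:[A-Z] ){4}[A-Z])\s+(?P<senses>.*)$"
)


def merge_continuations(lines):
    """Fold lines whose translations start with '|' into the previous entry."""
    merged = []
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        head = " ".join(match["head"].split())  # normalise column padding
        senses = match["senses"].strip()
        same_entry = bool(merged) and merged[-1]["head"] == head
        if senses.startswith("|") and same_entry:
            merged[-1]["senses"] += " " + senses.lstrip("|")
        else:
            merged.append({"head": head, "senses": senses})
    return merged


prefix = "pecc  pecc  peccav  peccat  V  1 1 INTRANS  X X X A O "
sample = [
    prefix + "sin; do wrong, commit moral offense; blunder, stumble; be wrong;",
    prefix + "|make mistake; make slip in speaking; act incorrectly; "
    "go wrong, be faulty;",
]
entries = merge_continuations(sample)
print(len(entries), entries[0]["senses"])
```

With the two sample lines this yields a single entry for the stem, so a lookup no longer meets a second, partial record for the same word.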