ahmetaa / zemberek-nlp

NLP tools for Turkish.

Cannot find reference item id #202

Closed by ozturkberkay 5 years ago

ozturkberkay commented 5 years ago

I am trying to create a TurkishMorphology object using a custom dictionary. If I add only the root abbreviation to the lexicon, it works properly. However, if I add entries with other pronunciations that refer to the root, I get an error:

test-lexicon.txt:

VST [P:Noun, Abbrv; Pr:viesti]
VST [Pr:vesete; Ref:VST; Index:2]
NAMD [P:Noun, Abbrv; Pr:enemdi]
NAMD [Pr:neamede; Ref:NAMD; Index:2]
NAMD [Pr:namde; Ref:NAMD; Index:3]

standard_morphology.py:

lexicon = (
    RootLexicon.builder()
    .setLexicon(RootLexicon.DEFAULT)
    .addTextDictionaries(Paths.get('../../../data/dictionaries/test-lexicon.txt'))
    .build()
)
morphology = TurkishMorphology.create(lexicon)

Terminal:

Cannot find reference item id VST_Noun                                                              | TurkishDictionaryLoader$TextLexiconProcessor#getResult

Looking at the source code, it seems that the DictionaryItem with the corresponding reference id cannot be found in the RootLexicon instance:

String referenceId = lateEntry.getMetaData(MetaDataId.REF_ID);
if (!referenceId.contains("_")) {
  referenceId = referenceId + "_Noun";
}
DictionaryItem refItem = rootLexicon.getItemById(referenceId);
if (refItem == null) {
  Log.warn("Cannot find reference item id " + referenceId);
}

Any ideas?

ahmetaa commented 5 years ago

If you define it like this:

VST [P:Noun, Abbrv; Pr:viesti]
VST [P:Noun, Abbrv; Pr:vesete; Ref:VST_Noun_Abbrv; Index:2]

It would probably work. Unfortunately, this mechanism is cumbersome.
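To illustrate why the short reference fails, here is a minimal sketch of the lookup in plain Python. It is not the library's code: the `lexicon` dict and the `resolve_reference` helper are hypothetical stand-ins, and the id format `word_Pos_SecondaryPos` is an assumption inferred from the `VST_Noun_Abbrv` reference above. Only the `_Noun` fallback is taken from the loader snippet quoted in the question.

```python
# Hypothetical stand-in for the root lexicon: item id -> pronunciation.
# Assumption: an abbreviation noun gets the id "<word>_Noun_Abbrv".
lexicon = {
    "VST_Noun_Abbrv": "viesti",
    "NAMD_Noun_Abbrv": "enemdi",
}


def resolve_reference(ref):
    """Mimic the quoted loader logic: a Ref without '_' gets '_Noun' appended.

    Returns the id that would be looked up and whether it exists.
    """
    if "_" not in ref:
        ref = ref + "_Noun"
    return ref, ref in lexicon


# "Ref:VST" is expanded to "VST_Noun", which is not an id in the lexicon.
print(resolve_reference("VST"))
# The full id is looked up unchanged and is found.
print(resolve_reference("VST_Noun_Abbrv"))
```

If that assumption holds, the NAMD entries would presumably need `Ref:NAMD_Noun_Abbrv` in the same way.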

ozturkberkay commented 5 years ago

Thanks! That fixed the problem.