ahmetaa / zemberek-nlp

NLP tools for Turkish.

Fails after disambiguate #157

Closed teaddict closed 6 years ago

teaddict commented 6 years ago

Hi,

I am using Zemberek 0.12.0 and I tried to extract nouns like this:

import zemberek.morphology.TurkishMorphology

val turkishMorphology = TurkishMorphology.createWithDefaults()
val sentence1 = "Yıldız Kızlar Dünya Şampiyonası FIVB'nin düzenlediği ve 18 yaşının altındaki voleybolcuların katılabildiği bir şampiyonadır."
val sentence2 = "Dünya Yıldız Kızlar Voleybol Şampiyonası'nda Yıldız Milli Takım, final maçında Çin'i 3-0 yenerek şampiyon oldu."

// Analyze every word, then disambiguate to keep one analysis per word.
val analysis = turkishMorphology.analyzeSentence(sentence1)
val test = turkishMorphology.disambiguate(sentence1, analysis)
println(test.bestAnalysis().get(0).formatLong())

When I try with sentence2, it doesn't fail. But if I use sentence1, it throws an exception:

java.lang.NullPointerException was thrown.
java.lang.NullPointerException
    at zemberek.morphology.ambiguity.PerceptronAmbiguityResolver$Decoder.bestPath(PerceptronAmbiguityResolver.java:298)
    at zemberek.morphology.ambiguity.PerceptronAmbiguityResolver.disambiguate(PerceptronAmbiguityResolver.java:84)
    at zemberek.morphology.TurkishMorphology.disambiguate(TurkishMorphology.java:233)

It fails at this step:

val test = turkishMorphology.disambiguate(sentence1, analysis)

ahmetaa commented 6 years ago

This seems to be fixed in the trunk version. I added a test for it. We will probably release 0.13.0 in about two weeks. No promises, but let me see if we can make a 0.12.1 release for this.

teaddict commented 6 years ago

Thank you very much. I can wait for the next release; this sentence is the only one I have a problem with. What might be wrong with it? Yıldız Kızlar Dünya Şampiyonası FIVB'nin düzenlediği ve 18 yaşının altındaki voleybolcuların katılabildiği bir şampiyonadır. I tried to work out the cause but couldn't come up with anything.

ahmetaa commented 6 years ago

Oh, I misread; that is the second sentence. Still, it works with the latest code. The culprit is probably the word "FIVB'nin", but I cannot be sure.

teaddict commented 6 years ago

Okay, I just removed FIVB'nin and tried again, and now it works. It seems FIVB'nin causes the problem.

ahmetaa commented 6 years ago

OK, this problem exists in the current code as well. I will provide a fix.

ahmetaa commented 6 years ago

0.13.0 is released with a fix for this issue. The problem was deeper than I thought. Unlike previous versions, 0.12.0 does not offer any analysis for unknown words. This caused the PerceptronAmbiguityResolver class to fail during the decoding phase. So I modified the decode function so that it generates an "Unknown" analysis for such words.
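
For readers who want to see the idea in isolation, here is a minimal Java sketch of that kind of guard. It is not the actual Zemberek code: the class, the `Analysis` type, the labels and the scores are all hypothetical, and the decoder is reduced to picking the highest-scoring candidate per word instead of running the real perceptron decoding. It only shows how a word with no candidates can be given a placeholder "Unknown" analysis so that the decoding loop never sees an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names come from the Zemberek code base.
class UnknownFallbackSketch {

    // Hypothetical stand-in for one candidate analysis of a word.
    static final class Analysis {
        final String word;
        final String label;
        final double score;

        Analysis(String word, String label, double score) {
            this.word = word;
            this.label = label;
            this.score = score;
        }

        // Placeholder used when the analyzer offers nothing for a word.
        static Analysis unknown(String word) {
            return new Analysis(word, "Unknown", 0.0);
        }
    }

    // Simplified decoder: keeps the highest-scoring candidate for each word.
    static List<Analysis> bestPath(List<String> words,
                                   Map<String, List<Analysis>> candidatesByWord) {
        List<Analysis> best = new ArrayList<>();
        for (String word : words) {
            List<Analysis> candidates = candidatesByWord.get(word);
            // The guard: substitute an "Unknown" analysis instead of decoding
            // over a missing or empty candidate list.
            if (candidates == null || candidates.isEmpty()) {
                candidates = Collections.singletonList(Analysis.unknown(word));
            }
            Analysis top = candidates.get(0);
            for (Analysis candidate : candidates) {
                if (candidate.score > top.score) {
                    top = candidate;
                }
            }
            best.add(top);
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up candidates and scores; "FIVB'nin" is deliberately absent.
        Map<String, List<Analysis>> candidatesByWord = new HashMap<>();
        candidatesByWord.put("bir", Arrays.asList(
                new Analysis("bir", "Det", 0.4),
                new Analysis("bir", "Num", 0.9)));

        for (Analysis a : bestPath(Arrays.asList("FIVB'nin", "bir"), candidatesByWord)) {
            System.out.println(a.word + " -> " + a.label);
        }
    }
}
```

Without the guard, the `candidates.get(0)` call would fail on the word that has no analyses, which is the same kind of failure as the NullPointerException reported above.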

I am closing the issue. Please try it and see if the problem persists.

teaddict commented 6 years ago

Okay, thanks. My problem is solved with version 0.13.0; it works well.