ahmetaa / zemberek-nlp

NLP tools for Turkish.

Wrong selection in bestAnalysis() for "anlatabilir misiniz?" #206

Closed yakupgezdirici closed 5 years ago

yakupgezdirici commented 5 years ago

Hi,

I'm using my own .NET port of Zemberek. I assume the results are the same with the original code.

For "anlatabilir misiniz?", the bestAnalysis() method returns:

[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb+ir:Aor+A3sg
[mis:Noun] mis:Noun+A3sg+in:Gen|Zero→Verb+Pres+iz:A1pl
[?:Punc] ?:Punc

But the best analysis seems to be:

[anlatmak:Verb] anlat:Verb|abil:Able→Verb+ir:Aor+A3sg
[mi:Ques] mi:Ques+Pres+siniz:A2pl
[?:Punc] ?:Punc

TurkishMorphology.analyzeAndDisambiguate() returns these candidates:

[anlatmak:Verb] anlat:Verb|abil:Able→Verb+ir:Aor+A3sg
[anlatmak:Verb] anlat:Verb|abil:Able→Verb|ir:AorPart→Adj
[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb+ir:Aor+A3sg
[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb|ir:AorPart→Adj

[mi:Ques] mi:Ques+Pres+siniz:A2pl
[mis:Noun] mis:Noun+A3sg+iniz:P2pl
[Mi:Noun,Abbrv] mi:Noun+A3sg|Zero→Verb+Pres+siniz:A2pl
[mi:Noun] mi:Noun+A3sg|Zero→Verb+Pres+siniz:A2pl
[mis:Noun] mis:Noun+A3sg+in:Gen|Zero→Verb+Pres+iz:A1pl
[mis:Noun] mis:Noun+A3sg+in:P2sg|Zero→Verb+Pres+iz:A1pl

[?:Punc] ?:Punc

Best regards,

Yakup

ahmetaa commented 5 years ago

I cannot reproduce it in the trunk version.

Sentence  = Anlatabilir misiniz?
Sentence word analysis result:
Word = Anlatabilir
[anlatmak:Verb] anlat:Verb|abil:Able→Verb+ir:Aor+A3sg
[anlatmak:Verb] anlat:Verb|abil:Able→Verb|ir:AorPart→Adj
[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb+ir:Aor+A3sg
[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb|ir:AorPart→Adj
Word = misiniz
[mi:Ques] mi:Ques+Pres+siniz:A2pl
[mis:Noun] mis:Noun+A3sg+iniz:P2pl
[Mi:Noun,Abbrv] mi:Noun+A3sg|Zero→Verb+Pres+siniz:A2pl
[mi:Noun] mi:Noun+A3sg|Zero→Verb+Pres+siniz:A2pl
[mis:Noun] mis:Noun+A3sg+in:Gen|Zero→Verb+Pres+iz:A1pl
[mis:Noun] mis:Noun+A3sg+in:P2sg|Zero→Verb+Pres+iz:A1pl
Word = ?
[?:Punc] ?:Punc

After ambiguity resolution : 
[anlamak:Verb] anla:Verb|t:Caus→Verb|abil:Able→Verb+ir:Aor+A3sg
[mi:Ques] mi:Ques+Pres+siniz:A2pl
[?:Punc] ?:Punc
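
When checking a port against trunk, it can help to compare the best-analysis lines token by token instead of by eye. The following is a minimal sketch, assuming both sides can print their best analysis as one line per token. The names `AnalysisDiff` and `differingTokens` are illustrative and not part of Zemberek; the strings are copied from the two outputs quoted in this thread, with the arrow written as `\u2192`.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalysisDiff {

    // Returns the indices of tokens whose best-analysis lines differ.
    // If the lists have different lengths, the extra trailing indices
    // are reported as differences too.
    public static List<Integer> differingTokens(List<String> expected, List<String> actual) {
        List<Integer> diffs = new ArrayList<>();
        int n = Math.max(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            String e = i < expected.size() ? expected.get(i).trim() : null;
            String a = i < actual.size() ? actual.get(i).trim() : null;
            if (e == null || !e.equals(a)) {
                diffs.add(i);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // "After ambiguity resolution" output from trunk, as quoted in this thread.
        List<String> trunk = List.of(
            "[anlamak:Verb] anla:Verb|t:Caus\u2192Verb|abil:Able\u2192Verb+ir:Aor+A3sg",
            "[mi:Ques] mi:Ques+Pres+siniz:A2pl",
            "[?:Punc] ?:Punc");
        // bestAnalysis() output reported for the .NET port.
        List<String> port = List.of(
            "[anlamak:Verb] anla:Verb|t:Caus\u2192Verb|abil:Able\u2192Verb+ir:Aor+A3sg",
            "[mis:Noun] mis:Noun+A3sg+in:Gen|Zero\u2192Verb+Pres+iz:A1pl",
            "[?:Punc] ?:Punc");
        for (int index : differingTokens(trunk, port)) {
            System.out.println("token " + index + " differs");
        }
    }
}
```

For the two outputs in this thread, the only differing token is index 1 ("misiniz"); both sides pick the same analysis for "anlatabilir".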

Also, how did you port the code? It is quite hard to keep a port of quickly changing code in sync in another language. Perhaps you should consider using gRPC or another server mechanism.

yakupgezdirici commented 5 years ago

I haven't updated my code since the end of 2018. I'm actually new to NLP. Maybe I should check my method calls against the examples. If that doesn't help, I'll check again after updating the code.

I first ported the code in January of last year. It took about a month, with some experience in .NET and none in Java, but I could not use it at that time. I have applied the changes you made over the whole year, except for a few features such as NER, classification and apps. I also have not applied C# coding conventions yet. That is, no properties, camel casing in names, etc. Maybe after version 1.

yakupgezdirici commented 5 years ago

OK, I found the problem. My sentence did not start with a capital "A" as yours did. With that changed, the result is the same as yours.

mdakin commented 5 years ago

This sounds interesting, but doing this with gRPC would probably be much easier. We sometimes shuffle a lot of code around, which is very hard to keep in sync in a different language.
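
To make the "server mechanism" idea concrete, here is a minimal sketch of the general pattern: keep the analyzer in a Java process and let the .NET side call it over the network. It uses plain HTTP from the JDK (`com.sun.net.httpserver`) rather than gRPC, purely to stay dependency-free. `AnalysisServer`, the `/analyze` path, and the `analyzer` function are assumptions made for illustration. The function is a stand-in for whatever calls the real library, not Zemberek's API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class AnalysisServer {

    // Starts a local HTTP server that hands each request body on /analyze to
    // `analyzer` (a hypothetical stand-in) and returns the result as UTF-8 text.
    // Passing port 0 lets the OS pick a free port.
    public static HttpServer start(int port, Function<String, String> analyzer) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/analyze", exchange -> {
            String sentence = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            byte[] reply = analyzer.apply(sentence).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    // Demo helper: starts a server, posts one sentence to it, stops the server,
    // and returns the response body. A client in another language would send
    // the same POST request.
    public static String roundTrip(String sentence, Function<String, String> analyzer) {
        HttpServer server = null;
        try {
            server = start(0, analyzer);
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/analyze");
            HttpRequest request = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(sentence))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        // The lambda is a placeholder; a real setup would call the analyzer here.
        System.out.println(roundTrip("Anlatabilir misiniz?", s -> "analysis of: " + s));
    }
}
```

With this layout only the thin client has to exist in the other language, so upstream refactoring does not need to be mirrored line by line.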

yakupgezdirici commented 5 years ago

Syncing a year's worth of changes, except for the ones I left out, took two weeks, which is acceptable for now.

mdakin commented 5 years ago

@yakupgezdirici Is your project open source?

yakupgezdirici commented 5 years ago

I haven't decided yet whether to make it open source.

yakupgezdirici commented 5 years ago

A note on my resolution: it was definitely a misinterpretation of the case while testing. I should stop coding when I have the flu. :)

The real cause was a different bug (or porting problem) in the .NET code.

ahmetaa commented 5 years ago

No problem. The disambiguation mechanism needs a lot of improvement anyway.