ssaltin opened this issue 7 years ago
As I now realize, the dictionary items behind the two "yok" analyses are different:
yok [P:Adj; A:NoVoicing]
yok [P:Adj; A:Voicing]
But don't they still produce the same morphological result?
Thanks, I am aware of this problem; hopefully it will be fixed in the next version.
Version 0.12 still produces duplicate results. I did not test with 0.13.
Input: yoksa
yoksa
[yoksa:Conj] yoksa:Conj
[yoksamak:Verb] yoksa:Verb+Imp+A2sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[Yok:Noun,Prop] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
[yok:Noun] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
Disambiguation result: [yoksa:Conj] yoksa:Conj
Version 0.13.0 also produces duplicate results for this input. Because the voicing attribute is optional for "yok", two stem transitions are created for "yok" when the graph is constructed. For inputs like "yoktan" or "yok", paths through both stem transitions terminate successfully, so each such path yields its own analysis.
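To make the mechanism concrete, here is a toy model (not Zemberek's API; `Item`, `stemSurface` and `successfulPaths` are hypothetical names). It assumes a Voicing item turns a final "k" into "ğ" only before a vowel-initial suffix, while a NoVoicing item never changes. Before a consonant-initial suffix such as "sa", or with no suffix at all, both stems surface as "yok", so two paths succeed for the same input.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Zemberek's API) of why two stem transitions
// for "yok" produce two identical analyses.
class DuplicateStemDemo {

    // Simplified dictionary item: lemma plus the Voicing (true) / NoVoicing (false) attribute.
    record Item(String lemma, boolean voicing) {}

    // Turkish vowels: a e i o u ı ö ü
    static final String VOWELS = "aeiou\u0131\u00f6\u00fc";

    // Surface form of the stem before the given suffix. With Voicing, a final 'k'
    // becomes 'ğ' (\u011f) before a vowel-initial suffix; otherwise the lemma is unchanged.
    static String stemSurface(Item item, String suffix) {
        String lemma = item.lemma();
        boolean vowelFollows = !suffix.isEmpty() && VOWELS.indexOf(suffix.charAt(0)) >= 0;
        if (item.voicing() && vowelFollows && lemma.endsWith("k")) {
            return lemma.substring(0, lemma.length() - 1) + "\u011f";
        }
        return lemma;
    }

    // Returns every item whose stem, followed by the suffix, spells the input.
    // Each returned item stands for one successful path, i.e. one analysis.
    static List<Item> successfulPaths(List<Item> items, String suffix, String input) {
        List<Item> paths = new ArrayList<>();
        for (Item item : items) {
            if ((stemSurface(item, suffix) + suffix).equals(input)) {
                paths.add(item);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Item> lexicon = List.of(new Item("yok", false), new Item("yok", true));
        // Consonant-initial suffix: both stems surface as "yok", two paths succeed.
        System.out.println(successfulPaths(lexicon, "sa", "yoksa").size()); // 2
        // Vowel-initial suffix: only the voiced stem "yoğ" matches "yoğu".
        System.out.println(successfulPaths(lexicon, "u", "yo\u011fu").size()); // 1
    }
}
```

The same model gives two paths for the bare word "yok" (empty suffix), which matches the behavior described above for inputs that never trigger voicing.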
One possible solution for such words is to use the reference attribute. For example:
yok [P:Adj; A:Voicing]
yok [P:Adj; A:NoVoicing, Ref:yok_Adj] ---> points to the first one
Then, after analysis, if two results have the same morphemes and both the referencing item and the referenced item are present, one of them can be removed. This can be done as a post-processing step.
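A minimal sketch of that post-processing step follows. `Analysis`, `itemId`, `refId` and `morphemes` are hypothetical stand-ins rather than Zemberek classes, and the id of the NoVoicing entry ("yok_Adj_NoVoicing") is made up for illustration. The rule drops an analysis only when its item carries a Ref and the referenced item produced an analysis with identical morphemes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of reference-based duplicate removal (hypothetical types, not Zemberek's API).
class RefDedup {

    // itemId: dictionary item id; refId: its Ref attribute or null;
    // morphemes: the morpheme sequence of the analysis rendered as a string.
    record Analysis(String itemId, String refId, String morphemes) {}

    static List<Analysis> removeReferencedDuplicates(List<Analysis> analyses) {
        // Remember which (item, morphemes) pairs occur in the result.
        Set<String> present = new HashSet<>();
        for (Analysis a : analyses) {
            present.add(a.itemId() + "|" + a.morphemes());
        }
        List<Analysis> kept = new ArrayList<>();
        for (Analysis a : analyses) {
            // Drop only if the referenced item produced exactly the same morphemes.
            boolean duplicateOfReferenced =
                    a.refId() != null && present.contains(a.refId() + "|" + a.morphemes());
            if (!duplicateOfReferenced) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String m = "Adj|Zero\u2192Verb+sa:Cond+A3sg";
        List<Analysis> results = List.of(
                new Analysis("yok_Adj", null, m),
                new Analysis("yok_Adj_NoVoicing", "yok_Adj", m)); // hypothetical id
        System.out.println(removeReferencedDuplicates(results).size()); // 1
    }
}
```

Keying on the referenced item, rather than on the morpheme string alone, keeps analyses that merely look alike but come from unrelated dictionary items.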
For the input "yoksa", Zemberek generates 6 WordAnalysis objects, 2 of which are duplicates: entry 2 repeats entry 1, and entry 5 repeats entry 4.
0 = {WordAnalysis@6912} "[(yoksa:yoksa) (Conj)]"
1 = {WordAnalysis@6913} "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]"
2 = {WordAnalysis@6914} "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]"
3 = {WordAnalysis@6915} "[(yok:yok) (Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"
4 = {WordAnalysis@6916} "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"
5 = {WordAnalysis@6917} "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"
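Until a fix lands, a caller could filter these on its own side. The sketch below assumes that the string form of an analysis is enough to identify it; plain strings copied from the listing stand in for WordAnalysis objects, and `dedupe` is a hypothetical helper. A `LinkedHashSet` keeps the first occurrence of each form and preserves the original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Caller-side stopgap (hypothetical helper): order-preserving removal of
// analyses whose string form has already been seen.
class StringDedup {

    static List<String> dedupe(List<String> analyses) {
        // LinkedHashSet drops repeats but keeps insertion order.
        return new ArrayList<>(new LinkedHashSet<>(analyses));
    }

    public static void main(String[] args) {
        List<String> yoksa = List.of(
                "[(yoksa:yoksa) (Conj)]",
                "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]",
                "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]",
                "[(yok:yok) (Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]",
                "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]",
                "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]");
        List<String> unique = dedupe(yoksa);
        System.out.println(unique.size()); // 4
        unique.forEach(System.out::println);
    }
}
```

This hides the symptom rather than the cause: the two dictionary items still generate separate paths, which is why the reference-based approach targets the items themselves.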