ahmetaa / zemberek-nlp

NLP tools for Turkish.

"yemeden" word is analyzed as unknown #225

Closed adilyildiz closed 4 years ago

adilyildiz commented 5 years ago

"fırçalamadan" is analyzed as "fırçala:Verb|/madan:WithoutHavingDoneSo→Adv"

but "yemeden" is not. Both words should be analyzed the same way.

ahmetaa commented 5 years ago

I cannot reproduce it in the current trunk version. Which version are you using?

yemeden
[yemek:Verb] ye:Verb|meden:WithoutHavingDoneSo→Adv
[yemek:Verb] ye:Verb|me:Inf2→Noun+A3sg+den:Abl

ahmetaa commented 5 years ago

Perhaps what you see is the result of the disambiguator, which probably makes a mistake there.

adilyildiz commented 5 years ago

Example sentence: "Dedem, yemek yemeden gitti." I am using 0.17 from the Drive distro folder.

adilyildiz commented 5 years ago

It does not happen only inside a sentence: when I send "yemeden." as the only input, it also gives UNK.

ahmetaa commented 5 years ago

Can you show a minimal example that reproduces the issue?

adilyildiz commented 5 years ago

```java
import java.io.IOException;
import java.util.List;

import zemberek.core.logging.Log;
import zemberek.morphology.TurkishMorphology;
import zemberek.morphology.analysis.SentenceAnalysis;
import zemberek.morphology.analysis.SentenceWordAnalysis;
import zemberek.morphology.analysis.WordAnalysis;
import zemberek.morphology.lexicon.RootLexicon;
import zemberek.tokenization.TurkishSentenceExtractor;

public class zp {

    // Analyzes and disambiguates every sentence in metin, printing one token per line.
    // The tur argument is accepted but not used.
    public static void kontrolet(String tur, String metin) throws IOException {
        TurkishMorphology morphology = TurkishMorphology.builder()
                .setLexicon(RootLexicon.getDefault())
                .useInformalAnalysis()
                .build();
        TurkishSentenceExtractor extractor = TurkishSentenceExtractor.DEFAULT;
        List<String> sentences = extractor.fromParagraph(metin);
        for (String sentence : sentences) {
            List<WordAnalysis> s = morphology.analyzeSentence(sentence);
            SentenceAnalysis result = morphology.disambiguate(sentence, s);
            for (SentenceWordAnalysis r : result) {
                // Surface form, then the analysis chosen by the disambiguator.
                System.out.print(r.getWordAnalysis().getInput());
                System.out.print(r.getBestAnalysis());
                System.out.println("##");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Log.setError();
        if (args.length >= 2) {
            kontrolet(args[0], args[1]);
        } else {
            // No arguments given: fall back to the sentence from this issue.
            kontrolet("tek", "Dedem, yemek yemeden gitti.");
        }
    }
}
```

Dedem[dede:Noun] dede:Noun+A3sg+m:P1sg##
,[,:Punc] ,:Punc##
yemek[yemek:Noun] yemek:Noun+A3sg##
yemeden[UNK:Unk,Unk] yemeden:Unknown##
gitti[gitmek:Verb] git:Verb+ti:Past+A3sg##
.[.:Punc] .:Punc##
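
When comparing versions, it may help to pick the unknown tokens out of output like the above automatically rather than by eye. The following is only a minimal sketch using the Java standard library. It assumes each token record has the `surface[lemma:POS] analysis##` layout printed by the program in this issue. `UnknownTokenScan` and `unknownTokens` are hypothetical names introduced for illustration and are not part of Zemberek.

```java
import java.util.ArrayList;
import java.util.List;

public class UnknownTokenScan {

    /** Returns the surface forms whose bracketed dictionary item starts with "UNK:". */
    static List<String> unknownTokens(String output) {
        List<String> unknown = new ArrayList<>();
        // Every token record ends with "##", whether or not a newline follows it.
        for (String record : output.split("##")) {
            String r = record.trim();
            int open = r.indexOf('[');
            int close = r.indexOf(']', open + 1);
            if (open <= 0 || close < 0) {
                continue; // skip anything without a "surface[...]" prefix
            }
            // Assumption: the text between the first '[' and the next ']' is the
            // dictionary item, e.g. "dede:Noun" or "UNK:Unk,Unk".
            String dictionaryItem = r.substring(open + 1, close);
            if (dictionaryItem.startsWith("UNK:")) {
                unknown.add(r.substring(0, open));
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        String output = "Dedem[dede:Noun] dede:Noun+A3sg+m:P1sg## ,[,:Punc] ,:Punc## "
                + "yemek[yemek:Noun] yemek:Noun+A3sg## "
                + "yemeden[UNK:Unk,Unk] yemeden:Unknown## "
                + "gitti[gitmek:Verb] git:Verb+ti:Past+A3sg## .[.:Punc] .:Punc##";
        System.out.println(unknownTokens(output)); // prints [yemeden]
    }
}
```

Splitting on `##` instead of on newlines means the same check works whether the records are printed one per line or pasted as a single line. A token whose surface form is itself `[` would be skipped by this sketch.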

adilyildiz commented 5 years ago

0.16 => TurkishMorphology morphology = TurkishMorphology.createWithDefaults(); works.

0.17 => TurkishMorphology morphology = TurkishMorphology.createWithDefaults(); does not work.

0.17 => TurkishMorphology morphology = TurkishMorphology.builder().setLexicon(RootLexicon.getDefault()).useInformalAnalysis().build(); does not work.

ahmetaa commented 5 years ago

Can you test with 0.17.1?

adilyildiz commented 5 years ago

> Can you test with 0.17.1?

It still cannot parse the word correctly; nothing has changed. It still works with 0.16.

ahmetaa commented 4 years ago

Your code generates this output:

Dedem[dede:Noun] dede:Noun+A3sg+m:P1sg##
,[,:Punc] ,:Punc##
yemek[yemek:Noun] yemek:Noun+A3sg##
yemeden[yemek:Verb] ye:Verb|meden:WithoutHavingDoneSo→Adv##
gitti[gitmek:Verb] git:Verb+ti:Past+A3sg##
.[.:Punc] .:Punc##

I cannot reproduce the problem, so closing the issue.