Closed: petasis closed this issue 5 years ago.
Hi! Thanks for pointing this out.
First of all, an error in the PoS tagger propagates into a lemmatization error, because the lemmatizer uses the PoS tag to decide which rules to apply when finding the lemma of each word. Therefore, I suspect that the main issue here is the incorrect PoS tags.
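To make that propagation concrete, here is a toy sketch of a rule-based lemmatizer that selects its suffix rules by PoS tag. The `RULES` table, the `lemmatize` helper and the English example words are hypothetical illustrations; they are not the rules spaCy's Greek lemmatizer actually uses.

```python
# Toy, hypothetical suffix rules keyed by coarse PoS tag (not spaCy's Greek rules).
RULES = {
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("ies", "y"), ("s", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Apply the first suffix rule for `pos` that matches; otherwise keep the word."""
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    # Same surface form, different tag, different lemma.
    print(lemmatize("meeting", "VERB"))  # meet
    print(lemmatize("meeting", "NOUN"))  # meeting
```

The word form is identical in both calls; only the tag changes the result, which is why fixing the tagger is likely to fix most of the lemma errors as well.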
As for the PoS tagger errors, the situation is as follows: the PoS tagger gets 95% accuracy on the dev set of the treebank it is trained on, which is the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). However, if you inspect this treebank a bit, you will notice that its language is quite different from your examples: questions are rare, and there is no dialogue, only more or less declarative sentences that state facts about the world or support opinions.
One interesting thing to notice is that if you convert your questions into declarative sentences, spaCy produces correct results. For example, "Εγώ έχω αδέρφια" gives ['PRON', 'VERB', 'NOUN', 'PUNCT'], and "Συμφωνώ με τον όρο Βόρεια Μακεδονία για τους βόρειους γείτονες μας" gives ['VERB', 'ADP', 'DET', 'NOUN', 'ADJ', 'PROPN', 'ADP', 'DET', 'ADJ', 'NOUN', 'PRON']. You can reproduce this behavior with all of your examples.
So, in general, I would say that this is obviously bad, but we have to remember two things: (i) you are asking the model to predict on a different type of sentence than the ones it was trained on, and (ii) you can always fine-tune the model with some extra annotation of your own data to get the desired behavior.
If you want to use a function from another library, you can always create a new component and add it to your nlp pipeline, as described here.
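A minimal sketch of such a component, assuming spaCy 2.x (the version used in this thread), where `nlp.add_pipe` accepts a plain function that takes a `Doc` and returns it. `make_lemma_component`, `build_pipeline` and the `external_lemmatize` hook are hypothetical names; the hook stands in for whatever function the other library provides, and the one-entry lookup table is only a placeholder.

```python
from typing import Callable, Optional


def make_lemma_component(external_lemmatize: Callable[[str, str], Optional[str]]):
    """Build a spaCy 2.x pipeline component that overwrites token lemmas.

    `external_lemmatize(text, pos)` is a hypothetical hook into another
    library; returning None keeps spaCy's own lemma for that token.
    """

    def override_lemmas(doc):
        for token in doc:
            lemma = external_lemmatize(token.text, token.pos_)
            if lemma is not None:
                token.lemma_ = lemma  # Token.lemma_ is writable in spaCy
        return doc  # components must return the Doc they received

    return override_lemmas


def build_pipeline():
    """Load the Greek model and add the component right after the tagger."""
    import spacy  # imported here so the component itself needs no spaCy install

    nlp = spacy.load("el_core_news_sm")
    # Hypothetical stand-in for another library: a tiny (word, PoS) -> lemma table.
    lookup = {("γνώρισα", "VERB"): "γνωρίζω"}
    component = make_lemma_component(lambda text, pos: lookup.get((text.lower(), pos)))
    # spaCy 2.x: add_pipe takes the callable itself; running after the tagger
    # means token.pos_ is already set when the component sees each token.
    nlp.add_pipe(component, name="external_lemmatizer", after="tagger")
    return nlp
```

On a machine with `el_core_news_sm` installed, running `build_pipeline()("Τον γνώρισα χθες.")` should then give `γνωρίζω` as the lemma of `γνώρισα`, provided the tagger labels it `VERB`. In spaCy 3.x `add_pipe` takes a registered component name instead of a function, so this exact call would need adapting there.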
Merging this with #3052 🙂
Hi again,
I have installed spaCy on a new machine (running the same OS as the previous one), and I am getting different lemmas on the two machines. How can I debug this?
```
pip3 freeze|grep spacy
spacy==2.1.8

python3 -m spacy download el --user
Requirement already satisfied: el_core_news_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/el_core_news_sm-2.1.0/el_core_news_sm-2.1.0.tar.gz#egg=el_core_news_sm==2.1.0 in ./.local/lib/python3.7/site-packages (2.1.0)
✔ Download and installation successful
You can now load the model via spacy.load('el_core_news_sm')
✔ Linking successful
/home/petasis/.local/lib/python3.7/site-packages/el_core_news_sm --> /home/petasis/.local/lib/python3.7/site-packages/spacy/data/el
You can now load the model via spacy.load('el')
```
```
pip3 freeze|grep spacy
spacy==2.1.8

python3 -m spacy download el --user
Requirement already satisfied: el_core_news_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/el_core_news_sm-2.1.0/el_core_news_sm-2.1.0.tar.gz#egg=el_core_news_sm==2.1.0 in /home/pepper/.local/lib/python3.7/site-packages (2.1.0)
✔ Download and installation successful
You can now load the model via spacy.load('el_core_news_sm')
✔ Linking successful
/home/pepper/.local/lib/python3.7/site-packages/el_core_news_sm --> /home/pepper/.local/lib/python3.7/site-packages/spacy/data/el
You can now load the model via spacy.load('el')
```
But machine A, for example, returns "γνώρισο" for "γνώρισα", while machine B returns "γνώρισας". Most of the lemmas are the same, but there are some cases where different runs produce different results. How can I debug this?
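One way to narrow this down is a small script, run on both machines, that records the spaCy and model versions actually loaded at runtime together with each token's text, PoS tag, fine-grained tag and lemma, and then diffs the two dumps. Because the lemma depends on the tag, a position where the tags also differ points at the tagger rather than the lemmatizer. The script name `check_lemmas.py`, the helper functions and the test sentence are hypothetical.

```python
import json
import sys


def dump_analyses(nlp, sentences, path):
    """Save spaCy/model versions and (text, pos, tag, lemma) per token as JSON."""
    import spacy  # only needed on the machines being compared

    record = {
        "spacy": spacy.__version__,
        "model": nlp.meta.get("version"),
        "tokens": [[t.text, t.pos_, t.tag_, t.lemma_] for doc in nlp.pipe(sentences) for t in doc],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)


def diff_analyses(a, b):
    """List (index, token_a, token_b) for every position whose analysis differs.

    Assumes both dumps come from the same sentences and tokenize identically;
    tokens beyond the shorter dump are ignored.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a["tokens"], b["tokens"])) if x != y]


if __name__ == "__main__" and len(sys.argv) >= 3:
    if sys.argv[1] == "dump":
        # python3 check_lemmas.py dump machine_a.json   (run once on each machine)
        import spacy

        dump_analyses(spacy.load("el_core_news_sm"), ["Τον γνώρισα χθες."], sys.argv[2])
    elif sys.argv[1] == "diff" and len(sys.argv) >= 4:
        # python3 check_lemmas.py diff machine_a.json machine_b.json
        with open(sys.argv[2], encoding="utf-8") as fa, open(sys.argv[3], encoding="utf-8") as fb:
            a, b = json.load(fa), json.load(fb)
        print("A:", a["spacy"], a["model"], "B:", b["spacy"], b["model"])
        for i, x, y in diff_analyses(a, b):
            print(i, x, "!=", y)
```

If the versions match and the tags match but the lemmas still differ, the difference lies somewhere in the lemmatization step itself rather than in the installed packages or the tagger.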
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi all, I am facing problems with the model for the Greek language, mainly with part-of-speech tags (the error rate on verbs is quite high) and lemmas. For example:
Do you know if a better model for Greek will be released soon? In the meantime, is it possible to replace the part-of-speech tagger and lemmatiser of the 'el' model with others that I have access to?