Open · alihashaam opened 5 months ago
@alihashaam Thank you for your idea.
Have you taken a look at https://github.com/Liebeck/spacy-iwnlp/blob/master/spacy_iwnlp/__init__.py#L13 ?
You should be able to set use_plain_lemmatization=True
when you create an instance of spacy-iwnlp
https://github.com/Liebeck/spacy-iwnlp/blob/master/develop.py#L5
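In spaCy v3, factory settings like this are normally supplied through the config argument of nlp.add_pipe. A minimal configuration sketch of what is being suggested (the factory name "iwnlp" and the model/JSON paths are assumptions here, not confirmed by the thread):

```python
import spacy

nlp = spacy.load("de_core_news_sm")  # assumes a German pipeline is installed
nlp.add_pipe(
    "iwnlp",  # assumed factory name registered by spacy-iwnlp
    config={
        "lemmatizer_path": "IWNLP.Lemmatizer_20181001.json",
        # this only works if the factory signature actually declares the setting:
        "use_plain_lemmatization": True,
    },
)
```

If the registered factory does not declare use_plain_lemmatization as an argument, spaCy rejects the extra config key, which would produce exactly the "wrong config" error described below.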
@Liebeck Thank you for your answer.
I tried that, but I was not able to pass use_plain_lemmatization as a config parameter; I kept getting a wrong-config error. I will try again to see if that works.
Right now, I just made it work by overriding the create_component function:
from spacy.language import Language
from spacy_iwnlp import spaCyIWNLP

# Custom factory that exposes use_plain_lemmatization and ignore_case,
# then forwards them to spaCyIWNLP's constructor:
@Language.factory("iwnlp-test2")
def create_component(nlp: Language, name: str, lemmatizer_path: str,
                     use_plain_lemmatization: bool = True, ignore_case: bool = True):
    return spaCyIWNLP(
        lemmatizer_path=lemmatizer_path,
        use_plain_lemmatization=use_plain_lemmatization,
        ignore_case=ignore_case,
    )
In the package's own factory definition, only lemmatizer_path is passed through, so there is no way to set use_plain_lemmatization from the pipeline config, even though spaCyIWNLP's constructor accepts both use_plain_lemmatization and ignore_case (see the __init__.py file).
The reason I am asking: say I have the German sentence "Es geht um den Anschluss von Waschmaschine, Spülmaschine und Spülbecken" ("It is about connecting the washing machine, dishwasher, and sink").
When I process this with spacy-iwnlp to get lemmas (word._.iwnlp_lemmas), I get no lemmas for Waschmaschine and Spülmaschine, even though their lemmas are present in the provided JSON file (IWNLP.Lemmatizer_20181001.json). After a closer look I realised that spaCy tags Waschmaschine as PROPN in this sentence, while the entry in the JSON (IWNLP.Lemmatizer_20181001.json) is stored under the POS Noun.
That is why I want to do lazy lemmatization: retrieve all lemmas of a word without looking at its POS. For that purpose, use_plain_lemmatization would be super handy.
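The POS mismatch described above can be illustrated with a small self-contained sketch. This is not the spacy-iwnlp code; LEMMA_DB is a hypothetical stand-in for a couple of entries from the IWNLP JSON, and the two functions mimic a POS-gated lookup versus a plain (POS-ignoring) lookup:

```python
# Tiny stand-in for IWNLP.Lemmatizer_20181001.json entries,
# shaped as {lowercased form -> {POS -> [lemmas]}} (hypothetical):
LEMMA_DB = {
    "waschmaschine": {"Noun": ["Waschmaschine"]},
    "spülmaschine": {"Noun": ["Spülmaschine"]},
}

def lemmatize_pos_gated(word, pos):
    """Return lemmas only if stored under exactly this POS (mimics the default)."""
    return LEMMA_DB.get(word.lower(), {}).get(pos)

def lemmatize_plain(word):
    """Ignore POS entirely and collect every lemma stored for the surface form."""
    entries = LEMMA_DB.get(word.lower())
    if entries is None:
        return None
    return sorted({lemma for lemmas in entries.values() for lemma in lemmas})

# spaCy tags "Waschmaschine" as PROPN in the example sentence, so the
# POS-gated lookup finds nothing (the JSON stores it under "Noun") ...
print(lemmatize_pos_gated("Waschmaschine", "PROPN"))  # None
# ... while the plain lookup still recovers the lemma:
print(lemmatize_plain("Waschmaschine"))  # ['Waschmaschine']
```

This is the behaviour use_plain_lemmatization=True would enable: the second lookup succeeds regardless of what POS spaCy assigns.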