Open markuskiller opened 10 years ago
I don't know if this is going to help, but I had some time on my hands, and this is what I found out so far.
de-lexicon.txt
Added: Du PPER (Like: du PPER) Fixes "Du" to be tagged as a personal pronoun at the start of a sentence, but it is sometimes still tagged as JJ elsewhere in a sentence (by a context rule?).
Changed: Ihr PPOSAT -> Ihr PPER PPOSAT (Like: ihr PPER PPOSAT) Fixes "Ihr" to be tagged as a personal pronoun (second person plural) but breaks its use as a possessive pronoun, because the Lexicon class of the pattern module only reads the first two words of each line, so there is no advantage in writing more tags after a word.
def load(self):
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path)))
Ihr PPOSAT -> ["Ihr", "PPOSAT"]
Ihr PPER PPOSAT -> ["Ihr", "PPER"]
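The truncation can be reproduced without the pattern module. This is only a minimal sketch, assuming the lexicon is plain text with one space-separated entry per line; `parse_first_two` and `parse_all_tags` are hypothetical helper names and not part of pattern.

```python
def parse_first_two(lines):
    """Mimic the slicing used in load(): keep the word and its first tag only."""
    return dict(line.split(" ")[:2] for line in lines)


def parse_all_tags(lines):
    """Hypothetical alternative: keep every tag listed after the word."""
    entries = {}
    for line in lines:
        word, *tags = line.split()
        entries[word] = tags
    return entries


sample = ["Ihr PPER PPOSAT", "Du PPER"]
first_two = parse_first_two(sample)
all_tags = parse_all_tags(sample)

print(first_two["Ihr"])  # only the first tag survives the [:2] slice
print(all_tags["Ihr"])   # both tags are kept
```

Keeping a list of tags would only help if the tagger were also changed to choose between them, which this sketch does not attempt.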
Also, the POS tagging of German verbs doesn't work well. I tried putting some of the verbs into de-lexicon.txt paired with their correct tags, and it works most of the time (for example, bist VAFIN). So adding all of them to the lexicon would work but seems redundant, because all known verb forms are already there in the form of de-verbs.txt, and the inflect.py class knows about all their conjugated forms and would be able to POS-tag them correctly with the help of a small new function.
Maybe it would help to add a function that searches through the verb forms and chooses the right tag, and to call it right after searching for tags in the lexicon in find_tags().
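The fallback idea could look roughly like the sketch below. It does not use pattern's actual API: `LEXICON`, `VERB_FORMS`, `tag_tokens` and `lemma` are hypothetical stand-ins, the table entries are illustrative rather than taken from de-verbs.txt, and the default tag `NN` is an assumption.

```python
# Hypothetical stand-in for entries loaded from de-lexicon.txt.
LEXICON = {"Du": "PPER", "du": "PPER"}

# Hypothetical stand-in for a table derived from de-verbs.txt:
# inflected form -> (lemma, tag). Illustrative entries only.
VERB_FORMS = {
    "bist": ("sein", "VAFIN"),
    "seid": ("sein", "VAFIN"),
    "gehst": ("gehen", "VVFIN"),
}


def tag_tokens(tokens, lexicon=LEXICON, verb_forms=VERB_FORMS, default="NN"):
    """Look each token up in the lexicon first, then in the verb forms."""
    tagged = []
    for token in tokens:
        tag = lexicon.get(token)
        if tag is None:
            # Fallback: search the known verb forms (case-insensitive).
            entry = verb_forms.get(token.lower())
            tag = entry[1] if entry else default
        tagged.append((token, tag))
    return tagged


def lemma(token, verb_forms=VERB_FORMS):
    """Return the verb lemma if the form is known, else the token itself."""
    entry = verb_forms.get(token.lower())
    return entry[0] if entry else token


print(tag_tokens(["Du", "bist"]))
print(lemma("seid"))
```

Because the lexicon is consulted first, existing lexicon entries would still take precedence over the verb table in this sketch.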
@CJAnti reported a pattern-related issue to textblob-de, which uses a Python3 compatible version of pattern.de or the original pattern distribution (if installed) on Python2.
https://github.com/markuskiller/textblob-de/issues/9
EXPECTED: Du bist --> Lemma: sein
Ihr seid --> Lemma: sein