AngledLuffa closed 10 months ago
Hehe, that's detailed knowledge there! But looking at the specific document, this can't be CoreNLP, which was indeed used to generate the base lemmatization before manual correction up through maybe GUM v3 or so. This is a textbook document, and the dateCollected shows it was only added in 2021, so this was almost certainly lemmatized by Stanza itself (could be that same self-referential error...)
Most of these kinds of errors get weeded out during annotation, or they're caught later when we compare multiple tagger disagreements and adjudicate, but this was missed. Will fix!
There's a sentence with a lemma of `swe` for `swing`. Makes me wonder if this originally used CoreNLP, since there was a bug where it was lemmatizing `swing` into `swe`. Now this is causing Stanza to do the same thing. A self-referential data error...
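In case it helps to check for other tokens carrying the same bad lemma, here is a minimal sketch that scans CoNLL-U text for a given value in the LEMMA column. It assumes the treebank is in the standard 10-column, tab-separated CoNLL-U layout; the function name `find_lemma` and the sample sentence are invented for illustration and are not taken from GUM.

```python
def find_lemma(conllu_text, bad_lemma):
    """Yield (sent_id, token_id, form, lemma) for tokens whose LEMMA equals bad_lemma."""
    sent_id = None
    for line in conllu_text.splitlines():
        line = line.rstrip("\r\n")
        if line.startswith("# sent_id"):
            # Comment line of the form "# sent_id = <id>"
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, lemma = cols[0], cols[1], cols[2]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if "-" in tok_id or "." in tok_id:
            continue
        if lemma == bad_lemma:
            yield sent_id, tok_id, form, lemma


def make_row(tok_id, form, lemma, upos):
    """Build one CoNLL-U token line, leaving the remaining six columns as '_'."""
    return "\t".join([tok_id, form, lemma, upos] + ["_"] * 6)


# Invented sample sentence (hypothetical, not real GUM data)
SAMPLE_ROWS = [
    ("1", "They", "they", "PRON"),
    ("2", "swing", "swe", "VERB"),
    ("3", "bats", "bat", "NOUN"),
]
SAMPLE = (
    "# sent_id = invented-1\n"
    + "\n".join(make_row(*row) for row in SAMPLE_ROWS)
    + "\n"
)

if __name__ == "__main__":
    for hit in find_lemma(SAMPLE, "swe"):
        print(hit)
```

To run it against a real file, read the file contents into a string and pass that in place of `SAMPLE`. Exact string matching on the lemma is deliberate here, since the goal is to find one specific known-bad value rather than to judge lemmas in general.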