rhdunn opened this issue 1 year ago
The lemma of "less" is currently not "little" in any corpus I know of. It's conceivable, I suppose, but if we changed this corpus it would be out of sync with all the others, so I wouldn't do it unless there's a big push for it.
For "further", I recall a principled decision to lemmatize farther/further as "far" only in contexts that could take "far", whereas the discourse adverb is lemmatized "further". I could be wrong, but EWT appears to do the same.
The other errors are fixed, except for the UK spelling issue: I'm not sure whether we want to lemmatize UK and US spellings together. As a policy, GUM also accepts UK spelling without <sic ana>. @nschneid?
I don't recall a practice of normalizing UK vs. US spellings in the lemma, and EWT has both "realize" and "realise" for example. If it's important to group them together I'd probably do it via a new MISC feature rather than mess with lemmas (since lemmatizers are probably not trained to normalize by default?).
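The MISC-feature alternative can be pictured with a minimal Python sketch, assuming CoNLL-U input (one token per line, ten tab-separated columns, LEMMA in column 3 and MISC in column 10). The feature name `USLemma` and the `UK_TO_US` table are hypothetical illustrations; neither GUM nor UD defines them. The point is only that the LEMMA column stays as-is while the grouping information lives in MISC.

```python
# Hypothetical sketch: record a US-normalized spelling in the MISC column
# (column 10) of a CoNLL-U token line, leaving LEMMA (column 3) untouched.
# "USLemma" and UK_TO_US are illustrative assumptions, not an existing
# UD or GUM convention.

UK_TO_US = {  # hypothetical lookup; a real list would be much larger
    "realise": "realize",
    "colour": "color",
}


def add_us_lemma(token_line: str, feature: str = "USLemma") -> str:
    """Append feature=<US lemma> to MISC when LEMMA is a known UK spelling.

    token_line is a single CoNLL-U line without its trailing newline.
    """
    cols = token_line.split("\t")
    # Skip comments, multiword-token ranges (e.g. "1-2") and empty nodes ("1.1").
    if len(cols) != 10 or not cols[0].isdigit():
        return token_line
    lemma, misc = cols[2], cols[9]
    us = UK_TO_US.get(lemma)
    if us is None:
        return token_line  # not a UK spelling we know about
    entry = f"{feature}={us}"
    # MISC is "_" when empty, otherwise a "|"-separated list of attributes.
    cols[9] = entry if misc == "_" else f"{misc}|{entry}"
    return "\t".join(cols)
```

For example, a token with lemma "realise" keeps that lemma but gains `USLemma=realize` in MISC, so a search over MISC could group it with "realize" without lemmatizers having to learn the normalization.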
- less -> little
- other irregular
- pronouns
- mismatched for the part of speech
- UK vs US: these are lemmatizing to the US spelling.