Open diyclassics opened 6 years ago
Hey! I would be fine with an option to retain punctuation. I have tried to stick as close as possible to the original code base, but I'd be fine with this change :) Are you willing to open a PR?
Actually, I think sticking close to the original is a smart idea. I have been able to work around it; I just thought it was worth mentioning as I compare a number of different lemmatizers.
If it is something you are definitely interested in, I could look at it soon, but not that soon. So, if anyone else is interested, they can take it.
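I actually think it would be possible to get around the constraint of not touching the code base by conditionally passing tokens to lemmatise (which is the true original function, unlike lemmatise_multiple): punctuation tokens would be kept as-is, and only the other tokens would be lemmatised.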
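A rough sketch of that conditional routing, in case it helps whoever picks this up. It does not call the library at all: `lemmatise_keep_punct`, `TOKEN_RE`, `PUNCT_RE` and `toy_lemmatise` are hypothetical names made up for illustration, the tokenisation regex is an assumption, and the toy lookup table stands in for a per-token lemmatiser (its entries are not real lemmatiser output). Any callable that takes one token and returns an iterable of analyses could be plugged in instead.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumed tokenisation: runs of word characters, or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")


def lemmatise_keep_punct(
    text: str,
    lemmatise_token: Callable[[str], Iterable[str]],
) -> List[Tuple[str, List[str]]]:
    """Return (token, analyses) pairs, keeping punctuation in the output.

    Punctuation tokens bypass the lemmatiser and are returned unchanged;
    every other token is handed to `lemmatise_token` one at a time.
    """
    results = []
    for token in TOKEN_RE.findall(text):
        if PUNCT_RE.fullmatch(token):
            # Punctuation: pass through untouched so positions stay aligned
            # with taggers that keep punctuation.
            results.append((token, [token]))
        else:
            results.append((token, list(lemmatise_token(token))))
    return results


def toy_lemmatise(token: str) -> List[str]:
    """Stand-in lemmatiser (hypothetical): a tiny lookup table."""
    table = {"arma": ["arma"], "uirum": ["uir"], "cano": ["cano"]}
    return table.get(token.lower(), [token])


if __name__ == "__main__":
    for token, analyses in lemmatise_keep_punct("Arma uirum, cano.", toy_lemmatise):
        print(token, analyses)
```

Because punctuation never reaches the lemmatiser, the output has one entry per token, which makes token-by-token comparison with a tagger that keeps punctuation straightforward.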
The output of the `lemmatise_multiple` method in the `Lemmatiseur` class ignores punctuation. This makes direct comparison with other lemmatizers that retain punctuation (e.g. TreeTagger) difficult.