PonteIneptique / collatinus-python

Collatinus Python Lemmatizer
GNU General Public License v3.0

Include punctuation in Lemmatiseur results? #19

Open diyclassics opened 6 years ago

diyclassics commented 6 years ago

The output of the lemmatise_multiple method in the Lemmatiseur class ignores punctuation. This makes direct comparison with other lemmatizers that retain punctuation (e.g. TreeTagger) difficult, because the token sequences no longer line up.

PonteIneptique commented 6 years ago

Hey! I would be fine with an option to retain punctuation. I have tried to stick as closely as possible to the original code base, but I'd be fine with this change :) Are you willing to open a PR?

diyclassics commented 6 years ago

Actually—I think sticking close to the original is a smart idea. I have been able to work around it—just thought it was worth mentioning as I compare a number of different lemmatizers.

If it is something you are definitely interested in, I could look at it soon, but not that soon. So, if anyone else is interested, they can take it.

PonteIneptique commented 6 years ago

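I actually think it would be possible to get around the constraint of not touching the original code base by conditionally passing tokens to lemmatise (which is the true original function, unlike lemmatise_multiple): word tokens would go through lemmatise, and punctuation tokens would be kept as they are.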
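
The idea in the last comment could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `lemmatise_with_punctuation`, the regular expression used for tokenising, and the choice to report each punctuation mark as its own lemma are all assumptions. The word-level lemmatiser is passed in as a plain callable so that the original code base stays untouched.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical tokeniser: a run of word characters, or a single
# punctuation character (anything that is neither word nor whitespace).
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s])")


def lemmatise_with_punctuation(
    text: str,
    lemmatise_word: Callable[[str], Iterable],
) -> List[Tuple[str, list]]:
    """Return (token, analyses) pairs, keeping punctuation in place.

    Word tokens are handed to `lemmatise_word`; punctuation tokens are
    never sent to the lemmatiser and are returned as their own analysis.
    """
    results = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        if match.lastgroup == "word":
            # Collect whatever the word-level lemmatiser yields.
            results.append((token, list(lemmatise_word(token))))
        else:
            # Assumption: punctuation is reported as its own lemma.
            results.append((token, [token]))
    return results


if __name__ == "__main__":
    # Stand-in lemmatiser for demonstration only; it just lowercases.
    def fake_lemmatise(word):
        return [word.lower()]

    for token, analyses in lemmatise_with_punctuation(
        "Cogito, ergo sum.", fake_lemmatise
    ):
        print(token, analyses)
```

With the real library, one would presumably pass the `lemmatise` method of a `Lemmatiseur` instance as `lemmatise_word`. Its exact return shape is not shown in this thread, so the wrapper simply collects whatever it yields.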