boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0

Add lemmatization option for normalizing loaded documents #189

Open yetra opened 2 years ago

yetra commented 2 years ago

According to #75, there used to be a lemmatization option for the load_document() method's normalization parameter.

This no longer seems to be the case: either stemming is applied or lowercased word surface forms are used as stems, even though lemmas are already extracted during text loading.

I'm (re)adding the lemmatization option, as it would be very useful for TF-IDF, for example.
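To illustrate the idea, here is a minimal, self-contained sketch of how the three normalization modes could differ. It is not pke's actual code: `normalize_words` and `toy_stem` are hypothetical names, the suffix stripper only stands in for a real stemmer, and lowercasing the surface forms in the fallback branch is an assumption.

```python
from typing import Callable, List, Optional


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (illustration only)."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_words(
    words: List[str],
    lemmas: List[str],
    normalization: Optional[str],
    stem: Callable[[str], str] = toy_stem,
) -> List[str]:
    """Return the normalized form of each word (hypothetical helper).

    normalization == "stemming":      apply the stemmer to lowercased words
    normalization == "lemmatization": reuse the lemmas extracted at load time
    anything else (e.g. None):        fall back to lowercased surface forms
    """
    if normalization == "stemming":
        return [stem(word.lower()) for word in words]
    if normalization == "lemmatization":
        return [lemma.lower() for lemma in lemmas]
    return [word.lower() for word in words]


# Hand-written example; in practice the lemmas would come from the parser.
words = ["Mice", "were", "running"]
lemmas = ["mouse", "be", "run"]

print(normalize_words(words, lemmas, "stemming"))       # ['mice', 'were', 'runn']
print(normalize_words(words, lemmas, "lemmatization"))  # ['mouse', 'be', 'run']
print(normalize_words(words, lemmas, None))             # ['mice', 'were', 'running']
```

With lemmas, inflected forms such as "mice" and "mouse" collapse to the same term, which is what makes this mode attractive for document-frequency counts.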