According to #75, there used to be a `lemmatization` option for the `normalization` parameter of the `load_document()` method.
This no longer seems to be the case: either `stemming` is applied or word surface forms are used as stems, even though lemmas are extracted during text loading.
I'm (re)adding the `lemmatization` option, as it would be very useful for e.g. TF-IDF.
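
To illustrate the intended behaviour, here is a minimal, self-contained sketch of how the three normalization modes could select the form stored as a token's "stem". It is not the library's actual code: `normalize()`, `toy_stem()`, and the lowercasing are assumptions made for this example, and the toy suffix stripper only stands in for a real stemmer.

```python
from typing import Callable, List, Optional


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (illustration only)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(
    words: List[str],
    lemmas: List[str],
    normalization: Optional[str] = "stemming",
    stem: Callable[[str], str] = toy_stem,
) -> List[str]:
    """Hypothetical helper: pick the normalized form of each token.

    'stemming'      -> stemmed, lowercased surface forms
    'lemmatization' -> lowercased lemmas extracted at load time
    None            -> lowercased surface forms used as-is
    """
    if normalization == "stemming":
        return [stem(w.lower()) for w in words]
    if normalization == "lemmatization":
        return [lemma.lower() for lemma in lemmas]
    if normalization is None:
        return [w.lower() for w in words]
    raise ValueError(f"unknown normalization: {normalization!r}")


words = ["Cats", "were", "running"]
lemmas = ["cat", "be", "run"]  # assumed to come from the text-loading step

print(normalize(words, lemmas, "stemming"))       # ['cat', 'were', 'runn']
print(normalize(words, lemmas, "lemmatization"))  # ['cat', 'be', 'run']
print(normalize(words, lemmas, None))             # ['cats', 'were', 'running']
```

With lemmas, inflected variants such as "were" and "be" collapse to the same term, which is what makes this option attractive for term-weighting schemes like TF-IDF.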