At the moment the feature extractor is essentially just an NGramsOfWords-like function, but it previously outperformed the extract() function from ./extract (which used a lemmatizer). That said, I think the feature extractor could be improved by adding a stemming/lemmatization step, as well as a normalisation step like limdu.features.LowerCaseNormalizer.
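
As a rough sketch of what such an extractor might look like, the snippet below chains a lower-casing step, a stemming step, and word n-gram extraction. Everything in it is an assumption made for illustration: the names normalize, naiveStem and extractFeatures are hypothetical, the suffix stripper is only a crude placeholder for a real stemmer or lemmatizer, and the (sample, features) signature is assumed to match what the classifier expects from a feature extractor.

```javascript
// Hypothetical sketch: none of these helper names come from the project or from limdu.

// Lower-case the text and collapse whitespace (a stand-in for a normaliser step).
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Very crude suffix stripper, used only as a placeholder for a real stemmer or lemmatizer.
function naiveStem(word) {
  for (const suffix of ["ing", "ed", "es", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Add word n-grams of size 1..maxN to `features`, after normalising and stemming.
// Assumed convention: each feature is a key in a plain object with value 1.
function extractFeatures(sample, features = {}, maxN = 2) {
  const tokens = normalize(sample).split(" ").filter(Boolean).map(naiveStem);
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      features[tokens.slice(i, i + n).join(" ")] = 1;
    }
  }
  return features;
}
```

Stemming before building the n-grams means that inflected variants of the same phrase collapse to one feature. Whether that actually beats the current extractor would need to be measured, given that the lemmatizer-based extract() did worse before.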
Another thing to consider would be removing the useless instances categorised as null.
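
A minimal sketch of that filtering step is shown below. The dataset shape ({ input, output }) and the function name dropNullInstances are assumptions, and because it is not clear here whether the label is stored as a real null or as the string "null", the sketch drops both.

```javascript
// Hypothetical dataset shape: [{ input: "some text", output: "someCategory" | null }, ...]
function dropNullInstances(dataset) {
  return dataset.filter(
    (instance) =>
      instance.output !== null &&
      instance.output !== undefined &&
      instance.output !== "null" // assumed: the label may also be the literal string "null"
  );
}
```

Running this once before training keeps the original dataset untouched, since filter returns a new array.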