Pre-processing improvements

At the moment the feature extractor is essentially just an NGramsOfWords-like function, yet it previously outperformed the extract() function from ./extract (which was using a lemmatizer). That being said, I think the feature extractor could be improved by including a stemming/lemmatization step, as well as a normalisation step like limdu.features.LowerCaseNormalizer.
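
As a rough sketch of what that could look like, here is a word n-gram extractor with a normalisation step and a stemming step in front of it. It is written in the `function(sample, features)` shape that limdu accepts for custom feature extractors, but it is plain JavaScript and does not call limdu itself. The names `normalize`, `naiveStem` and `makeNGramExtractor` are hypothetical, and `naiveStem` is only a crude suffix stripper standing in for a real stemmer or lemmatizer, not the one used in ./extract.

```javascript
// Lower-case the text and collapse runs of whitespace.
// (Hypothetical helper, similar in spirit to a lower-case normaliser.)
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Crude suffix stripper: a placeholder for a real stemmer/lemmatizer.
// Only strips a suffix if at least three characters remain.
function naiveStem(word) {
  const suffixes = ["ing", "ed", "es", "s"];
  for (const suffix of suffixes) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Build an extractor that adds one feature per n-gram of stemmed words.
function makeNGramExtractor(n) {
  return function (sample, features) {
    const words = normalize(sample).split(" ").filter(Boolean).map(naiveStem);
    for (let i = 0; i + n <= words.length; i++) {
      features[words.slice(i, i + n).join(" ")] = 1;
    }
    return features;
  };
}

// Usage: bigrams of "Walking DOGS" after normalisation and stemming.
const bigrams = makeNGramExtractor(2)("Walking DOGS", {});
// bigrams is { "walk dog": 1 }
```

Since the plain n-gram extractor already beat the lemmatizing extract(), it would be worth comparing the two variants (with and without the stemming step) on the same split before keeping either.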

Another thing to consider would be to drop the instances categorised as null from the data, since they are of no use.
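
A minimal sketch of that filtering step follows. It assumes the instances are `{input, output}` objects, as limdu's trainBatch expects, and that a null category shows up either as a missing `output` or as the string label "null". Both are assumptions about the data, and `dropNullInstances` is a hypothetical helper name.

```javascript
// Keep only instances with a real category.
// Assumption: "null" instances have output null/undefined or the label "null".
function dropNullInstances(dataset, nullLabel = "null") {
  return dataset.filter(
    (instance) => instance.output != null && instance.output !== nullLabel
  );
}

// Usage with made-up instances:
const cleaned = dropNullInstances([
  { input: "book a table", output: "reservation" },
  { input: "asdf", output: "null" },
  { input: "hmm", output: null },
]);
// cleaned contains only the "reservation" instance
```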