Filtering on text length is now applied to the metadata directly. Otherwise the tokens and the metadata no longer match row for row.
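
A minimal sketch of why the filter belongs on the metadata, assuming the metadata is a pandas DataFrame with a "text" column. The column names, the MIN_CHARS threshold and the whitespace tokenizer are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical metadata table; "doc_id" and "text" are assumed column names.
metadata = pd.DataFrame(
    {
        "doc_id": [10, 11, 12],
        "text": [
            "too short",
            "this text is long enough to keep",
            "also long enough to be kept",
        ],
    }
)

MIN_CHARS = 15  # assumed threshold, not taken from the original notes

# Filter the metadata first ...
keep = metadata["text"].str.len() >= MIN_CHARS
metadata = metadata[keep].reset_index(drop=True)

# ... and only then derive the tokens from the filtered rows, so that
# tokens[i] always belongs to metadata row i.
tokens = [text.split() for text in metadata["text"]]
```

Filtering the token list on its own would drop entries there while the metadata kept all its rows, which is the mismatch described above.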
Punctuation removal has been adapted so that text with no space after punctuation (common in non-standard texts such as suggestions to the government) is handled properly.
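
One possible way to do this, shown only as a sketch and not necessarily the approach used here, is to replace each punctuation character with a space instead of deleting it, then collapse the extra whitespace. The function name remove_punctuation is hypothetical, and only ASCII punctuation from string.punctuation is covered:

```python
import re
import string

# Assumption: punctuation is replaced by a space rather than deleted,
# so "roads,and" becomes "roads and" instead of "roadsand".
_PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def remove_punctuation(text: str) -> str:
    """Replace ASCII punctuation with spaces and collapse whitespace."""
    spaced = _PUNCT_RE.sub(" ", text)
    return " ".join(spaced.split())


cleaned = remove_punctuation("Fix the roads,and lower taxes.Please!")
# cleaned == "Fix the roads and lower taxes Please"
```

Simply deleting the punctuation would glue "roads,and" into a single token, which is the failure case for texts written without a space after punctuation.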
TODO: use .iloc instead of .loc when selecting instances by positional index.
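
For context, a small sketch of the difference, assuming a pandas DataFrame whose index has gaps after filtering. The frame and its "text" column are made up for the illustration:

```python
import pandas as pd

df = pd.DataFrame({"text": ["a", "b", "c", "d"]})

# After filtering, the index labels are 0, 2, 3 (label 1 is gone).
filtered = df[df["text"] != "b"]

# .iloc selects by position: the second remaining row.
by_position = filtered.iloc[1]["text"]

# .loc selects by label: the row whose index label is 2.
by_label = filtered.loc[2, "text"]

# Both point at "c" here, but filtered.loc[1] would raise a KeyError
# because label 1 was removed by the filter.
```

Calling reset_index(drop=True) after filtering makes labels and positions coincide again, but .iloc states the positional intent explicitly.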