datasciencecampus / ace2

A text classification app

Pipeline text #10

Closed: MartinWoodONS closed this 4 years ago

MartinWoodONS commented 4 years ago

Minimal but functional text preprocessing pipeline
Currently runs as a standalone script
Stemmers and lemmatizers are now transformer classes for easier reuse
Lots of other refactoring and simplification
Unused text cleaning functions removed for the moment
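
For illustration, a transformer-style stemmer could look roughly like the sketch below. This is not the code from this PR: the class name `StemTransformer` and the `toy_stem` function are hypothetical, and `toy_stem` is only a stand-in so the example runs without dependencies. In practice a real stemmer callable, such as the `stem` method of NLTK's `PorterStemmer`, would presumably be passed in instead.

```python
class StemTransformer:
    """Hypothetical transformer that applies a word -> stem callable to documents."""

    def __init__(self, stem_func):
        # stem_func: any callable taking one word and returning its stem
        self.stem_func = stem_func

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn, so just return self
        return self

    def transform(self, X):
        # Split each document on whitespace, stem each token, and rejoin
        return [" ".join(self.stem_func(tok) for tok in doc.split()) for doc in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def toy_stem(word):
    # Stand-in for a real stemmer: strips a few common English suffixes
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


docs = ["testing pipelines", "cleaned words"]
stemmed = StemTransformer(toy_stem).fit_transform(docs)
print(stemmed)
```

Because the class exposes `fit` and `transform`, the same object can be reused across scripts or slotted into a pipeline that expects that interface, and swapping the stemmer for a lemmatizer only means passing a different callable.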

To do:

Write proper docstrings
Incorporate the coding exceptions functions somehow
Finish rewriting and simplifying the word split functions (in progress)
Incorporate the string cleaning functions into a "cleaner" class, unless that is already handled by some of the nltk tokenising functions we're using everywhere
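
As a rough idea of what the "cleaner" class in the last to-do item might look like, here is a minimal sketch. The class name `TextCleaner`, its options, and the choice of cleaning steps (lowercasing, replacing punctuation with spaces, collapsing whitespace) are all assumptions made for illustration; the actual string cleaning functions in the repo may do something different.

```python
import re
import string


class TextCleaner:
    """Hypothetical cleaner exposing the same fit/transform interface."""

    def __init__(self, lowercase=True, strip_punctuation=True):
        self.lowercase = lowercase
        self.strip_punctuation = strip_punctuation
        # Map every ASCII punctuation character to a space
        self._punct_table = str.maketrans(
            string.punctuation, " " * len(string.punctuation)
        )

    def fit(self, X, y=None):
        # Stateless: nothing to learn
        return self

    def transform(self, X):
        return [self._clean(doc) for doc in X]

    def _clean(self, text):
        if self.lowercase:
            text = text.lower()
        if self.strip_punctuation:
            text = text.translate(self._punct_table)
        # Collapse runs of whitespace and trim the ends
        return re.sub(r"\s+", " ", text).strip()


cleaned = TextCleaner().fit(None).transform(["  Hello,   World!! "])
print(cleaned)
```

Whether this is needed at all depends on how much cleaning the tokenising step already does, so it would be worth checking the tokeniser output on a few raw strings first.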