Performs simple text preprocessing in a reusable function (hence I put it in the common directory). The steps are:
- make the text lowercase
- replace shorthand phrases with their full form (e.g., {"won't": "will not"})
- remove any remaining punctuation
The function can easily be extended with more steps if desired. In any case, it is a good starting point.
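For reviewers who want a quick picture, here is a minimal sketch of what the three steps described above amount to. The function name `preprocess_text`, the `CONTRACTIONS` mapping, and every entry in it other than "won't" are illustrative assumptions, not necessarily what the code in this PR uses.

```python
import string

# Hypothetical shorthand mapping. Only "won't" is taken from the description;
# the other entries are assumed examples.
CONTRACTIONS = {
    "won't": "will not",
    "can't": "cannot",
    "i'm": "i am",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_text(text, contractions=CONTRACTIONS):
    """Lowercase, expand shorthand phrases, then strip remaining punctuation."""
    # Step 1: lowercase, so the mapping keys only need a lowercase form.
    text = text.lower()
    # Step 2: expand shorthand before punctuation removal, otherwise the
    # apostrophes the keys rely on would already be gone.
    for short, full in contractions.items():
        text = text.replace(short, full)
    # Step 3: delete whatever punctuation is left.
    return text.translate(_PUNCT_TABLE)


print(preprocess_text("I won't go, OK?"))
```

The order matters: expanding shorthand has to happen before punctuation is removed. Note that this sketch deletes punctuation outright, so a hyphenated word such as "well-known" becomes "wellknown"; replacing punctuation with a space instead is one of the choices a further step would need to settle.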
I included a notebook that shows how to use the function and illustrates some of the issues we need to consider. I also added a notebook with high-level EDA and a look at question length (which will be relevant for further preprocessing of the text).