MLblog / jads_kaggle

Contains our group's work in various kaggle competitions
MIT License
10 stars 23 forks

Preprocessing the text data for the bag of words approach (feature creation) #117

Closed Christiannewisse closed 5 years ago

Christiannewisse commented 5 years ago

Adds features to the dataset using the feature_adder (in the common directory). Examples include the number of bad words, the number of words, and the number of question marks.

It is now possible to pass a list of bad words manually to the feature adder (since no external data is allowed in this competition).
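
As a rough illustration of the idea (not the actual feature_adder code), the count features and the manually supplied bad-word list could look something like the sketch below. The function name `add_features`, its signature, and the feature names are hypothetical and chosen only for this example.

```python
import re
from typing import Dict, Iterable, List

# Simple tokenizer: lowercase runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def add_features(texts: Iterable[str], badwords: Iterable[str]) -> List[Dict[str, int]]:
    """Hypothetical stand-in for the feature adder; not the repository's API.

    Returns one dict of count features per input text. The bad-word list is
    passed in by the caller, so no external data source is needed.
    """
    badword_set = {w.lower() for w in badwords}
    rows = []
    for text in texts:
        tokens = WORD_RE.findall(text.lower())
        rows.append({
            "n_words": len(tokens),
            "n_badwords": sum(1 for t in tokens if t in badword_set),
            "n_question_marks": text.count("?"),
        })
    return rows


comments = ["Why are you so stupid? Why?", "Thanks for the helpful edit."]
features = add_features(comments, badwords=["stupid", "idiot"])
```

Passing the list as an argument keeps the competition constraint explicit: the caller decides which words count as bad, and nothing is loaded from outside the provided data.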

Furthermore, the notebook explains how we can derive a short list of bad words that are important for this competition.
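
The notebook's exact method is not shown here. One possible way to build such a short list from the training data alone, assuming each comment has a binary label (1 for flagged, 0 for clean), is to rank words by how much more often they appear in flagged comments. The function `shortlist_badwords` and its parameters are hypothetical and only illustrate that idea.

```python
import re
from collections import Counter
from typing import List, Sequence

WORD_RE = re.compile(r"[a-z']+")


def shortlist_badwords(
    texts: Sequence[str],
    labels: Sequence[int],
    top_k: int = 10,
    min_count: int = 2,
) -> List[str]:
    """Rank words by a smoothed ratio of flagged to clean document frequency.

    Assumption: labels are binary, with 1 meaning the comment is flagged.
    """
    flagged: Counter = Counter()
    clean: Counter = Counter()
    for text, label in zip(texts, labels):
        # Count each word at most once per comment (document frequency).
        tokens = set(WORD_RE.findall(text.lower()))
        (flagged if label else clean).update(tokens)

    n_flagged = sum(1 for label in labels if label)
    n_clean = len(labels) - n_flagged

    def score(word: str) -> float:
        # Add-one smoothing keeps the ratio finite for words unseen in clean comments.
        p_flagged = (flagged[word] + 1) / (n_flagged + 2)
        p_clean = (clean[word] + 1) / (n_clean + 2)
        return p_flagged / p_clean

    # Ignore words that are too rare in flagged comments to be reliable.
    candidates = [w for w, c in flagged.items() if c >= min_count]
    return sorted(candidates, key=lambda w: (-score(w), w))[:top_k]


train_texts = [
    "you are an idiot",
    "what an idiot move",
    "you are kind",
    "what a kind move",
    "idiot",
]
train_labels = [1, 1, 0, 0, 1]
shortlist = shortlist_badwords(train_texts, train_labels, top_k=2)
```

On a toy sample this small, ordinary words can still end up on the list, so in practice the result would need a manual review before being handed to the feature adder.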