arendakessian / spring2020-ml-project

fake review detection system

Pre-vectorization feature engineering #1

Closed by kelseymarkey 4 years ago

kelseymarkey commented 4 years ago

We had discussed engineering features for the number of exclamation points and the number (or percent) of words in all caps. These features need to be computed before vectorization (which removes punctuation and capitalization), but they can be computed for each review and then added at some point to the vectorization pickle that is already completed.
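A minimal sketch of what these per-review features could look like, computed on the raw review text. The function name `pre_vectorization_features`, the feature names, and the choice to ignore single-letter words such as "I" when counting all-caps words are assumptions for illustration, not the project's actual implementation:

```python
import re

def pre_vectorization_features(review: str) -> dict:
    """Compute punctuation/capitalization features from raw review text.

    Hypothetical helper: names and tokenization are illustrative assumptions.
    """
    # Simple tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", review)
    # Treat a word as all caps only if it has more than one character,
    # so that "I" and "A" are not counted (an assumption).
    caps = [w for w in words if len(w) > 1 and w.isupper()]
    n_words = len(words)
    return {
        "exclamation_count": review.count("!"),
        "all_caps_count": len(caps),
        # Guard against empty reviews to avoid division by zero.
        "all_caps_pct": len(caps) / n_words if n_words else 0.0,
    }

features = pre_vectorization_features("WOW this place is GREAT!!! I loved it")
```

The resulting values could then be stored per review and joined to the existing vectorized output as extra columns.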

kelseymarkey commented 4 years ago

Done, see commit 952d396cce353a5712153491f120dd18620b39c5

Each team member should run the notebook locally and save the output to their local machine.