This repository contains a Naive Bayes classifier for document classification, completed for CSCI 8360, Data Science Practicum, at the University of Georgia in Spring 2018.
Packages used
nltk (stopwords; stemming: Lancaster, Porter, ...)
string.punctuation
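As a rough illustration of how these packages could fit into a preprocessing step, here is a minimal sketch in plain Python. It is not the repository's code. `STOP_WORDS` is a tiny hypothetical set standing in for the nltk stopwords corpus, `preprocess` is a name chosen for illustration, and stemming (which an nltk stemmer would apply to each token) is left out so the example runs without extra downloads.

```python
import string

# Hypothetical, tiny stop-word set for illustration only; the nltk
# stopwords corpus listed above would replace it in practice.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}

# Translation table that deletes every character in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(_STRIP_PUNCT)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


tokens = preprocess("The price of oil, in U.S. dollars, is rising!")
print(tokens)
```

Removing punctuation before splitting keeps tokens such as "oil," and "oil" from being counted as different words.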
Algorithm
Data Structure
input data structure -> preprocessing data structure -> output data structure
Overview
Preprocessing
Naive Bayes (build the model)
LR (???)
*Make prediction
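The model-building and prediction steps above can be sketched as a multinomial Naive Bayes with Laplace smoothing. This is a minimal single-machine sketch in plain Python, not the repository's Spark implementation. The function names `train_nb` and `predict_nb`, the smoothing parameter `alpha`, and the toy documents and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(labels)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict_nb(model, tokens):
    """Return the label with the highest posterior log score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)


# Toy, made-up training data (already tokenized); labels are hypothetical.
docs = [
    ["oil", "price", "rise"],
    ["oil", "barrel"],
    ["bank", "loan", "rate"],
    ["bank", "credit"],
]
labels = ["energy", "energy", "finance", "finance"]

model = train_nb(docs, labels)
print(predict_nb(model, ["oil", "price"]))
print(predict_nb(model, ["bank", "rate", "unknown"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied for a long document.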
Environment setup
Python
Apache Spark
*Google Cloud Platform
Result
*Accuracy
Future Research
*KNN (link to wiki)