dsp-uga/andromeda
This repository contains a Naive Bayes classifier for document classification, completed for CSCI 8360, Data Science Practicum, at the University of Georgia in Spring 2018.
MIT License
GitHub-wiki
#30
Closed
melanieihuei
closed
6 years ago
melanieihuei
commented
6 years ago
Wiki
melanieihuei
commented
6 years ago
Environment Setup
Python
Apache Spark
Google Cloud Platform
Natural Language Processing
Scalable Document Classification
Bag-of-words model
preprocessing
stopwords
stemming
punctuation
tf-idf, hashing
algorithms: NB, KNN, LR, random forest
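A minimal sketch of the preprocessing and feature steps listed above, in plain Python rather than Spark. The stopword list, the `crude_stem` suffix stripper, and the bucket count are illustrative assumptions, not the project's actual choices; a real pipeline would likely use a proper stemmer and a much larger stopword list.

```python
import hashlib
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def crude_stem(token):
    """Strip a few common suffixes; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop punctuation and stopwords, then stem each token."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(t) for t in cleaned.split() if t not in STOPWORDS]


def hashed_counts(tokens, num_buckets=16):
    """Hashing trick: count tokens per bucket, using a stable hash of the token."""
    counts = Counter()
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        counts[int(digest, 16) % num_buckets] += 1
    return counts


def tf_idf(docs_tokens):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights
```

With this unsmoothed idf, a term that appears in every document gets weight zero, which is one reason tf-idf can replace raw counts as a feature.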
Naive Bayes
Theory
conditional probability
Bayes Theorem
why "naive"?
NB classifier
drawbacks & solutions of NB
NB in document classification
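The theory items above can be summarized in three standard formulas. The notation is an assumption made for this sketch: $c$ is a class label, $d$ is a document, and $w_1, \dots, w_n$ are the words in $d$.

```latex
\begin{align}
  % Bayes' theorem for a class c given a document d
  P(c \mid d) &= \frac{P(d \mid c)\, P(c)}{P(d)} \\
  % The "naive" assumption: words are conditionally independent given the class
  P(d \mid c) &= \prod_{i=1}^{n} P(w_i \mid c) \\
  % The classifier: P(d) is constant across classes, so it can be dropped
  \hat{c} &= \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big]
\end{align}
```

Working in log space avoids numerical underflow when a document has many words.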
Optimization
features: hashing or tf-idf (open question) instead of raw counts
Laplace (add-one) smoothing
L1 regularization
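As a rough illustration of how Laplace smoothing fits into a multinomial Naive Bayes classifier, here is a small pure-Python sketch. The function names `train_nb` and `predict_nb`, the `alpha` parameter, and the toy data are hypothetical and are not taken from the repository's Spark implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes; alpha=1.0 gives Laplace (add-one) smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # Log prior: fraction of training documents in each class.
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}

    # Smoothed log likelihood: (count + alpha) / (class tokens + alpha * |V|).
    log_likelihood = {}
    for c in class_docs:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy training data (made up for illustration).
docs = [
    ["cheap", "pills", "buy"],
    ["meeting", "agenda", "notes"],
    ["buy", "cheap", "now"],
]
labels = ["spam", "ham", "spam"]
model = train_nb(docs, labels)
```

Without smoothing, a word that never occurs in a class would get probability zero and wipe out that class's score; with `alpha=1.0` every vocabulary word keeps a small nonzero probability in every class.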
Logistic Regression
Theory
LR in document classification
drawbacks and solutions
KNN
Theory
KNN in document classification
drawbacks and solutions
Random Forest
Theory
Random forest in document classification
drawbacks and solutions