chocoluffy / redditQA

Explore some interesting NLP experiments with reddit comments data.

1-Predict-Topics

Run TF-IDF and LSI on existing subreddit comments, then, given a user's new comment, predict and recommend the most suitable subreddit.

Predict Topics
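
As a rough illustration of the matching step, the sketch below builds TF-IDF vectors for each subreddit and recommends the subreddit whose vector has the highest cosine similarity to a new comment. It is a minimal sketch under assumptions, not the repository's code: the function names (`build_tfidf`, `recommend`), the whitespace tokenizer, and the smoothed IDF formula are all illustrative choices, and the LSI projection (a truncated SVD of the TF-IDF matrix) is left out to keep the example dependency-free.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def build_tfidf(docs):
    """docs maps subreddit name -> concatenated comment text (hypothetical input shape)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: in how many subreddits each term appears.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF, an illustrative choice.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(comment, vectors, idf):
    # Terms never seen in the corpus are ignored.
    tf = Counter(tokenize(comment))
    query = {term: n * idf[term] for term, n in tf.items() if term in idf}
    return max(vectors, key=lambda name: cosine(query, vectors[name]))


docs = {
    "python": "python code function bug library",
    "cooking": "recipe oven bake flour sugar",
}
vectors, idf = build_tfidf(docs)
print(recommend("how do I bake with flour", vectors, idf))
```

With LSI, both the subreddit vectors and the query would first be projected into a low-rank topic space before the cosine comparison.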

2-PCA-Distribution-Plot

Build a document-term matrix from BigQuery data, run LDA to find the topic distribution of each subreddit, then apply t-SNE dimensionality reduction and visualize the result with matplotlib.

LDA visualization
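
The first step of that pipeline, the document-term matrix, can be sketched as follows. This is only an illustration under assumptions: the `(subreddit, body)` row shape stands in for whatever the BigQuery export actually returns, the function name is hypothetical, and tokenization is a plain whitespace split. LDA and t-SNE themselves would normally come from a library and are not shown.

```python
from collections import Counter


def document_term_matrix(comments):
    """Treat each subreddit as one document and count its terms.

    comments: iterable of (subreddit, body) rows (assumed shape).
    Returns (subreddits, vocab, matrix) with matrix[i][j] being the
    count of vocab[j] in subreddits[i].
    """
    counts = {}
    for subreddit, body in comments:
        counts.setdefault(subreddit, Counter()).update(body.lower().split())
    subreddits = sorted(counts)
    vocab = sorted({term for c in counts.values() for term in c})
    # Counter returns 0 for terms a subreddit never used.
    matrix = [[counts[s][term] for term in vocab] for s in subreddits]
    return subreddits, vocab, matrix


rows = [
    ("python", "import numpy"),
    ("python", "numpy array"),
    ("cooking", "bake bread"),
]
subreddits, vocab, matrix = document_term_matrix(rows)
for name, row in zip(subreddits, matrix):
    print(name, row)
```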

3-Bipartite-Graph

Construct a bipartite graph between authors and topics, and propagate labels back and forth to identify generalists and specialists among reddit authors in different communities.

Bipartite Graph
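
One possible reading of this propagation is sketched below. Everything specific here is an assumption made for illustration: the edge format (`author -> {topic: weight}`), the one-hot initial topic labels, the use of normalized entropy as a specialization score, and the 0.5 threshold are not taken from the repository.

```python
import math


def normalize(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


def propagate(edges, rounds=1):
    """edges: author -> {topic: weight} (hypothetical format).

    Returns author -> label distribution over topics.
    """
    topics = sorted({t for ws in edges.values() for t in ws})
    # Each topic starts with a one-hot label for itself.
    topic_labels = {t: {u: float(u == t) for u in topics} for t in topics}
    author_labels = {}
    for _ in range(rounds):
        # Topics -> authors: weighted average of neighbouring topic labels.
        author_labels = {}
        for author, ws in edges.items():
            w = normalize(ws)
            author_labels[author] = {
                u: sum(w[t] * topic_labels[t][u] for t in w) for u in topics
            }
        # Authors -> topics: weighted average of neighbouring author labels.
        updated = {}
        for t in topics:
            w = normalize({a: ws[t] for a, ws in edges.items() if t in ws})
            updated[t] = {
                u: sum(w[a] * author_labels[a][u] for a in w) for u in topics
            }
        topic_labels = updated
    return author_labels


def classify(dist, threshold=0.5):
    # Normalized entropy: near 0 means concentrated, near 1 means spread out.
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    max_entropy = math.log(len(dist)) if len(dist) > 1 else 1.0
    return "specialist" if entropy / max_entropy < threshold else "generalist"


edges = {
    "alice": {"python": 10},
    "bob": {"python": 3, "cooking": 3, "music": 3},
}
labels = propagate(edges)
for author, dist in labels.items():
    print(author, classify(dist))
```

More rounds smooth the labels across the graph, so authors who share topics with many generalists drift toward flatter distributions.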

4-LDA-On-TFIDF

Fine-tune the model from week 3 by applying TF-IDF weights to the BOW matrix while keeping the values at the same magnitude as the original counts.

Improved LDA
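
A minimal sketch of such a reweighting is shown below. It assumes that "same magnitude" means each document's reweighted row should keep the same total as its original counts, which is one plausible interpretation and not confirmed by the source. The function name and the smoothed IDF formula are likewise illustrative.

```python
import math


def tfidf_reweight(matrix):
    """Apply IDF weights to a BOW count matrix, then rescale each row
    so its sum equals the original row sum (assumed meaning of
    'same magnitude')."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency of each term (column).
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    reweighted = []
    for row in matrix:
        weighted = [count * w for count, w in zip(row, idf)]
        total = sum(weighted)
        scale = sum(row) / total if total else 0.0
        reweighted.append([value * scale for value in weighted])
    return reweighted


bow = [
    [2, 1, 0],  # term 0 appears in both documents, term 1 only here
    [2, 0, 3],
]
weighted = tfidf_reweight(bow)
for row in weighted:
    print([round(value, 3) for value in row])
```

Terms shared by every document lose weight relative to rarer ones, while each row still sums to its original token count, which keeps the input on the scale LDA expects.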

5-Model-Inspection

Examine the validity of the models obtained in week 4, and refine them by tuning hyperparameters.

Model Inspection

6-Word2Vec

Apply non-semantic techniques (finding overlapping commenters) and semantic techniques (such as LSA and word2vec) to examine the similarity between subreddits.

Word2vec
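
The non-semantic side can be illustrated with a short sketch that scores each pair of subreddits by the overlap of their commenters. The Jaccard index is used here as an assumed overlap measure, and the `(subreddit, author)` row shape and function name are hypothetical; the repository may measure overlap differently.

```python
from itertools import combinations


def commenter_overlap(rows):
    """rows: iterable of (subreddit, author) pairs (assumed shape).

    Returns {(sub_a, sub_b): Jaccard similarity of their commenter sets}.
    """
    authors = {}
    for subreddit, author in rows:
        authors.setdefault(subreddit, set()).add(author)
    scores = {}
    for a, b in combinations(sorted(authors), 2):
        shared = authors[a] & authors[b]
        union = authors[a] | authors[b]
        scores[(a, b)] = len(shared) / len(union)
    return scores


rows = [
    ("python", "ann"),
    ("python", "bob"),
    ("learnpython", "bob"),
    ("learnpython", "cat"),
    ("cooking", "dan"),
]
scores = commenter_overlap(rows)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

The semantic techniques would instead compare subreddits through vectors learned from the comment text, for example by cosine similarity between LSA or word2vec representations.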