Data4Democracy / assemble

NOT AN ACTIVE PROJECT -- Check readme for data sources
MIT License
36 stars 27 forks

tokenize and analyze 2016 presidential candidate rhetoric for comparison with extremist communities #52

Open kshaffer opened 7 years ago

kshaffer commented 7 years ago

Anyone interested in doing some basic word/n-gram analysis, topic models, etc. on presidential candidate speeches and press releases? Would be really interesting to see which candidates were/weren't plugged in to the extremist communities and when/where certain extremist language creeps into more mainstream campaign discourse.

An R notebook with instructions and code for obtaining this data from The American Presidency Project will be in the exploratory_notebooks folder soon (just submitted a pull request).

kshaffer commented 7 years ago

Some examples of what's possible are in my personal GitHub repo.

FWIW, this should be a beginner-friendly project, but also open to more advanced algorithmic analysis.

justinstimatze commented 7 years ago

I'm interested in learning R and think this is an interesting project.

kshaffer commented 7 years ago

@justinstimatze Excellent! I was able to scrape all of the GOP speeches, press releases, and campaign statements from January 2015 on and assemble them into a single CSV, if that helps you explore: https://github.com/kshaffer/presidencyproject/blob/master/data/gop_2016_candidate_docs.csv

And if you're using this project to learn R, I highly recommend Tidy Text Mining. It's a free ebook explaining tools that might be helpful for this analysis.

ghost commented 7 years ago

This seems like an interesting project. Is it possible for me to join in on this project?

princeatul commented 7 years ago

Hi Kshaffer,

I would like to join this project. I will be working in Python. Is it possible for me to join?

bstarling commented 7 years ago

@princeatul Thanks for your interest. Everything @kshaffer mentioned should be doable in Python as well. If you're interested, I would suggest grabbing the data linked above and trying to tackle one task from the list in the original post, for example topic modeling. Once you have a preliminary Jupyter notebook, open a PR to add it to the exploratory_notebooks section of this repo. I'm not an expert in this area, but I do have this tutorial in my backlog that may help you get started.

If you need any help, just visit us in the #assemble channel on Slack or post back here with any questions.
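
For anyone starting on the word/n-gram task in Python, here is a minimal sketch using only the standard library. The `text_column` argument and the sample strings are assumptions for illustration and are not taken from the linked CSV, so check the file's actual header before calling `load_texts`.

```python
import csv
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(texts, n=2):
    """Count n-grams (as tuples of tokens) across an iterable of documents."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Slide a window of length n over the token list.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def load_texts(path, text_column):
    """Yield one document per CSV row; text_column is a hypothetical name."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row[text_column]


# Made-up sample documents so the sketch runs without downloading anything.
sample = [
    "We will secure the border and secure our future.",
    "Secure the border first.",
]
bigrams = ngram_counts(sample, n=2)
print(bigrams.most_common(3))
```

To run it on real data, something like `ngram_counts(load_texts("gop_2016_candidate_docs.csv", "text"), n=2)` should work once `"text"` is replaced with the real column name. From there it is a short step to grouping counts by candidate or by date.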

mw0 commented 7 years ago

I don't see it discussed above, so I'll mention that FiveThirtyEight had a very interesting article on using latent semantic analysis for topic modeling of Reddit groups. Certainly an interesting starting point for those interested in seeing what might be done here.
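
Latent semantic analysis itself needs a term-document matrix and a singular value decomposition, which is more than fits in a short snippet. As a much simpler stand-in (not the method from the article), cosine similarity between raw word-count vectors gives a rough first measure of how much vocabulary two bodies of text share. The function names and sample strings below are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Return a Counter of lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as sharing nothing.
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for a candidate speech and a community's posts.
speech = word_counts("secure the border and protect our jobs")
posts = word_counts("protect the border protect our people")
print(round(cosine_similarity(speech, posts), 3))
```

Raw counts are dominated by common words like "the", so a real comparison would probably drop stop words or weight terms with TF-IDF before comparing.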