Open kshaffer opened 7 years ago
Some examples of what's possible are in my personal GitHub repo.
FWIW, this should be a beginner-friendly project, but also open to more advanced algorithmic analysis.
I'm interested in learning R and think this is an interesting project.
@justinstimatze Excellent! I was able to scrape all of the GOP speeches, press releases, and campaign statements from January 2015 on and assemble them into a single CSV, if that helps you explore: https://github.com/kshaffer/presidencyproject/blob/master/data/gop_2016_candidate_docs.csv
And if you're using this project to learn R, I highly recommend Tidy Text Mining. It's a free ebook explaining tools that might be helpful for this analysis.
This seems like an interesting project. Is it possible for me to join in on this project?
Hi Kshaffer,
I would like to join this project. I will be working in Python. Is it possible for me to join this project?
@princeatul Thanks for your interest. Everything @kshaffer mentioned should be doable in Python as well. If you're interested, I would suggest grabbing the data linked above and trying to tackle one task from the list in the original post, for example topic modeling.
Once you have a preliminary Jupyter notebook, open a PR to add it to the exploratory_notebooks section of this repo. I'm not an expert in this area, but I do have this tutorial in my backlog that may help you get started.
If you need any help, just visit us in the #assemble channel on Slack or post back here with any questions.
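For anyone starting in Python, here is a minimal sketch of a first step: reading a CSV of documents and grouping the text by candidate, using only the standard library. The column names `candidate` and `text` are assumptions for illustration, not confirmed against the CSV linked above, so check the actual header first. The sample rows are made up so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict


def load_docs(csv_file, group_col, text_col):
    """Group document texts by the value found in group_col.

    csv_file is any open text file object. group_col and text_col are
    hypothetical column names: check the CSV header before relying on them.
    """
    reader = csv.DictReader(csv_file)
    missing = {group_col, text_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(
            f"columns not found: {sorted(missing)}; header is {reader.fieldnames}"
        )
    docs = defaultdict(list)
    for row in reader:
        docs[row[group_col]].append(row[text_col])
    return dict(docs)


# Tiny in-memory stand-in for the real file (made-up rows, not real data).
sample = io.StringIO(
    "candidate,text\n"
    "A,We will lower taxes.\n"
    "B,We will build roads.\n"
    "A,Lower taxes help families.\n"
)
docs = load_docs(sample, group_col="candidate", text_col="text")
print({name: len(texts) for name, texts in docs.items()})
```

To use it on a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object. Very long speeches may exceed the `csv` module's default field size limit, which `csv.field_size_limit` can raise.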
I don't see it discussed above, so I'll mention that FiveThirtyEight had a very interesting article on using latent semantic analysis for topic modeling of Reddit groups. Certainly an interesting starting point for those interested in seeing what might be done here.
Anyone interested in doing some basic word/n-gram analysis, topic models, etc. on presidential candidate speeches and press releases? Would be really interesting to see which candidates were/weren't plugged in to the extremist communities and when/where certain extremist language creeps into more mainstream campaign discourse.
An R notebook with instructions and code for obtaining this data from The American Presidency Project will be in the exploratory_notebooks folder soon (just submitted a pull request).
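For those working in Python rather than R, the basic word/n-gram analysis mentioned above might be sketched roughly like this. This is only an illustration under simple assumptions: the regex tokenizer is deliberately crude, the function names are hypothetical, and the two sample sentences are made up rather than taken from any candidate's documents.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(texts, n=2):
    """Count n-grams across an iterable of documents.

    N-grams are built per document, so none spans two documents.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields n-token tuples
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up sample sentences, not real campaign text.
texts = ["We will lower taxes.", "Lower taxes help families."]
bigrams = ngram_counts(texts, n=2)
print(bigrams.most_common(3))
```

Running `ngram_counts` separately on each candidate's documents, or on documents grouped by month, would give counts that can be compared across candidates and over time.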