Alicechung / ML_finalPJT

2018 Machine Learning Final Project

Meeting with Gulotty #8

Closed minjukim1220 closed 6 years ago

minjukim1220 commented 6 years ago
  1. Rather than treating each speech as the unit of analysis, divide each speech into chunks, for example by paragraph (to increase the accuracy of the analysis).

  2. When sorting out trade topics, randomly select 200 documents and hand-code them as trade vs. non-trade (preliminary supervised classification). Use those labels to classify the other documents (unsupervised classification).

  3. Research Design

Question: How do presidential speeches differ in Rust Belt regions?

Data: Every presidential utterance

IV: Region (Rust Belt). In this sense, a comparison between Rust Belt and non-Rust Belt regions is necessary.

DV: How they talk

Controls: economic issue salience (unemployment rate, GDP growth); whether the candidate is the incumbent or the challenger (dummy); relationship with the local governor (e.g., when the governor of IL is a Democrat, it is harder for a Democratic presidential candidate to say anything bad about IL)
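Items 1 and 2 above (paragraph chunks, then a random draw for hand-coding) could be sketched roughly as follows. This is only a minimal illustration, assuming paragraphs are separated by blank lines; the function names, the seed, and the two toy speeches are hypothetical and not taken from the project data.

```python
import random
import re


def split_into_paragraphs(text):
    """Split one speech into paragraph chunks on blank lines (assumed delimiter)."""
    chunks = re.split(r"\n\s*\n", text)
    return [c.strip() for c in chunks if c.strip()]


def sample_for_hand_coding(chunks, n=200, seed=2018):
    """Draw up to n chunks at random, without replacement, for manual labeling."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(chunks, min(n, len(chunks)))


# Toy input standing in for the real corpus (hypothetical ids and text).
speeches = {
    "speech_a": "Trade deals cost us jobs.\n\nWe will rebuild our factories.",
    "speech_b": "Thank you, Ohio!\n\n\nHealthcare matters.\n",
}

# Keep the speech id and paragraph index so each chunk can be traced back.
chunks = [
    (speech_id, i, paragraph)
    for speech_id, text in speeches.items()
    for i, paragraph in enumerate(split_into_paragraphs(text))
]

to_label = sample_for_hand_coding(chunks, n=3)
```

Keeping the speech id with every chunk means the hand-coded trade vs. non-trade labels can later be joined back to region and candidate.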

  4. Definition of the Rust Belt: from NY, exclude New York City (Hillary's election base), Manhattan, Brooklyn, the Bronx, Long Island, and Coney Island

  5. Error in the dataset: there are duplicated documents!

(1) The 2016 election Hillary dataset contains documents from 2007, which overlap with the 2008 election Hillary dataset

(2) The 2016 election Mike Huckabee dataset contains documents from 2007, which overlap with the 2008 election speeches
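One way to locate such overlaps is to hash each document after normalizing whitespace and case, then group ids that share a hash. The sketch below assumes documents are available as `(doc_id, text)` pairs; the ids and texts shown are made up for illustration, and the approach only catches documents that are identical after normalization, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_duplicates(docs):
    """Return groups of doc ids whose normalized text is identical."""
    groups = defaultdict(list)
    for doc_id, text in docs:
        groups[fingerprint(text)].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical ids and texts; the real datasets are not reproduced here.
docs = [
    ("2008/clinton/001", "Thank you all  for coming.\n"),
    ("2016/clinton/417", "thank you all for coming."),
    ("2016/clinton/418", "A different speech."),
]

duplicates = find_duplicates(docs)
```

Since the overlapping documents are all dated 2007, dropping 2007 documents from the 2016 folders by date would be a simpler alternative if the dates are reliable.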

  6. Erasing the words of the interviewers. Example: [CUOMO: Amen say families all across the country, senator, but Hillary says the same thing, that she's for everyday Americans. Why Bernie Sanders and not Hillary Clinton?

SANDERS: Well, I think people have got to - first of all, let me tell you this, Chris. I've known Hillary Clinton for 25 years. I like her and I respect her. And I am running for working families in the middle class, not against Hillary Clinton. But I think people have got to look at the record.]

--> In R (or Python), delete the content from "CUOMO:" up to "SANDERS:", so that only the candidate's own words remain
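A minimal Python sketch of this step follows. It assumes every turn starts on a new line with an upper-case speaker label followed by a colon, as in the example above; the function name `keep_speaker` and the shortened transcript are illustrative only. Instead of deleting one interviewer by name, it keeps only the turns of the chosen speaker, which also covers transcripts with several interviewers.

```python
import re

# An upper-case speaker label at the start of a line, e.g. "CUOMO:" (assumed format).
SPEAKER_TURN = re.compile(r"^([A-Z][A-Z .'\-]+):\s*", flags=re.MULTILINE)


def keep_speaker(transcript, speaker):
    """Return only the text spoken by `speaker`, dropping all other turns."""
    # With one capture group, split() yields [preamble, label, text, label, text, ...]
    parts = SPEAKER_TURN.split(transcript)
    kept = [
        text.strip()
        for label, text in zip(parts[1::2], parts[2::2])
        if label.strip() == speaker
    ]
    return "\n\n".join(kept)


# Shortened version of the exchange quoted above.
transcript = (
    "CUOMO: Why Bernie Sanders and not Hillary Clinton?\n\n"
    "SANDERS: Well, I think people have got to look at the record.\n\n"
    "CUOMO: Thank you, senator.\n"
)

candidate_only = keep_speaker(transcript, "SANDERS")
```

Transcripts that label speakers differently (mixed case, titles, or labels in the middle of a line) would need an adjusted pattern.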

  7. Political Background: Joe the Plumber [Topic 43: Joe, Obama, tax, spread]

"Obama's response included the statement, "when you spread the wealth around, it's good for everybody." Obama's response was seized upon by conservative media, and by Obama's rival, Republican nominee Senator John McCain, as an indication that Obama was interested in the redistribution of wealth and had a socialist view of the economy. Wurzelbacher is a member of the Republican Party."

Drill, baby, drill [Topic 43: drill]: Sarah Palin endorses Donald Trump, resurrects “drill, baby, drill” theme