Do a literature review of NLP in PolEcon and converge on a technique #2

Houdanait commented 7 months ago

The goal of this issue is to create a literature review of NLP techniques, especially those that have been used in Political Economy, so that we can converge on a technique for our project.

Houdanait commented 7 months ago

Literature Review Doc is here.

Houdanait commented 7 months ago

Here is the state of the literature (please Matt, @juliettecoly add to this).

Ideological Scaling of Social Media Users: A Dynamic Lexicon Approach (here)
Automatic Detection of Political Opinions in Tweets (here)
Political Ideology Detection Using Recursive Neural Networks (here)
Text Classifiers for Political Ideologies (This was a CS224N project that was published and has citations; we should aim to improve on this - here)
A Survey on Political Viewpoints Identification (here)
Fine-Tuned BERT for the Detection of Political Ideology (Another CS224N project - here)
A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches (here)
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models (here)

Not an article, but a relevant blog post with dataset

Predicting Political Affiliation with Natural Language Processing (here)

matthewcwise commented 7 months ago

Using Natural Language Processing to Analyze Political Party Manifestos from New Zealand (here)
A scoping review on the use of natural language processing in research on political polarization: trends and research prospects (here) - great overview intended to be a "starting point for future research"
Mass Polarization: Manifestations and Measurements (here)
Foundational text on polarization: Have American's Social Attitudes Become More Polarized? (here)
Political Polarization in the American Public (over 1K citations) (here)
Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora (here)

Houdanait commented 7 months ago

Hi @matthewcwise! I've added a few references here. Do you have any papers to suggest? Also, my sense is that we will need some sort of labeled data to train the model, right? What do you think? Or are you thinking of inferring polarization/extremism simply from the embedding space of words? If so, do you have an idea of how we would do it?

matthewcwise commented 7 months ago

Hey Houda, thanks for adding some more! I had a really interesting chat with Prof. Manning today--from a career advice perspective, he told me it would be nice but not essential to find some interesting CS/linguistic theory to include in our paper. He also suggested that it might be hard to find novel work on evaluating political views from text.

I brought up the idea of exploring pragmatic markers and he thinks there could be some interesting political findings we could explore there while still doing a little bit with theory. Pragmatic markers are the family of words that serve a function ("doing" something in a sentence) rather than conveying meaning. Examples include words or phrases like "you know", "well", and "literally". Every language has hundreds of these and they frequently shift in meaning.

As an example: do politicians use pragmatic markers to make themselves more relatable, and do the markers reveal underlying beliefs about policies/platform (what they actually believe vs. what is just them toeing the party line)? Trump is an extreme example.

If you're open to exploring pragmatic markers, he gave me a professor in the linguistics department to contact and I've drafted an email that I want to send tomorrow after I've been able to review a few more papers on pragmatics. I can cc you on the email as well.

Let me know what you think about this--it's a bit of a shift away from trying to process every element of a speech/document, focusing instead on a niche part of linguistics that hasn't been explored as much. I think it might give us a better shot at publishing and also discovering something novel. I could also tell that Prof. Manning got excited as we started to explore some of these ideas, which makes me think it's got some good potential.

Houdanait commented 7 months ago

Hi @matthewcwise, thank you so so much for the post above! I totally LOVE the idea of exploring pragmatic markers, I think this is actually so interesting! I'm definitely on board with this! I hope the meeting with Prof. Manning went well :)

matthewcwise commented 7 months ago

Hey, I don't have access to the post you linked above so I'm adding in lit review notes here:

Manipulative uses of pragmatic markers in political discourse - "a single pragmatic marker can serve several manipulative functions, while a given manipulative strategy is potentially realized by a variety of pragmatic items."
The Analysis of Translated Hedges in Trump's Political Speeches and Interviews - Hedges can be one of the most important linguistic phenomena because it can be widely used. "Hyland (1998, p. 428), claimed that hedging has considered to be the best indication 'of an unwillingness to make a complete commitment to the truth of a proposition, most particular regarding new knowledge'"
Hedging in Doctor-Patient Communication: A Pragmatic Study - To use hedges properly can strengthen expressive force and communicative results, which can improve interpersonal relationship and thus make communication go more smoothly.
Schäffner, C. (1998). Hedges in political texts: A translational perspective. In L. Hickey (Ed.), The pragmatics of translation. GB: Cornwell
Potts (the professor Prof. Manning recommended we reach out to) - A probabilistic pragmatics for English singular 'some' - "We identify two challenges for semantic accounts of singular some."

matthewcwise commented 7 months ago

Ok here's how I'm thinking about our project. It's 2 AM so it's not the most coherent but I'll come back and re-word:

Goal: compare scripted vs. unscripted speeches for politicians, evaluating how modals and pragmatic markers are incorporated. The goal would be to see if we can identify meaningful differences in belief/conviction/policy stances (e.g. they might "hedge" in unscripted situations on policies that they have to support but don't personally believe in).
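One way to make "meaningful differences" concrete is a simple significance check on hedge rates in the two settings. The sketch below is only an illustration under stated assumptions: the function name `hedge_rate_z` and the example counts are hypothetical, and the two-proportion z statistic treats every token as independent, which real speech (clustered by speaker and topic) does not satisfy. A mixed-effects or per-speaker comparison would likely be needed for anything publishable.

```python
import math


def hedge_rate_z(hedges_a, tokens_a, hedges_b, tokens_b):
    """Two-proportion z statistic comparing hedge rates in settings A and B.

    Hypothetical helper: inputs are raw counts of hedging tokens and total
    tokens per setting (e.g. A = unscripted, B = scripted).
    """
    if tokens_a <= 0 or tokens_b <= 0:
        raise ValueError("token counts must be positive")
    p_a = hedges_a / tokens_a
    p_b = hedges_b / tokens_b
    # Pooled rate under the null hypothesis that both settings hedge equally.
    pooled = (hedges_a + hedges_b) / (tokens_a + tokens_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / tokens_a + 1 / tokens_b))
    if se == 0:
        return 0.0  # no hedges (or only hedges) in both settings
    return (p_a - p_b) / se


# Made-up counts for illustration only: 30 hedges in 1,000 unscripted tokens
# vs. 10 hedges in 1,000 scripted tokens.
z = hedge_rate_z(30, 1000, 10, 1000)
print(f"z = {z:.2f}")
```

A positive z means setting A hedges more often than setting B; |z| above roughly 1.96 corresponds to the conventional 5% two-sided threshold, subject to the independence caveat above.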

Data: MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

Elements we could analyze:

Analysis of Spontaneity vs. Scripted Speech: Pragmatic markers often indicate spontaneity in speech. For example, markers like "um" and "ah" might suggest the speaker is formulating thoughts on the fly, which could indicate genuine belief or personal opinion. In contrast, a lack of these markers might suggest prepared or scripted speech, aligning more with party lines or rehearsed responses.
Emphasis and Certainty: Markers that add emphasis (e.g., "indeed", "certainly") or express certainty (e.g., "definitely", "absolutely") might be used to bolster statements that align with strongly held beliefs. The frequency and context of these markers could provide clues about the speaker's commitment to the content of their statements.
Contrastive Markers: The use of contrastive markers like "however" or "on the other hand" can indicate a deviation from a simple party line, especially when used to introduce a more nuanced or personal viewpoint that differs from typical party rhetoric.
Sentiment Analysis Enhancement: Integrating pragmatic markers into sentiment analysis models could help determine the emotional tone behind statements, distinguishing between genuine passion or indifference which might reflect personal belief versus party rhetoric.
Linguistic Style Matching: Comparing the linguistic style of a politician in different settings (e.g., formal debates vs. informal interviews) using pragmatic markers could reveal shifts in style that suggest when they are speaking 'off-script'.
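As a rough starting point for the first three elements, marker frequencies can be counted per category and normalized by transcript length. This is a minimal sketch, not a settled method: the `MARKERS` lexicon, the `marker_rates` function, and the two example sentences are hypothetical, and the word lists only reuse the examples mentioned in this thread plus a few assumed additions. A plain lexicon match also cannot tell a pragmatic "well" from an adverbial one, so real use would need disambiguation.

```python
import re
from collections import Counter

# Hypothetical, illustrative lexicon. Categories follow the elements above;
# the word lists are assumptions, not a validated inventory.
MARKERS = {
    "spontaneity": ["um", "uh", "ah", "you know", "i mean", "well"],
    "emphasis_certainty": ["indeed", "certainly", "definitely", "absolutely"],
    "contrastive": ["however", "on the other hand"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_rates(text, lexicon=MARKERS, per=1000):
    """Return marker occurrences per `per` tokens for each category."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {category: 0.0 for category in lexicon}
    counts = Counter()
    for category, phrases in lexicon.items():
        for phrase in phrases:
            words = phrase.split()
            k = len(words)
            # Slide a window of length k to match multi-word markers too.
            counts[category] += sum(
                1 for i in range(n - k + 1) if tokens[i:i + k] == words
            )
    return {category: per * counts[category] / n for category in lexicon}


# Made-up sentences for illustration only.
scripted = "We will certainly deliver on this plan."
unscripted = "Well, um, you know, I mean, it is a good plan. However, it is costly."
for label, text in [("scripted", scripted), ("unscripted", unscripted)]:
    print(label, marker_rates(text))
```

Normalizing per 1,000 tokens keeps long interviews and short remarks comparable; the same per-category rates could later feed a style-matching comparison across settings for a single politician.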

Questions: Do we need to do audio analysis? Transcriptions would be a good starting point, but we might need to somehow consider audio.

Houdanait commented 7 months ago

Hi @matthewcwise thank you so much for all of this. Sorry I've been lagging behind, I had very limited connection. Would you be willing to do a call Monday or Tuesday? I can work around your times.

matthewcwise commented 7 months ago

No worries, things will balance out over time, I'm not stressed about it! I'm free to chat Monday from noon-2 PM Pacific or in the evening after 9 PM--do you have a time that works best?

Hope the travel/research has gone well!
