Consider the following two articles:
From the headlines alone, together with even basic knowledge of the current media environment, one can infer that the former article may offer coverage less favorable to the Trump administration than the latter. Fox manages to squeeze an acknowledged win for Trump into a headline about the same length as the Post's. In fact, the headlines differ in several ways:
- The Post describes the North’s action with the word “says,” whereas Fox uses “announces.” In other words, the two sources use near-synonyms to describe the same event, though the connotations of the terms may differ. On a broader scale, do news sources differ in their word choice when describing the same events? Does a source’s purported political leaning predict these choices?
- The Post provides two distinct pieces of information: that the North is suspending tests, and that it is shutting down a test site. Fox provides only the first, and in less detail: the Post specifies “nuclear and missile tests” whereas Fox mentions only “missile testing” (a good example of similar, but not synonymous, phrasing). Fox omits the second piece of information entirely in favor of quoting President Trump. The same questions apply here as for the first bullet.
- From the headlines alone, one may infer either that North Korea is temporarily suspending tests or that it is permanently ending them. The Fox headline supports the latter reading, whereas the Post’s headline seems to imply that the suspension could be temporary. This is another example of similar-but-not-synonymous phrasing, this time with more significance for a reader’s comprehension of the news event, since whether the halt is temporary or permanent is not an idle foreign policy question. Is it possible not only to identify these similar-but-not-synonymous phrases, but also to assess how much the phrase choice matters for the interpretation of the article? In particular, it would be interesting to identify the word/phrase choices most likely to affect a reader’s understanding of the news event.
- Speaking of foreign policy, the Fox headline goes out of its way to frame this news event as (what else?) a “win” for Trump. This news event is likely to raise at least two foreign policy questions: 1) whether the North is sincere, and 2) if so, whether the Trump administration’s actions can be credited for this admittedly favorable development. Obviously, some actors on the media stage have an established interest in how certain news events are covered; in this case, the Trump administration benefits from the framing of this as a win, on the supposition that this turns public opinion in its favor. The Post makes no such effort to frame the news event as favorable to the administration. What, beyond raw mentions, is a good measure of the relevance of an actor in the news? Can we identify actors (and their interests)? Can we identify whether a particular story is favorable or not for a given actor? In particular, we would be interested in knowing whether media sources vary systematically in the favorability of their coverage of opposed political actors.
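As a very rough first pass at the word-choice questions in the bullets above, one could compare two headlines token by token. The sketch below is only an illustration: the two headline strings are invented placeholders (not the actual Post or Fox headlines), and `tokenize` and `compare_headlines` are hypothetical helper names.

```python
import re

def tokenize(headline):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())

def compare_headlines(a, b):
    """Return tokens shared by both headlines and tokens unique to each."""
    tokens_a, tokens_b = set(tokenize(a)), set(tokenize(b))
    return {
        "shared": sorted(tokens_a & tokens_b),
        "only_a": sorted(tokens_a - tokens_b),
        "only_b": sorted(tokens_b - tokens_a),
    }

# Invented placeholder headlines, for illustration only.
headline_a = "Country says it will suspend nuclear and missile tests"
headline_b = "Country announces it will stop missile testing"

result = compare_headlines(headline_a, headline_b)
print(result)
```

A plain set difference like this only surfaces candidate word pairs. Recognizing that “says” and “announces” are near-synonyms, or that “tests” and “testing” share a stem, would need stemming and some lexical resource on top of it.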
I just found a third article, this one by the NYT:
I also just saw an article on CNN, not related to this story, that referenced the Washington Post (as “WaPo” in the headline). Can we identify networks of news sources that tend to reference each other? Does this correlate with their political leaning, if any?
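One possible way to start on the cross-referencing question is to count how often each source's headlines mention another outlet by name or nickname. This is a minimal sketch under stated assumptions: the `ALIASES` table, the function names, and the sample records are all made up for illustration, and a real alias list would need to be much longer.

```python
import re
from collections import defaultdict

# Hypothetical alias table: outlet name -> strings that may refer to it.
ALIASES = {
    "Washington Post": ["washington post", "wapo"],
    "New York Times": ["new york times", "nyt"],
    "CNN": ["cnn"],
    "Fox News": ["fox news"],
}

def mentions(name, text):
    """True if `name` appears in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(name) + r"\b", text) is not None

def find_references(source, headline):
    """Return the outlets (other than `source`) mentioned in a headline."""
    text = headline.lower()
    return {
        outlet
        for outlet, names in ALIASES.items()
        if outlet != source and any(mentions(n, text) for n in names)
    }

def build_reference_graph(records):
    """Count source -> referenced-outlet edges over (source, headline) pairs."""
    graph = defaultdict(lambda: defaultdict(int))
    for source, headline in records:
        for target in find_references(source, headline):
            graph[source][target] += 1
    return graph

# Invented records, for illustration only.
records = [
    ("CNN", "WaPo: officials weigh new policy"),
    ("CNN", "Washington Post reports on budget talks"),
    ("Fox News", "Markets close higher"),
]
graph = build_reference_graph(records)
print({src: dict(targets) for src, targets in graph.items()})
```

The resulting edge counts could later be compared against each source's purported political leaning. Whole-word matching is used so that a short alias such as "nyt" does not match inside an unrelated word like "anything".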
I think a good start to this would be to look only at headlines. This would have a number of advantages:
- Less, and less complex, storage required
- Faster, simpler scraping
- Faster analysis
- Can be our MVP
- Will get experience with scraping
- May be able to address some of the questions above
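To give a sense of how small a headline-only scraper could be, here is a sketch that pulls the `<title>` text out of an HTML page using only the Python standard library. It assumes, purely for illustration, that the page title is a usable stand-in for the headline; in practice the headline may live in another element and the title may carry a site-name suffix. `TitleParser`, `extract_headline`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text found inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_headline(html):
    """Return the <title> text with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())

# Invented sample page, for illustration only.
sample_html = (
    "<html><head><title>\n  Example headline about a news event\n</title></head>"
    "<body><p>Body text that a headline-only scraper ignores.</p></body></html>"
)
headline = extract_headline(sample_html)
print(headline)
```

Storing one short string per article, alongside its source and URL, is what keeps the storage and analysis costs listed above low.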