Open HyunkuKwon opened 3 years ago
I have reservations about the results of this paper. First, I used Google Translate heavily for a while, and its translations were really disappointing; unfortunately, the authors’ study is built on it. Second, I don't think “counting words” is credible in this case. For example, the phrase “the past happiness is fading away” actually expresses sadness, but would be scored as happiness.
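A minimal sketch of that failure mode, using invented word ratings (not the paper's actual labMT scores): a bag-of-words hedonometer averages per-word happiness ratings, so the context “fading away” cannot flip the high score attached to “happiness”.

```python
# Hypothetical 1-9 happiness ratings, invented for illustration only.
toy_scores = {
    "the": 5.0, "past": 5.0, "happiness": 8.5,
    "is": 5.0, "fading": 3.5, "away": 4.5,
}

def naive_happiness(text):
    """Average per-word rating over rated words; context is ignored."""
    rated = [toy_scores[w] for w in text.lower().split() if w in toy_scores]
    return sum(rated) / len(rated) if rated else None

score = naive_happiness("the past happiness is fading away")
print(round(score, 2))  # 5.25 -- above the neutral midpoint of 5, despite the sad meaning
```

Even this sad sentence lands above neutral, because the single high-valence word “happiness” dominates the average.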
Another thing: I was actually trying to find Japanese in Figure 1. Japanese is the ninth most spoken language in the world. Why didn't the authors include Bengali and Japanese, which are both among the ten most spoken languages in the world?
@Raychanan I had really similar thoughts when reading this: counting words is not doing what they intend here. They decided to keep (rather than stem or lemmatize) all words, which offered the potential to track cases like the example you provided, but they don't appear to have capitalized on that potential.
I would argue that words that they assume to signal happiness (even when negated) are actually signalling the value of happiness to the population in this sample (authors of the books or so). While a high raw frequency of the term 'happy' might not indicate that the authors/stories in the English Google Books corpora are happy, it does indicate a cultural significance of happiness (via its frequency in discussion/usage). Thus, a takeaway from this article could be that across languages, happiness (/the implication or evaluation of happiness) is thematically and, arguably, culturally significant.
I am always very skeptical about word-level semantic analysis. Languages are complex by virtue of recursive syntactic structures and convoluted inferential pragmatics. Simply adding together the meanings of words will not give you the meaning of a sentence, and adding together sentences will not give you the meaning of an article. In this case, negative valence might be conveyed through positive words (sarcasm, for example), and their analysis would not be able to account for that properly.
This is a very interesting study. The authors carry out word counting on large corpora for different languages, then rely on human annotations of the positivity of common words across languages. I am curious how these human ratings compare to predefined sentiment dictionaries, such as the Harvard General Inquirer?
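One simple way to run that comparison, sketched here with invented ratings and tags (the real check would use the paper's crowd-sourced 1-9 ratings and the Inquirer's actual Positiv/Negativ categories): for the words the two resources share, ask how often a rating above the neutral midpoint coincides with a positive tag.

```python
# All ratings and tags below are invented for illustration.
crowd = {"love": 8.4, "war": 1.8, "win": 7.9, "lose": 2.6}   # hypothetical 1-9 ratings
binary = {"love": "pos", "war": "neg", "win": "pos", "lose": "neg"}  # hypothetical tags

def agreement(crowd, binary, neutral=5.0):
    """Fraction of shared words where rating > neutral matches a 'pos' tag."""
    shared = set(crowd) & set(binary)
    hits = sum((crowd[w] > neutral) == (binary[w] == "pos") for w in shared)
    return hits / len(shared)

print(agreement(crowd, binary))  # 1.0 -- perfect agreement on this toy vocabulary
```

A rank correlation (e.g. Spearman) between the continuous ratings and a graded dictionary would be a natural refinement of this binary check.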
The paper makes claims about language as a tool for communication in all its forms, yet the data analyzed are only languages in written form (Twitter, books, TV show scripts, etc.). I wonder whether our daily conversations would also show this positivity bias. Perhaps the researchers could record everyday conversations, transcribe them into text, and extend the study to see if the bias remains.
I share the concerns others have noted regarding the lack of attention to word context in this kind of methodology. Has there been any research on whether this word-level analysis corresponds with people's subjective interpretations of text? For example, whether the literature "happiness time series" correspond with reader interpretations of books' emotional material?
My concern with this paper is that the researchers chose to use the frequency of a word to represent its importance. Given what we have learned about extracting n-grams or phrases from text data, would sentiment analysis of phrases or even sentences be more reliable than analysis of individual words?
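A hypothetical sketch of the phrase-level idea (the ratings below are invented, not taken from any real lexicon): if the rating table contains bigrams, a greedy left-to-right scorer can assign “not happy” its own negative rating instead of averaging the high unigram score for “happy”.

```python
toy_unigrams = {"happy": 8.2, "not": 4.0}   # hypothetical 1-9 ratings
toy_bigrams = {("not", "happy"): 2.5}        # hypothetical phrase rating

def phrase_scores(tokens):
    """Greedily match known bigrams first, then fall back to unigrams."""
    scores, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in toy_bigrams:
            scores.append(toy_bigrams[pair])
            i += 2
        elif tokens[i] in toy_unigrams:
            scores.append(toy_unigrams[tokens[i]])
            i += 1
        else:
            i += 1
    return scores

print(phrase_scores("i am not happy".split()))  # [2.5], vs. unigram scores [4.0, 8.2]
```

The trade-off is coverage: the number of possible phrases grows combinatorially, so human-rated phrase lists are far sparser than word lists.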
Based on the results, the derived hedonometer seems to work well within the realm of literature. Yet I wonder whether it would still be an effective measurement for other types of corpora, such as comedies, where irony is often used.
A question pertinent to that of @william-wei-zhu:
The authors claim that they took 24 varying corpora from 10 languages. However, the selection of corpora differs across the languages being evaluated. For example, for Indonesian, the authors only used movie subtitles and Twitter; for Chinese, they only used Google Books. What about Indonesian books and Chinese social media? I find this uneven, even biased, selection of corpora very problematic.
The conclusion about a universal positivity bias is interesting, but are there empirical results from other disciplines, such as cognitive science, that could be used to cross-check it?
I would say that people use positive words more, even in negative expressions. For example, I would prefer "not (that) happy" to "unhappy" when I am in a bad mood. In this context, I guess humans will get my emotion right. I am wondering if there is less positivity bias in language comprehension compared to language use. More importantly, can we study language (or content) comprehension using content analysis?
My question is also about the selection of the 'word' as the unit of measurement. It would be interesting to see whether the results would be the same when those words are evaluated in context. Would the positivity bias approach 0, or would it persist?
Post questions here about the following exemplary reading:
Dodds, Peter Sheridan et al. 2015. “Human language reveals a universal positivity bias.” Proceedings of the National Academy of Sciences 112(8):2389–2394. doi: 10.1073/pnas.1411678112