UChicago-Computational-Content-Analysis / Readings-Responses-2023

2. Counting Words & Phrases - challenge #46

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post your response to our challenge questions.

Articulate a one-sentence computational linguistics hunch or hypothesis regarding the distribution of words, phrases or parsed claims within your corpus relative to some variable (e.g., time, city size, number of likes), between your corpora, or between your corpus and some linguistic baseline (e.g., all current Wikipedia articles; a sample of 2020 news articles; French tweets from 2016 Paris). This need not be critical to your final project...but it could lead there. Next, in a short (2-5 sentence) paragraph, describe why you reason this hunch or hypothesis might be correct. Finally, list the corpus or corpora on which you will test it, and mention whether it could be made available to class this week for evaluation (not required...but if you offer it, you might get some free work done!) Please do NOT spend time/space explaining how you will explore your hunch or validate your hypothesis with the mentioned corpus. (Then upvote the 5 most interesting, relevant and challenging challenge responses from others).

thisspider commented 2 years ago

I have a dataset of 200 letters in which billionaires promise to give away their wealth. In the letters, the authors describe their intentions and motivations. I also have data on the authors, including their net worth, place of residence, age, date of pledging, industry, and Forbes rank.

Hypotheses:

Higher age, wealth from traditional industries, and an earlier pledge date are correlated with the use of moral words connected to family, religion, tradition, and social responsibility.

Lower age, wealth from new industries, and a later pledge date are correlated with the use of language related to impact, efficiency, cost-benefit analysis, and return on investment.

Reasoning:

Over the past three decades, elite philanthropy has shifted toward effective, impact-oriented giving. This shift coincides with a demographic shift among billionaires, who are now on average younger and more likely to have made their wealth in technology or finance. The aim here is to check whether the change in the language of philanthropy is related to this change in personal characteristics.

Corpus:

The billionaire letters are from the Buffett-Gates Giving Pledge, accessible here: https://givingpledge.org/pledgerlist

The variables describing characteristics of letter writers—net worth, place of residence, age, date of pledging, industry, Forbes rank—come from a website collecting data on transparency in philanthropy: https://glasspockets.org/philanthropy-in-focus/eye-on-the-giving-pledge/profiles/jared-and-monica-isaacman

The datasets could be made available to the class.

melody1126 commented 2 years ago

Hunch The division between the liberal arts and the vocational aspects of higher education (vocational guidance, placement, etc) widened from 1925 to 1950.

Reasoning The University of Chicago began to embody the "life of the mind" and the liberal arts under President Hutchins. At the beginning of this radical liberal arts revolution, the University was smaller and did not have as many ideological and practical divisions. The vocational aspects of education, such as career advising and student placement, were interwoven with liberal arts undergraduate studies. This division may have widened around 1930 with the onset of the Great Depression, as the pre-professional aspects of education split off from the "Great Books" academic experience.

Corpus Magazine articles from the Readers' Guide Retrospective and newspaper articles from historical newspaper databases (both available through lib.uchicago.edu). I will construct the corpus from articles that mention "liberal education" or "liberal arts" and study how often these terms appear together with vocational terms.

Corpus not yet completed, but happy to make available to class.
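A minimal sketch of the co-occurrence measure described above, assuming each article is available as a (year, text) pair. The term lists (`LIBERAL_TERMS`, `VOCATIONAL_TERMS`) and function names are hypothetical placeholders, not part of the proposed corpus; under the hunch, the share would fall between 1925 and 1950.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration; the real lists would be refined from the corpus.
LIBERAL_TERMS = ["liberal education", "liberal arts"]
VOCATIONAL_TERMS = ["vocational guidance", "vocational", "placement", "career"]


def mentions_any(text, terms):
    """True if any term occurs as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in terms
    )


def cooccurrence_share_by_year(articles):
    """For each year, the share of liberal-arts articles that also use a vocational term.

    articles: iterable of (year, text) pairs.
    """
    liberal = Counter()
    both = Counter()
    for year, text in articles:
        if mentions_any(text, LIBERAL_TERMS):
            liberal[year] += 1
            if mentions_any(text, VOCATIONAL_TERMS):
                both[year] += 1
    return {year: both[year] / liberal[year] for year in sorted(liberal)}


sample = [
    (1925, "Liberal arts and vocational guidance go hand in hand."),
    (1925, "A defense of liberal education."),
    (1950, "The liberal arts college and the Great Books."),
    (1950, "Placement offices expand."),  # no liberal-arts term, so not counted
]
print(cooccurrence_share_by_year(sample))  # {1925: 0.5, 1950: 0.0}
```

Using a share rather than raw counts keeps the measure comparable across years with very different numbers of digitized articles.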

Jasmine97Huang commented 2 years ago

Hypothesis: Male music artists used gendered slurs in their lyrics more frequently than female artists in the 1980s, while this gap has narrowed in the most recent decade.

Reasoning: A plethora of research has shown that popular music lyrics often depict women as submissive, inferior, nurturing, dependent, and sexual objects. Some scholars even claim that misogyny is a significant theme in popular genres like hip hop. Gender-directed insults can be read as a direct expression of intense emotion, such as misogyny, toward their targets. Interestingly, scholars and pop culture consumers alike have noted an increase in the “reappropriation” of gendered insults: women can effectively redefine the meanings of gendered insults by using those words in positive ways, as neutral nouns or even as compliments within in-groups.

Data: Billboard Top 100 and a Spotify dataset on Kaggle (over 190,000 songs from the 1970s to the 2020s). Available to class.

GabeNicholson commented 2 years ago

My hunch:

Reasoning: Conspiratorial thinking is always negative. You never hear of a conspiracy in which some person or group is out to do good in the world by furtively helping people. There are good evolutionary reasons why we indulge in conspiratorial thinking, and they are almost always about a powerful group or person doing harm to us in some way. The issue with this line of reasoning in our current society is that it is almost always false, since our society is structured in such a way that large-scale conspiracies are nearly impossible to pull off. This idea could be tested on a set of controversial tweets on subjects such as vaccines and government policy, and the presence of conspiratorial reasoning could then be used to set a prior probability that a tweet contains misinformation.

Data: Twitter tweets surrounding controversial topics.

ValAlvernUChic commented 2 years ago

My hunch:

Reasoning: The Singaporean government has largely made it clear that the entry of foreign domestic workers (FDWs) and migrant construction workers into the country is economically motivated. This sentiment is regularly communicated through parliamentary speeches and the state media (The Straits Times). I reckon that this discourse has in turn shaped the way many Singaporeans think about transient workers in the country. However, FDWs are in a doubly precarious position owing to (1) their gender and (2) the connotation of subservience often attached to domestic work. With this in mind, I wouldn't be at all surprised if these attitudes were expressed in casual forum discussions.

Data: Collection of HardwareZone forum posts from threads with the words "FDW", "Maid", "Foreign Domestic Worker". Dataset not yet available.

isaduan commented 2 years ago

Hunch US policy discourse on science and technology has become more prone to "nationalism" frames that emphasize American jobs and the competitiveness of American business than to "neo-liberal, globalism" frames that emphasize development, peace, and a shared future for mankind.

Reasoning STS and political science scholars have found that science and technology policymaking, like other policymaking, is conditioned by socio-political factors such as partisanship and national power. As American grand strategy shifts from neo-liberal globalism toward a more nationalistic frame of power competition with China, I expect science and technology policy discourse to reflect this trend as well.

Corpus Congressional hearings from the US House of Representatives Committee on Science, Space, and Technology, ideally from 2001 to 2021. Not yet available.

hsinkengling commented 2 years ago

Hypothesis: For hyperlinks used in scientific and/or political discussion comments: comments with links that lead to news sites and blog articles tend to be shorter, while comments with links that lead to peer-reviewed journals tend to be longer.

Reasoning: Peer-reviewed journals tend to contain more technical jargon and require more unpacking from the commenter in order to make a readable argument. News and blog articles are written for the larger audience, allowing the commenter the option to post the link along with a short description and successfully make a coherent argument.

(A follow-up would be to find whether there are more quotations in journal-linking comments than in news- or blog-linking comments)

Data:

The Pushshift Reddit comment data. Available to class (the later years' files may be large, but we could sample them or use only the earlier years).
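One hypothetical way to operationalize this comparison, assuming comments are plain strings containing raw URLs. The domain lists below are illustrative assumptions only (a real study would need curated lists of journal publishers and news/blog outlets), and comment length is measured as a whitespace word count with URLs removed.

```python
import re
from statistics import mean
from urllib.parse import urlparse

# Hypothetical, deliberately tiny domain lists; a real study would need curated lists.
JOURNAL_DOMAINS = {"nature.com", "sciencedirect.com", "pnas.org", "nejm.org"}
NEWS_BLOG_DOMAINS = {"nytimes.com", "bbc.co.uk", "medium.com", "substack.com"}

URL_RE = re.compile(r"https?://\S+")


def link_type(url):
    """Classify a URL as 'journal', 'news_blog', or None (unrecognized)."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for label, domains in (("journal", JOURNAL_DOMAINS), ("news_blog", NEWS_BLOG_DOMAINS)):
        # Match the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in domains):
            return label
    return None


def mean_length_by_link_type(comments):
    """Mean word count (URLs excluded) of comments, grouped by the kind of link they contain.

    Comments with no recognized link, or with links of both kinds, are skipped.
    """
    lengths = {"journal": [], "news_blog": []}
    for text in comments:
        kinds = {link_type(u) for u in URL_RE.findall(text)} - {None}
        if len(kinds) != 1:
            continue
        n_words = len(URL_RE.sub(" ", text).split())
        lengths[kinds.pop()].append(n_words)
    return {kind: mean(vals) for kind, vals in lengths.items() if vals}
```

Skipping mixed-link comments is a design choice that keeps the two groups cleanly separated; the same grouping could later be reused for the proposed follow-up on quotation counts.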

ZacharyHinds commented 2 years ago

Hypothesis: Incels rely heavily on in-community slang to build and reinforce both their worldview and their identity.

Reasoning: Involuntary celibates, better known as Incels, have entered the public eye due to their association with acts of serious violence. They have developed and use a plethora of slang to describe the various aspects of their beliefs about both society and themselves. This slang creates a clear in-group and out-group, which may contribute to their isolated community and worldview.

Data: The Incel Wiki, a community wiki that is publicly visible online.

pranathiiyer commented 2 years ago

Hunch:

  1. Matrimonial advertisements in India tend to be highly selective about caste and physical appearance of a certain kind, i.e., they are extremely specific when seeking partners (irrespective of gender).
  2. This trend has persisted and has not changed over time.

Reasoning: A societal preference for tall, fair-skinned, upper-caste men and women has existed in certain strata of Indian society for a long time. Seeking matches through matrimonial advertisements has been an extremely widespread practice; it still exists, but it has also moved from newspapers and magazines to several online platforms. These advertisements reflect the social fabric of the regions they come from and can be deconstructed to explore it further.

Data: I plan to use the Tribune newspaper's (India) digital archive of matrimonial ads from 1998-2014, https://www.tribuneindia.com/archive, and, if feasible, also explore advertisements posted on matrimonial websites.

konratp commented 2 years ago

Hypothesis: Members of the German parliament born in West Germany refer to the communist regime in the former German Democratic Republic (East Germany) in negative terms more than those born in East Germany do.

Reasoning: In general political discourse in Germany, it is very common to talk about the GDR in negative terms, pointing to the shootings at the Berlin Wall, the Stasi surveillance state, etc. However, many East Germans feel fundamentally alienated in reunified Germany and hold positive nostalgic sentiments toward the GDR. West Germans are largely oblivious to this, and I believe this holds even for West Germans who now represent East German districts (which is quite common: Chancellor Olaf Scholz is one of them, as is Foreign Minister Annalena Baerbock). If this is true, one can make the case that this dynamic feeds the political alienation of East Germans, among whom turnout is particularly low.

Data: I plan on using this dataset, containing all speeches given in the German Bundestag since 1990. I don't know how feasible it is to discuss this in class since the dataset is in German.

LuZhang0128 commented 2 years ago

Hypothesis: For online social movements (specified by a particular hashtag, e.g. #BLM), the topics/focus of interest change over time.

Reasoning: Unlike traditional social movements, online ones can spread more easily and quickly. Online social movements can also respond to new events better than traditional movements, since people do not need to gather in person. Even after a movement succeeds (depending on the definition, but often when people achieve their political goals), people still use hashtags like #LGBTQ. The group's interests, however, shift noticeably, which could be tested by analyzing the words in tweets.

Data: Twitter data with specific hashtags. Currently not available to class.

mikepackard415 commented 2 years ago

Hypothesis: In environmental discourse, words and phrases associated with fear, uncertainty, panic, and urgent calls to action have increased over the last 15 years.

Reasoning: Environmentalists have been sounding the alarm on climate change and environmental degradation for decades, and still economies and emissions continue to grow exponentially. Over time the sheer inertia of destructive systems becomes more clear, and for some the issue shifts from one political problem among many to an immediate existential threat. It is also possible that this trend could move in the opposite direction, as some of those who were sounding the alarm accept that their message will not be able to cut through the noise in time, and turn their efforts instead toward building healthy, happy communities that will be resilient to the coming upheaval.

Data: A corpus of >105,000 blog posts and articles sourced from Resilience.org, Grist.org, InsideClimateNews.org, and EMagazine.com. Can be made available.

Jiayu-Kang commented 2 years ago

Hypothesis Readers find positive reviews of Kindle books more helpful (measured by helpfulness rating) than negative reviews.

Reasoning While people usually read reviews to help make purchase decisions, their perceptions of anonymous reviews may vary with the reviews' content and sentiment. Since individuals' opinions of a given book are more subjective than opinions of products serving purely utilitarian needs, readers may attribute negative reviews to non-product-related reasons (i.e., the reviewers' internal motivations) and therefore be less likely to trust the accuracy of negative reviews.

Corpus A dataset of 12,000 Kindle review texts from 1996 to 2014 is available on Kaggle.

hshi420 commented 2 years ago

Hypothesis: The number of negative words in US media coverage of Japan alone (i.e., Japan's internal affairs) has decreased significantly since 1939.

Reasoning: 1939 was the start of WWII, and the US and Japan were in hostile camps, so US news reports about Japan alone likely tended to use negative words. News reporting Japan losing battles could contain many positive words, but news about Japan alone (daily life, politics, and business within Japan) could have been negative back then. Nowadays, there is extensive communication between the two nations, so the US news media's attitude toward Japan may have changed as well.

Corpus: News articles about Japan from major US news media since 1939. Not available now.

NaiyuJ commented 2 years ago

Hypothesis Ethnic minorities in China are generally satisfied with their lives, the government, and other political institutions even though they are economically disadvantaged, because the Party penetrates ethnic regions with preferential policies that make minorities feel favored in daily life.

Reasoning Leveraging a national survey, I find that although ethnic minorities in China tend to be at the poor end of widening economic disparities, they surprisingly perceive themselves as having social status comparable to or even higher than that of Han Chinese in general, and they report higher trust in government and other political institutions. They are also more likely to believe that society is fair. These findings challenge our conventional understanding. I argue that the state has devised various institutional arrangements to co-opt ethnic minority groups and thus maintain political stability. One of the most important institutions is a range of preferential policies, which make minorities feel favored in daily life and prevent minority groups from falling further behind while facilitating integration.

Corpus I will do text analysis on the online discussion of different ethnic groups in their corresponding forums.

sizhenf commented 2 years ago

Hypothesis: China’s censorship program censors criticism selectively: it has a high tolerance for critiques of public goods provision but a low tolerance for critiques of the leader's personalization of power.

Reasoning: I suggest there is an “authoritarian bargain” between the leader and the citizens. Intuitively, the leader provides citizens with welfare, including material benefits and their preferred public policies, in exchange for more centralized political power. Thus, the leader needs a low censorship rate on issues related to public goods, to better estimate the public’s preferences, and a high censorship rate on political matters, to centralize power.

Corpus: Censored and uncensored posts related to several representative cases from Chinese social media (Sina Weibo and FreeWeibo).

Hongkai040 commented 2 years ago

Hunch: The frequency of mentions of a specific word/n-gram with a temporal attribute (e.g., the year '1951') largely decays as an inverse power law of the time elapsed since that year.

Reasoning: This week's orienting paper, "Quantitative Analysis of Culture Using Millions of Digitized Books", includes figures showing that the frequency of some n-grams in books decreases over time (e.g., Fig. 3A). The authors only showed three years (1883, 1910, and 1950) and only said that "We are forgetting our past faster with each passing year". But these curves look like power-law distributions with different parameters! Another paper I read before (Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change) proposed the law of conformity: "the rate of semantic change scales with an inverse power-law of word frequency". Hence I think it's possible that the frequency of mentioning a specific word/n-gram with a temporal attribute also scales with an inverse power law.

Corpus: Ideally, we can use the Google Ngrams dataset (or a portion of it): https://storage.googleapis.com/books/ngrams/books/datasetsv3.html
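A sketch of how the hunch could be checked for a single n-gram's curve, assuming frequencies are taken from the curve's peak onward and indexed by years elapsed since the referenced year. A power law appears as a straight line on log-log axes, so the sketch fits one by least squares and compares it with an exponential decay (a straight line on semi-log axes) as a natural alternative. The function names are hypothetical, and R² in log space is only a rough comparison, not a formal power-law test.

```python
import math


def _ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r2


def fit_power_law(elapsed, freqs):
    """Fit freq = C * elapsed**(-alpha) as a straight line on log-log axes.

    elapsed and freqs must be strictly positive.
    """
    a, b, r2 = _ols([math.log(t) for t in elapsed], [math.log(f) for f in freqs])
    return {"C": math.exp(a), "alpha": -b, "r2_log": r2}


def fit_exponential(elapsed, freqs):
    """Fit freq = C * exp(-lam * elapsed) as a straight line on semi-log axes."""
    a, b, r2 = _ols(list(elapsed), [math.log(f) for f in freqs])
    return {"C": math.exp(a), "lam": -b, "r2_log": r2}


# Usage with synthetic data generated from an exact power law (alpha = 1.5):
years_elapsed = list(range(1, 21))
synthetic = [2.0 * t ** -1.5 for t in years_elapsed]
print(fit_power_law(years_elapsed, synthetic))
print(fit_exponential(years_elapsed, synthetic))
```

Both fits regress the same quantity (log frequency), so their R² values are comparable; a noticeably better log-log fit across many year-n-grams would be consistent with the hunch.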

Emily-fyeh commented 2 years ago

Hypothesis: In Taiwanese multimedia, terms commonly used or trending in China have become more prevalent over the past decade, due to the rapid development of the entertainment industry in China.

Reasoning: Since the 2010s, the entertainment and pop music industries in China have experienced explosive growth in response to domestic demand. Many provincial TV stations began to extend their influence to other Chinese-speaking and neighboring countries by replicating the successful K-Pop model. As these programs are uploaded to YouTube and spread virally on TikTok, this cultural export is also regarded as part of Chinese propaganda. Audiences in Taiwan can easily access this content and reproduce its cultural cues in their daily lives.

Corpus:

  1. The trending words retrieved from the hot search ranking of Chinese entertainment websites.
  2. The content of traditional media and social media in Taiwan. The target content still needs to be specified more clearly for scraping.

MengChenC commented 2 years ago

Hypothesis: Descriptions of and requirements concerning beauty, appearance, physique, and decoration have increased over the timeframe (at least monotonically, if not strictly), and this applies to all genders.

Reasoning: As society becomes wealthier, people have more time, energy, and resources to devote to things beyond simple routines and making a living. The emergence of aesthetic industries has also been shifting and shaping how people feel about and perceive their appearance. From the saturation of product advertisements to brainwashing-like promotions and propaganda, these industries redefine people's aesthetics in modern society. Also, with the spread of the internet and influencers, the pursuit of beauty can sometimes even become morbid. An overview and comparison of the present and the past would be interesting.

Data: A corpus of 100,000 blog posts, along with newspaper databases, e-magazine websites, etc., could serve as resources for comparison.

Qiuyu-Li commented 2 years ago

Data used for my final project: Tweets that are identified to be sexist.

Hunch: Terms or phrases in tweets that are identified to be sexist may evolve over time.

Reasoning: (1) I'm not very familiar with Twitter, but based on my observation of Sina Weibo (the Chinese counterpart of Twitter), a hotly discussed topic like sexism follows fashion: people are most interested in a topic when there's a hot event, and discussing sexism/misogyny/feminism becomes fashionable. As the discussion evolves, people learn from previous examples and create new jargon, so the language itself also evolves. (2) It's also because of a feature of social media language itself: it favors trendy words. I always get the feeling that I can't understand what teenagers on Chinese social media are talking about. They use abbreviations, pinyin, and dialect, which constitute part of their self-identification: they are the masters of today's social media world. Therefore, I expect that the more a topic occurs in public discourse, the more colorful and variable its terms become, and the more rapidly they change.

Data and examination: To identify how sexism-related terms and phrases evolve over time, I will need to scrape tweets covering a sufficient time span, then identify sexism-related posts either by manually tagging a sample and feeding it to a machine learning algorithm, or by using an existing algorithm. Then I would need to figure out how these posts differ across time, perhaps through a combination of tf-idf and a bag-of-words model. This would be time-consuming.
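A sketch of the tf-idf step, assuming tweets have already been labeled as sexist and grouped by period (e.g., by year). Each period's tweets are pooled into one bag-of-words "document", so terms that appear in every period get an idf of zero, and the highest-scoring terms are those distinctive to a period. The tokenizer and function names are hypothetical simplifications, not an existing library API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep words, #hashtags and @mentions (a deliberately simple tokenizer)."""
    return re.findall(r"[#@]?\w+", text.lower())


def distinctive_terms_by_period(tweets_by_period, top_k=5):
    """Top tf-idf terms per period, treating each period's pooled tweets as one document.

    tweets_by_period: dict mapping a period label (e.g. a year) to a list of tweet texts.
    tf  = count of term in period / total tokens in period
    idf = log(number of periods / number of periods containing the term)
    Terms that occur in every period get idf = 0 and are dropped.
    """
    counts = {
        period: Counter(tok for tweet in tweets for tok in tokenize(tweet))
        for period, tweets in tweets_by_period.items()
    }
    n_periods = len(counts)
    # Iterating a Counter yields each distinct term once, so this counts periods per term.
    doc_freq = Counter(term for c in counts.values() for term in c)

    result = {}
    for period, c in counts.items():
        total = sum(c.values())
        if total == 0:
            result[period] = []
            continue
        scores = {
            term: (n / total) * math.log(n_periods / doc_freq[term])
            for term, n in c.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[period] = [term for term, score in ranked[:top_k] if score > 0]
    return result
```

Pooling by period trades tweet-level detail for a direct answer to "which terms are new or distinctive in this period"; a tweet-level tf-idf matrix would be the natural next step for classification.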

chentian418 commented 2 years ago

Hypothesis: On average, outside analysts make positive revisions to a firm's earnings forecasts when concurrent news about the firm, direct or indirect, is more positive. More uncertain macroeconomic news also generates more accurate revisions, which deviate less from realized earnings.

Reasoning: Market sentiment is a prominent factor valued by firms and analysts. When evaluating a firm's earning ability, analysts tend to extract value-relevant information from financial news (e.g., The Wall Street Journal) in addition to fundamental information. Macroeconomic uncertainty and cyclicality jointly affect management and analyst forecasts (Yang and Chen 2021). Specifically, increasing macroeconomic uncertainty reduces management's tendency to issue earnings forecasts, an effect driven heavily by firms with low cyclicality. When macroeconomic uncertainty is high, analysts issue more accurate earnings forecasts than management for firms with high cyclicality. Therefore, I plan to develop measures of market sentiment and macroeconomic uncertainty from the corpus to contextualize these forecasts.

Corpus: Analyst forecast data are collected from I/B/E/S; articles from The Wall Street Journal are collected from ProQuest TDM Studio.

Sirius2713 commented 2 years ago

My hunch: Former President Trump's tweets usually went viral, which means the public was easily influenced by them, so his tweets may have affected capital markets.

Reasoning: Twitter provides a platform for the public and government leaders to interact seamlessly. Therefore, political celebrities like Trump are able to convey their ideas about particular companies directly to the public through tweets. Consequently, the public's reaction would show up in capital markets, specifically in stock prices.

Data: Twitter data from Trump and other political celebrities

sudhamshow commented 2 years ago

Hypothesis: One can make an early forecast of the performance of a stock based on the sentiment of users on Social Media.

Reasoning: Potential investors and advisors are always on the lookout for news about a stock's performance and the factors that could influence it. Social media platforms are also filled with impulsive users who react to every slight variation in a stock's price and constantly tweet about it. The most recent data always seems to be available, even though it can be somewhat difficult to find the right counsel for action. By analysing words used with certain hashtags and weighting each message by the prior experience and credibility of its author, one may be able to catch on early to a stock's price trend.

Data: Messages from particular subreddits, tweets with particular hashtags, and similar sources of esoteric information.

zixu12 commented 2 years ago

Hunch: Numeric online reviews are inflated, and customers' written reviews and communication messages with sellers can be good complements for measuring the quality of goods and sellers.

Reasoning: Past literature shows that numeric reviews are inflated (Jie et al., 2020), so they might not be good enough to measure the quality of goods. There are also rich text resources: the reviews written by customers, which should be genuine and can be used to measure the quality of goods. Another text resource can be obtained by communicating with sellers directly and recording how quickly and what they reply; these data are 100% genuine and can be used to check the sellers' reviews.

Data: Web-scraped data from e-commerce sites. The communication data are available, but the review data still need to be collected.

YileC928 commented 2 years ago

Hunch: Employee reviews are more informative and complex when the reviewer has worked longer at the company. Negative employee reviews tend to be longer than positive ones.

Reasoning: An employee tends to learn more about the company and would encounter more stories/incidents after spending a long time working for it. They might thus be able to post more detailed content. When criticizing, people might be more eager to explain details and list examples to support their claims.

Data: Employee reviews of S&P 500 companies collected from Glassdoor

kelseywu99 commented 2 years ago

Hunch: Traditional wife influencers' veiled messages about femininity actually appeal more to people who subscribe to alt-right beliefs.

Reasoning: Traditional wife influencers usually write and talk about traditional femininity, ranging from surface-level advice on how to act feminine rather than masculine, to topics such as conforming to traditional gender roles, treating one's partner/husband in a feminine way, and why one should be proud of one's (European) heritage. While this content may appeal to some women who wish to be feminine housewives, it overlaps considerably with alt-right ideology, e.g., gender roles, white pride, and anti-egalitarianism.

Data: Blog posts and tweets by traditional wife influencers, and an analysis of common keywords used by alt-right leaders.

AllisonXiong commented 2 years ago

Hypothesis: the gender difference in writing is becoming less discernible.

Reasoning: There are long-standing stereotypes about female writers: that they focus on family, romantic relationships, and maternal matters, while male writers focus on more scientific and/or grand themes; and that their wording is recognizable, since men are more daring, more logical, and have no problem using sex-related words. With improvements in gender equality, however, there are now more female writers, with larger in-group variation. Female writers write about science, war, alcohol, and sex. It's reasonable to assume that gender signals have become less discernible over time.

Data: The data source mentioned in the orienting reading, which contains about 4% of all books ever printed, along with metadata about their authors.

chuqingzhao commented 2 years ago

Hypothesis: Startups' pitches and pivots become more concrete over time, but pitches that try to differentiate a startup from the market do not necessarily lead to future investment. A differentiated strategy in business communication would lead to future success if the pitch structurally balances what we already know (e.g., through analogy) with consumers' expectations for new products.

Reasoning: There is a learning process for startup companies. In the beginning, startups may only have a broad or novel concept. Over time, startups learn from investors and markets by adapting their expectations. In their self-written business descriptions, companies add more words to describe their products based on what they learn from the market. Intuitively, novel products or ideas can be attractive to venture capitalists, but not every novel idea makes sense from an investor's perspective. Sometimes an idea might be too "novel" or too difficult to understand.

Data: Crunchbase business description.

ttsujikawa commented 2 years ago

Hypothesis A causal connection between spending behaviors and sentiments expressed on social media becomes clearer in the case of the cash handout in Japan.

Reasoning The financial support policy was extensively discussed among citizens on various platforms, and Twitter was no exception. Consumers became relatively more expressive, not only about their sentiment toward the policy but also about how they planned to use the money. Also, though Twitter users are distributed broadly across all generations up to people in their 50s, we can expect variation in responses across generations and types of consumption.

Data Bank of Japan, Twitter

weeyanghello commented 2 years ago

Hypothesis: For tweets related to K-pop, the proportion of tweets that are used to "game" Twitter algorithms to generate popularity rather than merely spark discussion has increased over time, reflecting how the success of K-pop artists is increasingly tied to social media metrics.

Reasoning: My research on K-pop fandom organizations through social media suggests that social media is not merely a "platform" for discourse but a human-machine interface that grants K-pop communities agentive power to direct and generate discourse about the causes they support, causes that are not simply limited to idol worship.

Data: Population of tweets that are related to K-pop

yuzhouw313 commented 5 months ago

Hunch: Given similar popularity and content volume during the outbreak of COVID-19, videos from conservative news channels (e.g. Fox News) will exhibit a higher propensity for the presence of Sinophobic terms in the comment section. In contrast, videos from liberal news channels (e.g. MSNBC) are expected to demonstrate a more diverse spectrum of attitudes toward China and Chinese.

Reasoning: This linguistic hunch is based on the well-established result that the political orientation of news channels can shape the tone and content of user-generated comments. Conservative news channels may foster an environment where Sinophobic sentiments find resonance due to their editorial stance or audience composition. In fact, after manually browsing through Fox News videos about COVID-19, I noticed that there were many news clips depicting the pandemic as a bioweapon conspiracy or explicitly condemning China with inflammatory terms, which might catalyze a Sinophobic dynamic within their comment sections. In contrast, liberal news channels, known for their emphasis on diversity and inclusion, are expected to host a broader range of attitudes in their comment sections.

Corpus: A collection of comments scraped from YouTube news channels (both conservative and liberal) using its official API.

joylin0209 commented 5 months ago

Hypothesis: Among the three political parties in Taiwan's 2024 presidential election, in their public statements, the Democratic Progressive Party emphasized "Taiwan sovereignty" the most, the People's Party emphasized the "violent struggle between blue and green" the most, and the Kuomintang emphasized "ruling party alternation" the most.

Reasoning: In the past, Taiwan's political environment was dominated by two major parties: the Democratic Progressive Party and the Kuomintang. Many members of the former have promoted social movements and emphasized democratic values in the past, with particular emphasis on "resistance to China" and "Taiwan sovereignty." It is therefore expected that "Taiwan sovereignty," as the party's core value, would be particularly emphasized before the election. Since the Democratic Progressive Party's candidate was re-elected president four years ago, the Kuomintang, the other major party, should emphasize "party rotation." In recent years, more and more young people have become disillusioned with the two major parties and have turned to smaller parties, one of which is the People's Party. It is therefore expected that the People's Party will position itself with middle voters and criticize the blue and green camps as equally bad. (Blue = Kuomintang, Green = Democratic Progressive Party)

Data: The data sources are public posts and external statements by the three major parties over the past six months.

volt-1 commented 5 months ago

Hypothesis: In online book club discussions, the prevalence of assertive language and self-referential terms (e.g., "I believe", "In my view") increases in posts that spark more interactive discussions (evidenced by higher reply/likes counts), especially in threads discussing controversial or ambiguous book endings.

Rationale: Assertive language and personal viewpoints might encourage more engagement in discussions, particularly in environments like book clubs where diverse interpretations are valued. The use of assertive and self-referential language in discussing complex or debatable topics, such as controversial book endings, could stimulate others to respond with their own opinions, thereby increasing the interaction within the thread. These linguistic features might serve as catalysts for deeper, more engaging conversations.

Corpora options:

  1. Goodreads Book Club Discussions Dataset: a collection of discussion threads from various book clubs on Goodreads, categorized by book genre and the number of replies to each post.
  2. The CONLIT Dataset of Contemporary Literature: posts and threads from popular online literary forums, focusing on discussions about books with controversial or ambiguous endings.

sborislo commented 5 months ago

Hypothesis: Words associated with violence (e.g., "war") would be more likely to be found in game titles with larger player bases.

Rationale: A key factor affecting people's motivations to play video games is engagement. Real life can often be boring or depressing, so video games are frequently used to turn one's focus away from real-life happenings. Violent media content has been shown to be, on average, more engaging than non-violent media content, so violent associations likely signal more engaging games. Additionally, violent environments have been shown to help fulfill certain needs, like autonomy (feeling in control of one's life) and camaraderie (since violent games often involve needing help and/or helping others). This likely contributes to the appeal of violent connotations as well.

Data: The steamcharts.com webpage, which tracks player counts and game titles.