UChicago-Computational-Content-Analysis / Readings-Responses-2024-Winter

7. Deep Learning to Perform Causal Inference - Challenge #15

Open lkcao opened 8 months ago

lkcao commented 8 months ago

Post your response to our challenge questions.

First, pose a causal research question you would like to answer (in one, artfully worded sentence ending with a question mark). This could be the same question you posed for any prior week's assignment, or a new one that improves on or updates it, or shifts it to the counterfactual context. Second, identify a counterfactual prediction that will enable you to make your inference. This counterfactual prediction could simply be the question itself (e.g., How will the stock price change if the CEO reveals/admits an overstatement of earnings?) or it could support or validate the answer of that question (e.g., How do I predict whether the sentiment of a given sentence is positive or negative, certain or uncertain, resonant with U.S. Republicans or Democrats, about environmental position X or Y, etc. IF some prior event Z occurs). Finally, describe the datasets on which you will perform your counterfactual causal inference. Parenthetically note whether this data could be made available to class this week for evaluation (not required...but if you offer it, you might get some free work done!) Please do NOT spend time/space explaining the precise model or analytical strategy you will use to generate, evaluate and utilize your prediction. (Then upvote the 5 most interesting, relevant and challenging challenge responses from others).

sborislo commented 6 months ago

Causal Research Question: Does the language (e.g., words used) in prior video game reviews cause future reviews to be more likely to use that same language?

Counterfactual Prediction: How will the relative frequency of words change in reviews after the first three months if those words are used more frequently in those first few months? (i.e., will the differences in word frequencies be amplified?)

Datasets: For the counterfactual causal inference, I will use a dataset scraped from Steam. It includes the timestamps of video game reviews and the text of those reviews.
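As a rough illustration of the quantity in the counterfactual prediction above, here is a minimal sketch that compares relative word frequencies in early versus later reviews. It assumes reviews arrive as `(timestamp, text)` pairs and approximates "the first three months" as 90 days after the earliest review; the function names and the tokenizer are hypothetical, and the output is a descriptive comparison, not a causal estimate.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def relative_frequencies(texts):
    """Return each word's share of all tokens across the given texts."""
    counts = Counter(
        word for text in texts for word in re.findall(r"[a-z']+", text.lower())
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def split_early_late(reviews, window_days=90):
    """Split (timestamp, text) pairs at window_days after the first review.

    The 90-day default is an assumption standing in for "three months".
    """
    if not reviews:
        return [], []
    cutoff = min(ts for ts, _ in reviews) + timedelta(days=window_days)
    early = [text for ts, text in reviews if ts < cutoff]
    late = [text for ts, text in reviews if ts >= cutoff]
    return early, late


def frequency_shift(reviews, window_days=90):
    """For every word seen early, return late share minus early share."""
    early, late = split_early_late(reviews, window_days)
    f_early = relative_frequencies(early)
    f_late = relative_frequencies(late)
    return {word: f_late.get(word, 0.0) - share for word, share in f_early.items()}


# Toy data, invented purely for illustration.
reviews = [
    (datetime(2024, 1, 1), "great game great"),
    (datetime(2024, 1, 10), "bad game"),
    (datetime(2024, 6, 1), "great great great game"),
]
shift = frequency_shift(reviews)
```

A positive value in `shift` means the word takes up a larger share of later reviews than of early ones, which is the amplification pattern the prediction asks about.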

yuzhouw313 commented 6 months ago

Causal Research Question: Does the language used in COVID-19 news report clips (used by reporter, interviewer, political figure, etc) influence the linguistic patterns observed in their comment sections, specifically in terms of tone, sentiment, and the use of specific terms?

Counterfactual Prediction: If the language used in COVID-19 news report clips had been neutral and fact-based, rather than emotive or sensational, would the comment sections also have reflected a more analytical and less emotionally charged discussion, with a decrease in the use of alarmist terms and an increase in factual dialogue?

Dataset: The dataset of comments scraped from COVID-19 news report can be found here

chenyt16 commented 6 months ago

Causal Research Question: Does the political preference of each media source influence the language used in abortion-related news?

Counterfactual Prediction: If the political preference of a media source changed, would the language used in its abortion-related news also change? For example, if FOX News were more neutral or more Democratic-leaning, would the language used in FOX abortion news also change?

Datasets: The dataset consists of news articles I scraped from BBC News and FOX News. I plan to scale up to more media platforms but have not done so yet. I can share the dataset, but it is still small.

michplunkett commented 6 months ago

Causal Research Question: Does the language of organizational goals from the Heritage Foundation and Family Research Council influence the language of abortion-related legislation introduced into the House of Representatives?

Counterfactual Prediction: If the language in the organizational goals of the aforementioned organizations were more oriented around the healthcare features around abortion than ending it, would that be reflected in the language of abortion related legislation introduced into the House of Representatives?

Datasets: The yearly goals of the HF and FRC going back to 1980 and all presented legislation from Republicans going back to 1980 that contains the words 'abortion', 'obstetrics', and 'fetus.'

bucketteOfIvy commented 6 months ago

Causal Research Question: Do posts with depressive sentiments on 4chan's /lgbt/ board prompt responses that are also depressive?

Counterfactual Prediction: Can the average sentiment of responses to a given post on /lgbt/ be predicted given the sentiment of the initial post?

Datasets: Posts scraped from 4chan's /lgbt/ board over the period of a week, with replies being detectable from the usage of reply tags.
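To make the reply-tag linkage concrete, here is a minimal sketch that pairs each post's sentiment with the average sentiment of its replies. It assumes post text has already been converted to plain text so that reply tags appear as `>>` followed by a post number, and that a sentiment score per post is already available from some classifier; the data layout and the function name are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Reply tags look like ">>12345" once HTML entities are decoded (assumed here).
REPLY_TAG = re.compile(r">>(\d+)")


def pair_post_and_reply_sentiment(posts):
    """Map post id -> (post sentiment, mean sentiment of its replies).

    posts: dict of post_id -> {"text": str, "sentiment": float}.
    Posts that received no replies within the scraped set are left out.
    """
    replies = defaultdict(list)
    for post_id, post in posts.items():
        # Use a set so a post quoting the same target twice counts once.
        for target in {int(t) for t in REPLY_TAG.findall(post["text"])}:
            if target in posts and target != post_id:
                replies[target].append(post["sentiment"])
    return {
        target: (posts[target]["sentiment"], mean(scores))
        for target, scores in replies.items()
    }


# Toy thread, invented for illustration; sentiment scores are made up.
posts = {
    1: {"text": "feeling down today", "sentiment": -0.8},
    2: {"text": ">>1 same here", "sentiment": -0.6},
    3: {"text": ">>1 hang in there", "sentiment": 0.2},
    4: {"text": ">>2 >>99 ok", "sentiment": 0.0},
}
pairs = pair_post_and_reply_sentiment(posts)
```

Tags pointing at posts outside the scraped week (like `>>99` above) are ignored, so the resulting pairs only cover replies whose parent post was actually collected.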

ethanjkoz commented 6 months ago

Question: Do posts with negative sentiment towards adoption garner more support (higher post score) in adoptee-centric subreddits?

Counterfactual: How do I predict whether the sentiment of a given post towards adoption is positive or negative given the user is an adoptee?

Dataset: Same dataset as previous weeks: an archive of r/Adoption posts and comments, augmented with more recently scraped posts from r/Adoption and posts scraped from r/Adopted (too small to be archived by the site I was trying to use).

beilrz commented 6 months ago

Causal Research Question: Can we infer about the effectiveness of alternative medicine, such as supplements or herbal medicine, from user reviews?

Counterfactual Prediction: If a herbal drug is effective at treating a condition, will it receive different comments from consumers?

Datasets: Product pages and user reviews scraped from Amazon listings for herbal medicine.

donatellafelice commented 6 months ago

Causal inference: Is it possible to predict a person's feelings of inclusion after a disagreement based on the specific words that are said?

Counterfactual: If someone phrases their questions or answers a specific way in a disagreement, will it give their conversation partner a more positive impression of the conversation?

Data: Candor corpus, and Booth experimental data (available on request from Dr. Risen/Wald at Booth).

naivetoad commented 6 months ago

Question: How might a researcher's research interests change upon receiving funding?

Prediction: If a researcher receives funding, their research will shift to more popular areas.

Data: Abstracts of researchers' papers published in the three years before and after receiving funding.

muhua-h commented 6 months ago

Question: Do different personalities cause LLM agents to behave differently?

Counterfactual Prediction: Differences in decision making are predicted by personality configuration.

Data: Simulation using an LLM.

QIXIN-LIN commented 6 months ago

Research Question on Causality: Does feedback from the audience, such as "kudos" and comments (including negative ones), motivate authors to complete their projects?

Hypothesis on Counterfactual Impact: Does a "kudos" (a simple click on the website) have a similar influence on authors as comments (which require time to compose)?

Data Sources: The AO3 fanfiction dataset, encompassing metadata for fanfictions and statistics on reader feedback.

runlinw0525 commented 6 months ago

Causal Research Question: Are students in abstract disciplines, like economics and philosophy, more likely to use generative AI as a tool to refresh and explain their materials compared to students in concrete disciplines?

Counterfactual Prediction: What would be the impact on students' academic performance in abstract disciplines such as economics and philosophy if they were not encouraged to use generative AI as a tool to refresh and explain their materials, compared to a scenario where such encouragement is provided?

Dataset: All course syllabi published in or after 2023 from the biggest syllabi archive of the University of Michigan.

ana-yurt commented 6 months ago

Causal Question: Did ethnic tension and ethnic conflicts in the twentieth century cause an intensification of nation-building in the late Chinese empire?

Counterfactual Prediction: If large-scale ethnic rebellions had not broken out the way they did, would that have changed how Chinese people conceive of the frontier and frontier populations?

Dataset: Digitized archival materials from the twentieth century.

cty20010831 commented 6 months ago

Causal Question: Does the number of times an author is funded lead to less diversity in the research topics that author examines?

Counterfactual Prediction: If the number of times an author is funded does lead to less diversity in the research topics that author examines, would the pattern differ across fields of study?

Dataset: There are two parts of my dataset. The first one is the retrieved list of funded psychology NSF projects. The second one is the basic personal (e.g., title, schools/institutions they belong to) and publication-related information (e.g., citation count, h-index, and the title and abstract of the papers) of the funded authors to be scraped from Google Scholar by relating author information to the NSF funded list.

joylin0209 commented 6 months ago

Research question: Do school differences influence people’s misogynistic tendencies and vocabulary?

Counterfactual prediction: The higher a school's ranking, the less likely its students are to post misogynistic articles and messages online, and the less likely they are to use extremely negative related words.

Data set: Taiwan's online forum Dcard. Users register under their college or university, so the school each poster attends can be obtained from a post, and posts can then be classified accordingly.

erikaz1 commented 6 months ago

Research Question: To what extent does the portrayal of individuals in historical records, such as personal journals and news articles, align with how they are remembered in collective memory, and what implications does any discrepancy hold for our understanding of history?

Causal RQ: Is lexicon relating to prevailing narratives about feminists (accounting for changing perceptions over time) more evident in relics of collective memory than in personal records?

Counterfactual Prediction:

  1. How would the prevalence of lexicon associated with prevailing narratives about feminists change in relics of collective memory if personal records were not considered in shaping societal narratives?
  2. How would our perception of historical figures change if we relied solely on their personal journals and diaries as primary sources, rather than external historical accounts and narratives?

Dataset: FW Journal collection, interview data, 21st century news articles.

Caojie2001 commented 6 months ago

Research Question: Does the content of articles published by the central government influence the articles published by local governments in China? If so, is the influence heterogeneous across local governments?

Counterfactual Prediction: If the healthcare reform agenda had not been promoted by the central government, would local newspapers still have paid attention to this topic?

Dataset: Newspapers published by the central and local governments.

HamsterradYC commented 6 months ago

Causal Research Question: Does publicly sharing personal experiences and commitments to self-discipline on social media platforms influence the future content patterns and engagement levels of individual users?

Counterfactual Prediction: If users did not share posts about self-discipline, would their subsequent posts show less personal reflection and receive different engagement levels, indicating a potential causative effect of self-discipline discourse on user behavior and audience interaction?

Dataset: The dataset will consist of posts related to self-discipline gathered from the social media platforms Weibo and Reddit. It will include user attributes and the structure of their social networks.

YucanLei commented 6 months ago

How are games/genres classified based on their reviews (scrape tags from steam and subset to genre tags included in a pre-specified list)?

Carolineyx commented 6 months ago

Causal Research Question: Does the story of how a couple met predict the longevity of their relationship?

Counterfactual Prediction: Can the interestingness of the sentiment or the twists in the plot predict whether the couple has a long-lasting relationship?

Datasets: Wedding announcement articles from local newspapers.

XiaotongCui commented 6 months ago

Causal Research Question: How does wealth influence the sentiment and tone of messages exchanged on dating apps, specifically OkCupid?

Counterfactual Prediction: How would the sentiment and tone of messages on OkCupid change if the wealth status of the sender were altered?

Datasets: The datasets would ideally include user profiles from OkCupid, containing demographic information such as income levels.

floriatea commented 6 months ago

Causal Research Question: How has the sentiment towards telehealth services changed across different countries during the COVID-19 pandemic? (Specifically, did countries with higher COVID-19 case counts experience a more significant shift in public sentiment towards the adoption and value of telehealth services?)

Counterfactual Prediction: If the COVID-19 pandemic had not occurred, would the sentiment towards telehealth services in countries severely affected by the pandemic remain more neutral or negative, compared to the observed positive shift during the pandemic? This counterfactual prediction seeks to explore the direct impact of the pandemic on accelerating telehealth acceptance and perceived value, distinguishing it from the general trend of digital transformation in healthcare.

Datasets for Counterfactual Causal Inference

  1. Telehealth Sentiment Data: Extracted from my NOW-telehealth dataset, focusing on the text, date, country, and sentiment-related columns (access to care, efficiency, effective, challenge...). This dataset will be analyzed to assess public sentiment over time and across different countries.
  2. COVID-19 Case Counts: Publicly available data detailing the number of COVID-19 cases by country over time. This data would allow for the categorization of countries based on the severity of their pandemic experience.
  3. Global Digital Health Policy Data: Information on policies related to telehealth implemented by different countries during the pandemic, to control for governmental actions that might influence public sentiment and telehealth adoption independently of COVID-19 case counts.

The primary NOW corpus and COVID-19 case count datasets are readily available, while the digital health policy data may require aggregation from multiple sources. The combined analysis could be used for further exploration and validation, subject to data privacy and availability constraints. The NOW corpus provides a rich source for analyzing sentiment towards telehealth on a global scale. By combining it with external COVID-19 case count data and digital health policy information, I can infer the causal impact of the pandemic on telehealth sentiment across countries. This approach allows for a nuanced understanding of how global crises can accelerate shifts in healthcare delivery models and public perception, providing valuable insights for policymakers, healthcare providers, and digital health innovators.
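One simple way to read the comparison described above is as a difference-in-differences contrast: the change in mean sentiment in high-case countries minus the change in low-case countries. The sketch below is only an illustration under stated assumptions, not the analysis planned here. It assumes records of `(country, date, sentiment)`, a pre-defined set of high-case countries, and a single cutoff date (11 March 2020, the day the WHO declared a pandemic, is used as an illustrative choice); it does not control for the policy data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def diff_in_diff(records, high_case_countries, cutoff):
    """Change in mean sentiment for high-case countries minus low-case ones.

    records: iterable of (country, date, sentiment) tuples.
    Raises statistics.StatisticsError if any group/period cell is empty.
    """
    cells = defaultdict(list)
    for country, day, sentiment in records:
        group = "high" if country in high_case_countries else "low"
        period = "post" if day >= cutoff else "pre"
        cells[(group, period)].append(sentiment)
    change = {
        group: mean(cells[(group, "post")]) - mean(cells[(group, "pre")])
        for group in ("high", "low")
    }
    return change["high"] - change["low"]


# Toy records with placeholder country labels and invented sentiment scores.
records = [
    ("A", date(2019, 6, 1), 0.0),
    ("A", date(2020, 6, 1), 0.6),
    ("B", date(2019, 6, 1), 0.1),
    ("B", date(2020, 6, 1), 0.3),
]
estimate = diff_in_diff(records, high_case_countries={"A"}, cutoff=date(2020, 3, 11))
```

The estimate is only interpretable causally if both groups would have followed similar sentiment trends without the pandemic, which is exactly what the policy controls and the general digital-health trend would need to address.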