Thinking-with-Deep-Learning-Spring-2024 / Readings-Responses

You can post your reading responses in this repository.

Week 5. Apr. 19: Transformers and Social Simulation - Possibilities #10

Open JunsolKim opened 3 months ago

JunsolKim commented 3 months ago

Pose a question about one of the following articles:

“Generative agents: Interactive simulacra of human behavior.” Park, Joon Sung, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. UIST.

“Out of one, many: Using language models to simulate human samples.” Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. 2023. Political Analysis.

“Simulating social media using large language models to evaluate alternative news feed algorithms.” Törnberg, Petter, Diliara Valeeva, Justus Uitermark, and Christopher Bail. 2023. arXiv.

“Jury learning: Integrating dissenting voices into machine learning models.” Gordon, Mitchell L., Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S. Bernstein. 2022. CHI.

“Improving factuality and reasoning in language models through multiagent debate.” Du, Yilun, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. 2023.

Pei0504 commented 2 months ago

The research “Out of one, many: Using language models to simulate human samples” explores how language models like GPT-3 can simulate human responses in social science, achieving "algorithmic fidelity" by closely emulating diverse demographic groups. While it underscores the innovative use of these models for extracting complex human patterns, it raises significant ethical concerns. How do we safeguard against the misuse of such models, especially given their potential to perpetuate or amplify biases inherent in their training data? Moreover, the concept of "algorithmic fidelity" is highlighted as crucial for ensuring that the outputs of these models accurately reflect diverse human responses. Given the demonstrated biases in training data, how can researchers ensure that the use of language models does not reinforce these biases in social science research?

HongzhangXie commented 2 months ago

In the article "Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms," the authors simulate people's reactions to information echo chambers, all information, and 'bridge' information using Large Language Models (LLMs) and Agent-Based Modeling. I am very interested in the strategy of using large language models to simulate real people for research purposes—using LLMs can significantly reduce the time and cost of experiments.

However, I am concerned about whether LLMs can appropriately simulate real human reactions. This is especially pertinent in the current study, which aims to detect people's reactions, including toxic language, when they face viewpoints different from their own. Because GPT-3.5, a commercial LLM, may have been deliberately tuned to reduce the probability of generating toxic language, this could introduce biases. Additionally, the bridging effect might be less significant in real populations than among LLM agents.

Lastly, from a corporate perspective, how do we demonstrate that using a bridging algorithm to break users out of information echo chambers can yield greater returns for the company than maintaining these echo chambers?
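For concreteness, here is a minimal sketch of the kind of persona-conditioned reaction step such a simulation relies on. This is not the authors' code; the persona, the feed item, and the prompt wording are invented for illustration, and it assumes an OpenAI API key is available:

```python
# Minimal sketch of one step of an LLM-driven news feed simulation.
# NOT the paper's implementation; persona, post, and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

persona = "a 45-year-old conservative voter from rural Texas who follows local news"  # hypothetical persona
post = "Headline: City council votes to expand public transit funding."              # hypothetical feed item

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"You are {persona}. Respond as this person would on social media."},
        {"role": "user", "content": f"You see this post in your feed:\n{post}\n"
                                     "Do you like it, share it, comment, or ignore it? "
                                     "If you comment, write the comment."},
    ],
)
print(response.choices[0].message.content)  # the simulated reaction, to be logged by the agent-based model
```

Even in this toy form, the provider's safety tuning shapes how toxic the simulated reactions can be, which is exactly the bias concern raised above.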

Xtzj2333 commented 2 months ago

“Generative agents: Interactive simulacra of human behavior.”

The researchers designed a three-component architecture that enables LLM agents to engage in memory, reflection, and planning. Agents with this architecture can mimic real-life human behaviors. I wonder whether we could borrow psychologists' domain knowledge and add more components to this architecture to make it even more realistic. For example, there are many psychological models of human behavior; perhaps we could use components of these models to create an even more realistic simulacrum.
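A rough sketch of how the three components the paper describes fit together, assuming a generic `llm()` completion helper; this is a simplified paraphrase of the architecture, not the authors' code:

```python
# Simplified sketch of the generative-agent loop (memory -> reflection -> planning).
# Not the authors' implementation; `llm` is a stand-in for any text-completion call.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Placeholder for a language-model call (e.g., an API request)."""
    return "..."

@dataclass
class Agent:
    name: str
    memory: list[str] = field(default_factory=list)    # natural-language memory stream

    def observe(self, event: str) -> None:
        self.memory.append(event)                       # store raw observations

    def reflect(self) -> None:
        # Periodically distill recent memories into higher-level insights.
        recent = "\n".join(self.memory[-20:])
        insight = llm(f"What high-level conclusions can {self.name} draw from:\n{recent}")
        self.memory.append(f"Reflection: {insight}")

    def plan(self) -> str:
        # Condition the next action on retrieved memories and reflections.
        context = "\n".join(self.memory[-20:])
        return llm(f"Given these memories:\n{context}\nWhat does {self.name} do next?")
```

A psychology-informed extension would be an extra component (say, a mood or personality module) whose state is appended to the context inside `plan()`, which is the kind of addition suggested above.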

maddiehealy commented 2 months ago

Here is my 300-400 word summary: After completing the week 5 orientation readings on social simulation, I was drawn to explore how synthetic data is used in training computer vision models, particularly the challenges it presents. My curiosity led me to select the research paper "Exploring the Sim2Real Gap Using Digital Twins" for this week's reading, which tackles the performance discrepancy, known as the Sim2Real gap, between models trained on synthetic data and their application to real-world data.

The research underscores synthetic data's role as an alternative to traditional datasets for training complex computer vision models. The benefits of synthetic data include the ability to rapidly generate diverse data variations with minimal effort and free or inexpensive annotations. However, a major drawback is that these models often underperform in real-world tests, bringing us to the persistent Sim2Real gap.

The authors assess the quality of 3D models and their attributes, resulting in two datasets: one with real images of YCB objects and another with their synthetic counterparts. These datasets explore how 3D model quality variations—such as noise levels, geometry holes, texture blurs, and lighting artifacts—impact tasks like object detection and instance segmentation. Both datasets are available for download, offering a practical resource for further exploration.

A significant finding is the identification of specific defects that severely degrade model performance. Notably, high levels of noise and significant texture blurs drastically reduce model accuracy, whereas changes in ambient lighting are less impactful.
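As a toy illustration of how one might probe such defects outside this particular study (the corruption strengths, the random stand-in image, and the evaluation step are placeholders, not the authors' protocol):

```python
# Toy sketch: apply controlled corruptions to a synthetic image and compare detector behavior.
# Not the paper's pipeline; the image, corruption levels, and evaluation step are illustrative.
import torch
from torchvision import transforms

image = torch.rand(3, 256, 256)   # stand-in for a synthetic render (load real renders instead)

add_noise = lambda x, sigma: (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
blur = transforms.GaussianBlur(kernel_size=9, sigma=3.0)   # simulate texture blur

corruptions = {
    "clean": image,
    "noise_0.1": add_noise(image, 0.1),
    "blurred": blur(image),
}
# One would then run each variant through the detection/segmentation model
# and compare metrics (e.g., mAP) to see which defect hurts performance most.
```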

This research not only provides two valuable datasets but also identifies key factors contributing to the Sim2Real gap, an area that was previously not well understood. By pinpointing areas needing improvement, the study aims to refine data creation processes, boost synthetic dataset effectiveness, and narrow the Sim2Real gap, specifically in computer vision development.

Question: (Unsure whether we are still required to include questions along with our summaries for post-Week 5 discussion posts, so I am including one this week to be safe.) Moving forward, I wonder what the role of human intervention and oversight should be in the iterative process of refining synthetic datasets.

guanhongliu2000 commented 2 months ago

I would recommend the article "Assessing Bias in LLM-Generated Synthetic Datasets: The Case of German Voter Behavior" by Leah von der Heyde, Anna-Carolina Haensch, and Alexander Wenz, published in December 2023. It evaluates the challenges and implications of using large language models (LLMs), specifically GPT-3, to create synthetic datasets for studying voter behavior. The authors focus on the 2017 German federal elections, employing GPT-3 to generate synthetic voter personas based on the German Longitudinal Election Study. The study reveals that while GPT-3 can effectively mimic some voting patterns, particularly among partisan voters, it struggles with predicting non-mainstream and complex voter profiles, often due to the biases inherent in its training data. This limitation raises significant concerns about privacy, accuracy, and the ethical use of synthetic data in social sciences.

The paper highlights GPT's tendency to rely on simplified cues like party affiliation, rather than nuanced socio-political factors, resulting in less accurate predictions for voters of smaller parties or those without clear partisan ties. The research underscores the need for caution when using LLMs for data synthesis, especially in sensitive areas like public opinion research, due to potential biases and discrepancies that may distort the representation of diverse political groups.

The article is a significant contribution to the field of transformers and social simulation because it critically examines the application of advanced AI technologies like LLMs in replicating and predicting human social behaviors, specifically in the political domain. It stands out due to its thorough methodology, the comparative analysis between synthetic predictions and actual voter data, and its focus on the implications of AI biases. These insights are crucial for researchers and practitioners in the field, highlighting both the potentials and limitations of using AI to simulate social phenomena, which is essential for ethical AI development and application in societal research.

mingxuan-he commented 2 months ago

In “Improving factuality and reasoning in language models through multiagent debate”, the most interesting section to me is when different LLMs debate on the same prompt. In the paper, the authors used the 2023 versions of ChatGPT and Bard, which were similar in performance. I wonder what the outcome of this debate would be if we put two LLMs of different size/quality against each other (e.g., a 7B model against Claude Opus). Will the more advanced LLM "teach" the less advanced LLM to solve problems, or will they collectively generate worse results than the larger model alone?
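A minimal sketch of the debate loop with two models of different capability, assuming each is wrapped in a simple `prompt -> answer` callable; the round count and prompt wording follow the general recipe described in the paper, not its exact implementation:

```python
# Sketch of a two-model debate, following the general multiagent-debate recipe.
# `weak_model` and `strong_model` are placeholders for any two chat-completion wrappers.
def weak_model(prompt: str) -> str:
    return "weak model answer"    # stand-in for, e.g., a 7B open model behind an API
def strong_model(prompt: str) -> str:
    return "strong model answer"  # stand-in for a frontier model

def debate(question: str, rounds: int = 2) -> list[str]:
    agents = [weak_model, strong_model]
    answers = [agent(question) for agent in agents]           # independent first answers
    for _ in range(rounds):
        new_answers = []
        for i, agent in enumerate(agents):
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            critique_prompt = (
                f"Question: {question}\n"
                f"Other agents answered:\n{others}\n"
                "Considering their reasoning, update and restate your answer."
            )
            new_answers.append(agent(critique_prompt))        # each agent revises its answer
        answers = new_answers
    return answers                                            # a judge or majority vote picks the final answer
```

Swapping in wrappers of very different capability is then a one-line change, which would let one test the "teaching" question directly.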

kceeyang commented 2 months ago

300-400 word reflection:

After reading week 5’s orientation readings, and as I wrote in my orienting post, I started to look into the limitations of Reinforcement Learning from Human Feedback (RLHF). RLHF uses feedback from users to fine-tune outputs, and many large language models, like ChatGPT, use RLHF to align their outputs with human values. However, in this process, the trained reward model may misrepresent diverse human preferences due to inconsistency or incorrectness in the collected human feedback. In the paper that I found, “MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences,” the authors first address the alignment problem with single-reward RLHF. They prove mathematically that it is not possible to universally align language models with diverse human preferences using single-reward RLHF, because a single reward cannot capture the inherent diversity among human subgroups.

They then provide a potential solution for alignment with diverse preferences: learning a mixture of preference distributions, which can capture a wider range of complex, real-world preference distributions. MaxMin-RLHF uses an expectation-maximization algorithm to learn this mixture and then aligns the model with both majority and minority preferences, so that it better represents the diversity of human preferences. The authors view this approach as an egalitarian strategy, inspired by the egalitarian principle in social choice theory and operations research, in that it maximizes a max-min social utility objective for alignment.
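As I understand it, the resulting policy objective is roughly of the max-min form below; this is a paraphrase in my own notation, not the authors' exact formulation, and it omits the usual KL regularization to a reference policy:

```latex
% Rough paraphrase of the MaxMin-RLHF objective (my notation, not the authors'):
% r_1, ..., r_K are the reward functions learned via EM over the preference data.
\max_{\pi} \; \min_{k \in \{1, \dots, K\}} \;
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)} \big[ r_k(x, y) \big]
```

In other words, the policy is optimized for the worst-off preference group rather than the average, which is what makes the strategy egalitarian; see the paper for the exact formulation.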

So I think the proposed MaxMin-RLHF could be very useful in many social science analyses, as it would efficiently align different groups' preferences and lessen social disparities. I believe this method would be great to use with an unbalanced social dataset that contains population groups with diverse demographic markers (e.g., race, ethnicity, age, gender). In this case, you would probably have majority and minority user groups, and the MaxMin-RLHF method may help maintain a balanced alignment with both groups, generating “socially fairer outcomes.”

beilrz commented 2 months ago

I think “Generative agents: Interactive simulacra of human behavior” is a very innovative paper. I was wondering: if we can use LLMs to simulate human behavior, would it be possible to deploy such agents online to track a group's behavior and predict its future actions? Consider a subreddit: we could tell our LLM agent to read the posts and forage relevant information online, with the goal of minimizing the difference between the future content of the sub and our foraged content, in order to simulate the information-foraging behavior and mindset of the subreddit's users.

uc-diamon commented 2 months ago

In “Improving factuality and reasoning in language models through multiagent debate”, I understand that hallucination is not ideal in the sense that it provides false information. However, does multi-agent debate diminish the "creativity" of the model if it is prompted, for example, to create a story about an enchanted forest?

anzhichen1999 commented 2 months ago

What mechanisms can be designed to ensure ethical interactions between generative agents and humans, especially when these agents are capable of forming dynamic social relationships and influencing human behavior?

HamsterradYC commented 2 months ago

“Jury learning: Integrating dissenting voices into machine learning models”

While the manuscript proposes a system to integrate diverse views, the initial selection of and weights given to different jurors might still be biased by the subjective choices of the practitioners. How does the model prevent bias in the jury selection process itself, especially biases introduced by machine learning practitioners' subjective decisions in jury formation?

CYL24 commented 2 months ago

I would recommend the article “How Accurate are GPT-3’s Hypotheses About Social Science Phenomena?” by Hannes Rosenbusch, Claire E. Stevenson, and Han L. J. van der Maas, published on July 3, 2023. Currently, many researchers face difficulties in comprehending the extensive array of interconnected social science concepts within their discipline, and formulating hypotheses about these numerous interrelationships is even more time-consuming. To address that situation, the article presents a study testing GPT-3’s ability to predict simple study outcomes in social science, which sheds light on both the potential and the limitations of using LLMs to automate the generation of hypotheses about empirical correlations, in order to inspire and support original research in social science.

The article examines GPT-3’s ability to predict simple study outcomes based on political attitudes collected from a survey of 600 US citizens. The method involves prompting GPT-3 to predict the direction of empirical inter-attitude correlations, using various prompting strategies such as zero-shot, five-shot, and chained prompting, as well as extensive fine-tuning. The authors provide very detailed steps for how they performed the study and compared the results of these strategies.
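To make the setup concrete, a zero-shot query in this style might look like the sketch below; the attitude items and wording are invented for illustration and are not the study's actual prompts:

```python
# Illustrative zero-shot prompt for predicting the sign of an inter-attitude correlation.
# The attitude items and phrasing are made up; they are not the study's actual prompts.
attitude_a = "support for stricter gun control"
attitude_b = "support for universal healthcare"

zero_shot_prompt = (
    "In surveys of US citizens, is the correlation between "
    f"'{attitude_a}' and '{attitude_b}' positive or negative? "
    "Answer with one word: positive or negative."
)
# A five-shot variant prepends five solved examples; chained prompting first asks the
# model to reason about the two attitudes before giving the one-word answer.
print(zero_shot_prompt)
```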

One of the most striking findings is the impressive prediction accuracy, ranging from 78% with zero-shot prompting to 97% with extensive fine-tuning. The results suggest that AI has the capacity to contribute meaningfully to hypothesis generation, especially for empirical scientists’ research, and to enhance research efficiency by providing a “second opinion” before studies are conducted, or even serving as a reference for power analyses.

However, the article also underscores several limitations. For instance, LLMs lack the nuanced understanding, ethical awareness, and contextual comprehension that human researchers possess, which could lead to biased or discriminatory hypotheses rooted in the training data. In addition, the study's reliance on written text as the sole source of knowledge limits its understanding compared to human researchers.

I found this article inspiring for anyone interested in conducting research on empirical correlations, or in adding more diverse data beyond written text to such a study.

MarkValadez commented 2 months ago

I read "Topological to deep learning era for identifying influencers in online social networks :a systematic review." The article compares structural methods against various NN methods for identifying influencers in different online networks. It was found that DL approaches outperformed traditional structural and ML approaches for understanding the social importance of behaviors or social position. What this makes me think of is the idea of social simulation in the orienting readings since DL is not only able to simulate social circumstances but evaluate the importance of social behaviors in a constrained setting. I wonder if it is possible to look at the distribution of self attention matrices for any given actor to understand the differences in language which make influencers persuasive. This comes to mind because the structural layout is a result of the communication behaviors as they acquire popularity rather than the structure being itself the cause of their prevalence and importance.

kangyic commented 2 months ago

Generative Agents: Interactive Simulacra of Human Behavior

Did the model account for gender differences, personal interests, first impressions, and so on? When we talk about reflection, each of those traits differs across individuals and affects what we reflect on every day. We probably also want to add randomness to the agents' actions, because some people simply act on impulse.

XueweiLi1027 commented 2 months ago

“Jury learning: Integrating dissenting voices into machine learning models.” After reading about this impressive application of machine learning, I was wondering: how reliable are model-generated value judgments (in this case, jury decisions) in real-world scenarios, say a mock trial? What concerns would need to be addressed before such a model could actually be put to use?

erikaz1 commented 2 months ago

Du et al. (2023) share that multi-agent debate is computation- and resource-intensive. In particular, models struggle to store and process long debate histories due to limited context windows. Is there a way to select and summarize the aspects of models’ chat history most relevant to the current topic, rather than generically summarizing ideas from early in the debate?
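One possible direction, sketched under the assumption that debate turns are stored as plain text: embed each past turn and keep only the turns most similar to the current question before summarizing. The embedding model and top-k are arbitrary choices here, not something from Du et al.:

```python
# Sketch: retrieve only the debate-history turns relevant to the current question.
# Embedding model and top-k are arbitrary; this is not part of Du et al.'s method.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

history = [
    "Agent 1: the integral should be evaluated by parts ...",      # hypothetical past turns
    "Agent 2: I disagree, substitution is simpler ...",
    "Agent 1: on the unrelated warm-up question, the answer was 42.",
]
current_topic = "How should we evaluate the integral in question 3?"

turn_emb = encoder.encode(history, convert_to_tensor=True)
topic_emb = encoder.encode(current_topic, convert_to_tensor=True)
scores = util.cos_sim(topic_emb, turn_emb)[0]                      # similarity of each turn to the topic
top_k = scores.topk(k=2).indices.tolist()
relevant_history = [history[i] for i in top_k]                     # only these turns go to the summarizer / next prompt
print(relevant_history)
```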

hantaoxiao commented 2 months ago

The intricacy of maintaining long-term coherence in an agent's behavior, especially as scenarios and interactions become increasingly complex, highlights a crucial challenge. This calls for sophisticated memory and planning architectures that go beyond current capabilities, perhaps by integrating more advanced models of memory and prediction. What are the primary challenges in ensuring that generative agents continue to behave believably over extended interactions within a dynamic environment?

00ikaros commented 1 month ago

What are generative agents, and how do they simulate believable human behavior for various interactive applications? Specifically, how does the architecture of generative agents extend a large language model to store and synthesize experiences, enabling dynamic behavior planning? Additionally, in what ways do these agents produce believable individual and social behaviors within interactive environments, and what critical roles do the components of observation, planning, and reflection play in enhancing the believability of these simulations?

icarlous commented 1 month ago

The paper “Generative Agents: Interactive Simulacra of Human Behavior” presents a groundbreaking concept. If LLMs can replicate human behavior, is it feasible to deploy such agents online to monitor and predict group behavior? For instance, in a subreddit, we could program an LLM agent to read posts and gather relevant online information, with the goal of minimizing the difference between future subreddit content and the gathered data. This would emulate the information foraging behavior and mindset of subreddit users.

Carolineyx commented 1 month ago

I want to recommend "A Generative Pretrained Transformer (GPT)–Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study". Summary:

The article investigates the use of GPT-3.5 as a chatbot for medical students to practice history taking. The study aims to address the limitations of traditional patient interaction methods, such as psychological stress and limited repetition. The authors developed a chatbot interface using GPT-3.5, optimized with an illness script and behavioral instructions. Medical students engaged with the chatbot, and their interactions were recorded and analyzed. The study found that GPT-3.5 could provide medically plausible answers 97.9% of the time, offering a positive user experience with an overall Chatbot Usability Questionnaire (CUQ) score of 77 out of 100. The chatbot's responses were mostly accurate when based on the script, though some responses included fictitious or socially desirable information.
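A rough sketch of the general setup the paper describes, a system prompt combining an illness script with behavioral instructions; the script details below are invented for illustration and are not from the study:

```python
# Sketch of a simulated-patient chatbot: illness script + behavioral instructions as a system prompt.
# The script details are invented for illustration; they are not from the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

illness_script = (
    "You are Anna Weber, 58, presenting with three days of worsening chest tightness on exertion. "
    "You have a history of hypertension and smoke 10 cigarettes a day."   # hypothetical script
)
behavior_rules = (
    "Answer only what the student asks, in lay language. "
    "Do not volunteer the diagnosis. Stay in character as the patient."
)

messages = [{"role": "system", "content": illness_script + " " + behavior_rules}]
student_question = "What brings you in today?"                             # example student turn
messages.append({"role": "user", "content": student_question})

reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)   # the simulated patient's answer; append it to `messages` to continue
```

Repeating this loop and logging `messages` would produce the kind of interaction transcripts the study recorded and analyzed.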

Extending Social Science Analysis:

The methods described in the article can extend social science analysis by using GPT-powered chatbots to simulate various social interactions and scenarios. For instance, in sociology and psychology, researchers can utilize GPT to create virtual interactions that model social behavior under different conditions. This can be particularly useful for studying phenomena such as group dynamics, social influence, and communication patterns. By leveraging GPT's capabilities, researchers can generate extensive data on how individuals might respond to various social stimuli, thereby enhancing our understanding of social behaviors and their underlying mechanisms.

Pilot Use of Social Data:

To pilot the use of GPT-powered chatbots in extending social science analysis, I would propose a study focusing on the impact of social media interactions on mental health. The social data required would include:

- User Demographics: Age, gender, location, and occupation of users.
- Interaction Data: Posts, comments, likes, shares, and direct messages.
- Content Data: Textual content of posts, including hashtags, keywords, and sentiment analysis.
- Behavioral Data: Time spent on social media, frequency of interactions, and engagement metrics.
- Mental Health Indicators: Self-reported mental health status, frequency of mental health-related posts, and participation in mental health forums or groups.

By inputting this data into a GPT-powered chatbot, we can simulate social media environments and observe how different types of interactions affect users' mental health over time. These simulations could help identify specific patterns of social media use that are associated with positive or negative mental health outcomes. Researchers can then use these insights to develop interventions aimed at promoting healthier social media habits and mitigating the adverse effects of negative interactions.

Brian-W00 commented 1 month ago

The article "Generative Agents: Interactive Simulation of Human Behavior" introduces advanced computational software agents—generative agents—that can simulate daily human behavior, artistic creation, and social interaction. Please ask, when designing these generative agents, how to ensure their moral and ethical behavioral boundaries when simulating complex social behaviors?

erikaz1 commented 4 weeks ago

New possibility reading: “Trustworthy Graph Neural Networks: Aspects, Methods, and Trends” (Zhang et al. 2024)

Adversarial attacks in machine learning aim to mislead models and are hard to detect. Deep neural networks (DNNs) are particularly vulnerable to small adversarial changes. In node classification tasks with graph data, attacks manipulate nodes to misclassify targets, potentially producing real-world consequences like spreading misinformation or manipulating online reviews. Attacks are categorized into "inference-phase" and "training-phase" attacks, and aim either to make models behave in attacker-chosen ways or to degrade overall performance. Despite a lack of notable real-life GNN attack examples, the rise of GNN adoption in critical fields (medicine, finance, anywhere using recommendation systems) emphasizes the need for robust defenses.

Developing robust and trustworthy GNNs involves sustaining model accuracy under adversarial perturbations. Attackers craft perturbations by modifying graph structure, modifying node attributes, or injecting nodes, in order to misclassify targets or degrade overall performance. In applications like malware detection and financial analysis, robustness is critical. For example, in malware detection, network traffic graphs can be manipulated to evade detection, while in financial analysis, attacks can lead to incorrect predictions.

This article, “Trustworthy Graph Neural Networks: Aspects, Methods, and Trends” (Zhang et al. 2024), on GNN robustness and accountability enhances social science analysis by helping us understand what adversarial attacks look like, where they might occur, their consequences, and how to defend against various types of attacks. Zhang et al. share the steps that researchers can take to prevent attacks and enhance the explainability and robustness of their GNNs. For instance, they can develop and implement robust defense mechanisms against adversarial attacks by investigating techniques such as adversarial training, graph data sanitization, and anomaly detection to identify and mitigate potential threats. Additionally, researchers can focus on improving the interpretability of GNN models by designing explainable architectures, developing post-hoc explanation methods tailored to graph data, and integrating visualization techniques to enhance model transparency. Lastly, interdisciplinary research efforts involving collaboration between experts in machine learning, network science, and the social sciences can lead to an enhanced understanding of the socio-technical implications of GNNs and designing more robust systems.