Thinking-with-Deep-Learning-Spring-2024 / Readings-Responses

You can post your reading responses in this repository.

Week 5. Apr. 19: Transformers and Social Simulation - Orienting #9

JunsolKim opened this issue 4 months ago

JunsolKim commented 4 months ago

Post your questions here about: “Language Learning with Large Language Models” and “Digital Doubles for Simulation, Prediction and Insight” in Thinking with Deep Learning, chapters 11 and 18; “Social Simulacra: Creating Populated Prototypes for Social Computing Systems.” UIST. Park, Joon Sung, Lindsay Popowski, Carrie Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. 2022.

Marugannwg commented 3 months ago

The discussion of LLM limitations resonates deeply with me. A short summary of my questions:

I wonder whether those limitations will be resolved (naturally) as larger models built on the current mainstream transformer architecture come out. And if we want to work with existing models and APIs, what strategies (besides extensive retraining and fine-tuning) can handle these issues?

maddiehealy commented 3 months ago

After reading Social Simulacra, my (really broad!) question veers toward the philosophical side of DL: to what extent is human behavior predictable? Reading about social simulacra prototyping reminded me of a quote from Adolphe Quetelet’s study of social phenomena, which notes that “every suicide may be a product of individual choice, but all suicides in a year fall into knowable patterns.” I think this quote captures some of the ideas behind social simulacra, especially questions about how far free will can be tracked and more philosophical thinking in the computational social sciences.

I was also intrigued by the naming of this prototype as “social simulacra.” Is it meant to invoke Jean Baudrillard’s idea of simulacra? Maybe my introduction to what simulacra means is limited and/or skewed through an academic lens, but so far I have mainly encountered the concept through Jean Baudrillard’s work Simulacra and Simulation.

Baudrillard’s work carries a warning tone about simulacra, with quotes like “we live in a world where there is more and more information, and less and less meaning.” His stages of simulacra development also end with the simulacrum having no relationship to any reality whatsoever. I’ve attached a screenshot summarizing the stages of simulacra development for reference. Did the authors of this paper intend to invoke this connotation when naming their social simulation prototype? Will social simulacra prototyping eventually follow the stages listed below?

[Screenshot: summary of Baudrillard’s stages of simulacra development]

Pei0504 commented 3 months ago

Chapter 11 effectively outlines the advancements and applications of large language models like GPT, emphasizing their role in diverse fields from policy to business. It traces their development from basic models to complex systems capable of detailed simulations and interactions. However, it lacks a thorough discussion of ethical concerns and potential biases in the training data. How can developers ensure that the training of large language models like GPT incorporates ethical considerations and effectively mitigates biases found in the training data? Future explorations could benefit from a deeper focus on these ethical implications and on technological limitations, to better understand and enhance the reliability of LLMs.

HongzhangXie commented 3 months ago

The GPT model employs the Transformer architecture, relying on self-attention mechanisms to process sequential data. By learning to predict the next word in a sentence, the model grasps the general patterns and structures of language, allowing it to consider the entire preceding sequence when generating each word and thereby enhancing its understanding of context.
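
To make this concrete, here is a minimal sketch of a single causally masked self-attention step (toy dimensions, random weights, single head; not the full GPT implementation):

```python
import torch
import torch.nn.functional as F

# Toy single-head self-attention over a sequence of 4 tokens with dimension 8.
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)    # token embeddings

W_q = torch.randn(d_model, d_model)  # learned query/key/value projections
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Causal mask: each position may only attend to itself and earlier tokens,
# which is what lets GPT predict the next word from the left context alone.
mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
scores = (Q @ K.T) / d_model ** 0.5
scores = scores.masked_fill(~mask, float("-inf"))

attn = F.softmax(scores, dim=-1)     # attention weights over the visible context
out = attn @ V                       # context-aware representations, shape (4, 8)
```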

GPT generates text based on probabilities. However, probability may not be the sole criterion for evaluating the quality of a model: the content it produces might be incorrect or unethical. How should we ensure that the model's generated content is factually accurate and conforms to ethical standards? For example, how can we get the model to recognize and refrain from generating a statement like "Benjamin Franklin was a President of the United States," which contradicts historical fact?

I am also particularly interested in the method of Reinforcement Learning with Human Feedback (RLHF). I would like to understand more about how it works. I have a concern that different groups may provide feedback with varying frequencies, and as feedback accumulates, the model might increasingly favor the preferences of those groups that provide feedback more actively, potentially neglecting the perspectives of less vocal groups.

Xtzj2333 commented 3 months ago

As a language model, why is GPT-4 so good at coding? Why is it so bad at math? Is this due to its transformer architecture, the data it is trained on, or something else?

guanhongliu2000 commented 3 months ago

In Social Simulacra, the term "Social Simulacra" resonates with Jean Baudrillard’s concept of simulacra and simulation. How might the use of AI to simulate human social interactions relate to Baudrillard's stages of simulacra, especially concerning the blurring of boundaries between simulation and reality?

kceeyang commented 3 months ago

I am very interested in Reinforcement Learning with Human Feedback (RLHF), which was introduced in Chapter 11. Optimizing a language model this way seems to involve training multiple models: the language model is pretrained first, and a reward model trained on human feedback is then used, with reinforcement learning methods, to fine-tune the language model. I am wondering how people ensure the quality and impartiality of the collected human feedback so that the resulting model is less biased.
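
As a rough sketch of the reward-modeling step I have in mind, the reward model is typically trained on pairwise human preferences; the `reward_model` below is a hypothetical placeholder, not any particular implementation:

```python
import torch

def reward_model(prompt: str, response: str) -> torch.Tensor:
    """Hypothetical placeholder: in practice a fine-tuned LM head that
    returns a scalar score for a (prompt, response) pair."""
    return torch.tensor(float(len(response) % 7))  # dummy score

def preference_loss(prompt: str, chosen: str, rejected: str) -> torch.Tensor:
    # Annotators rank pairs of responses; the reward model is trained so the
    # preferred ("chosen") response scores higher than the rejected one
    # (a Bradley-Terry style pairwise loss used in common RLHF recipes).
    r_chosen = reward_model(prompt, chosen)
    r_rejected = reward_model(prompt, rejected)
    return -torch.log(torch.sigmoid(r_chosen - r_rejected))

loss = preference_loss("Explain RLHF.", "A careful answer...", "A sloppy answer...")
```

My bias concern enters exactly here: whichever group supplies more of these preference pairs has more influence on the reward model, and therefore on the fine-tuned language model.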

mingxuan-he commented 3 months ago

In the "simulating political actors" subsection, I think there's an interesting caveat to using proprietary LLMs for political simulation. Recently many big news outlets like the New York Times, CNN, and Disney have blocked OpenAI's data scraper, thereby excluding themselves from the training data. How would that affect the quality of political simulation from OpenAI's models? None of the popular LLM benchmarks seem to measure the quality of political simulation, so is it possible that some older models like GPT-3 actually perform better, or are less biased, at these tasks?

beilrz commented 3 months ago

One question I have about the design of LLMs is what incentive they have to model unlikely but logically true statements. Given the objective function below, an LLM should always return the most likely next token to maximize the objective (for example, the most likely next token after "I love" is "you" rather than "myself"). This could be an issue if we want to construct a representative dataset from LLM output. Would this issue mostly be addressed by using a higher temperature?

$L(\theta) = \sum_{t} \log P(x_t \mid x_{<t}; \theta)$
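
A minimal sketch of how temperature reshapes the next-token distribution (toy logits over a toy vocabulary, not a real model):

```python
import torch
import torch.nn.functional as F

# Toy logits over a small vocabulary for the next token after "I love".
vocab = ["you", "myself", "coding", "rain"]
logits = torch.tensor([4.0, 1.0, 2.0, 0.5])

for temperature in (0.2, 1.0, 2.0):
    probs = F.softmax(logits / temperature, dim=-1)
    print(temperature, {w: round(p, 3) for w, p in zip(vocab, probs.tolist())})

# Low temperature concentrates probability mass on "you"; higher temperature
# flattens the distribution, so unlikely-but-valid continuations such as
# "myself" get sampled more often.
```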

uc-diamon commented 3 months ago

Regarding Chapter 12, "Programming Foundation Models with Language": how does a zero-shot model "learn" without backpropagation and minimizing a loss function?
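
My understanding is that nothing is learned at inference time: the task specification lives entirely in the prompt, and the frozen pretrained weights do the rest. A sketch, where `generate` is a hypothetical stand-in for any LLM completion call:

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a frozen LLM completion API."""
    return "negative"  # dummy output for illustration

# Zero-shot: the task is described in natural language, with no labeled
# examples, no loss function, and no weight updates at this stage.
zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: 'The battery died after two days.'\n"
    "Sentiment:"
)
label = generate(zero_shot_prompt)
```

So the "learning" all happened during pretraining; zero-shot use just conditions that pretrained distribution on a task description.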

anzhichen1999 commented 3 months ago

What are the potential long-term effects of LLMs on less-represented languages? (I think Xuechunzi Bai also touched on this topic.)

HamsterradYC commented 3 months ago

What are the ethical considerations and potential societal impacts of using LLMs to simulate public opinion or forecast political behaviors? How might these simulations influence public policy or societal trust if made public?

CYL24 commented 3 months ago

Chapter 11 mentions that sequential training of LLMs on ordered time periods could enhance their ability to simulate historical socio-cultural changes. In that case, do researchers take into account the scalability and generalizability of these simulated social interactions across different historical contexts and cultural settings? If so, how might researchers balance the need for historical accuracy with the scalability required for large-scale simulations?

Brian-W00 commented 3 months ago

Can LLMs reason logically about data they have not been trained on?

MarkValadez commented 3 months ago

It seems to me there is a limitation to the self-attention scheme, in that it still does not identify a notion of meaning separate from the interpretation of language. I wonder what the limitations are if there is such a thing as an essential representation, like Plato's Forms or Durkheim's collective representations. Moreover, there is a notion of a personal image or ideal projection of objects and meanings in our conscious selves, separate from Forms or collective representations. While machines can emulate behavior, how does this difference mark our behavior as separate and distinguishable from an LLM's?

kangyic commented 3 months ago

What is the difference between GPT's self-attention mechanism and the attention layers in BERT? Why does adding a bidirectional mechanism prevent a model from performing well on text generation tasks?
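
To illustrate the question, a toy comparison of a GPT-style causal mask and BERT-style bidirectional attention (random scores, single head):

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)  # raw attention scores (toy values)

# GPT-style causal attention: token t attends only to tokens <= t, which is
# what makes left-to-right generation possible.
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
causal_attn = torch.softmax(scores.masked_fill(~causal_mask, float("-inf")), dim=-1)

# BERT-style bidirectional attention: every token sees the whole sequence.
# Useful for understanding tasks, but it "leaks" future tokens, so the model
# is not trained as a left-to-right generator.
bidirectional_attn = torch.softmax(scores, dim=-1)

print(causal_attn)         # upper triangle is zero
print(bidirectional_attn)  # full matrix
```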

XueweiLi1027 commented 2 months ago

Both readings convey a similar idea: LLMs and the social simulacra technique produce outputs based on a large number of probability calculations. At every step, the 'most likely to happen' result becomes the output. Yes, this mechanism can probably give us the 'correct' thing we ask for in most situations. But is this how humans generate insights (or develop intelligence, if you will), by going with the safest and therefore most likely to be correct choice? I feel like humans often generate insights by going with the opposite choice: things that are considered less likely to happen, or riskier. Is there a reason LLM developers claim to be creating artificial intelligence when the mechanisms of these models seem to operate the other way around?

erikaz1 commented 2 months ago

Chapter 11 talks briefly about atemporality bias when training LLMs. Hypothetically, for sequentially trained LLMs that would enable “causal cultural analysis,” how might we produce an interpretation that balances changes in the cultural world against nodes that did not change despite the historical events? (The Cultural World on the right in Figure 12 seems to have changed a lot from the one on the left, but if we look closer, a sizable portion of the nodes may not have shifted at all.)

hantaoxiao commented 2 months ago

While LLMs are increasingly capable of mimicking human-like responses, their ability to understand and process underlying complexities in real-world scenarios still poses a challenge. There might be a risk of oversimplifying nuanced issues or perpetuating biases present in the training data. How effective are current language models at simulating complex social and political scenarios accurately enough to serve as predictive tools?

00ikaros commented 2 months ago

What are the key aspects and capabilities of large language models (LLMs) like the GPT family, and how have they contributed to the development of conversational agents and foundation models? Additionally, how can LLMs be controlled and managed through language for various use cases, from intelligent assistants to digital doubles, and what techniques are used for interacting with and programming these models, including zero-shot, few-shot, and reinforcement learning with human feedback (RLHF)? Lastly, what are the potential applications, limitations, and future developments of LLMs in fields such as social and natural sciences, policy, and business?

Carolineyx commented 2 months ago

For Chapter 11: I'm curious about the impact of different pre-training objectives on the performance of large language models. How do objectives like masked language modeling (MLM) and autoregressive language modeling (ALM) compare in terms of their effectiveness on downstream tasks such as text classification, machine translation, and question answering? What are the theoretical underpinnings that explain the differences in their performance?
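
To make the contrast concrete, a toy illustration of how the two objectives turn the same sentence into training examples (purely illustrative, not how any specific model tokenizes):

```python
sentence = ["the", "cat", "sat", "on", "the", "mat"]

# Autoregressive LM (GPT-style): predict each token from its left context only.
alm_examples = [(sentence[:t], sentence[t]) for t in range(1, len(sentence))]

# Masked LM (BERT-style): hide some positions and predict them from both sides.
masked = sentence.copy()
masked[2] = "[MASK]"
mlm_example = (masked, {2: "sat"})

print(alm_examples[1])  # (['the', 'cat'], 'sat')
print(mlm_example)      # (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], {2: 'sat'})
```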

For Social Simulacra: what validation methods are used to ensure the realism and reliability of generated social behaviors? How do current evaluation techniques measure the authenticity and plausibility of interactions produced by social simulacra compared to actual user interactions? What are the main challenges in developing robust evaluation metrics for these generative models?

For the forecasting COVID-19 polarization study: what could be the limitations of LLMs in accurately capturing the dynamics of social polarization? What key factors contribute to the challenges LLMs face in modeling complex social phenomena like polarization? How can these limitations be mitigated through advancements in model architecture or training data?

La5zY commented 2 months ago

Considering the rapid evolution of these models, what are the potential ethical implications and challenges associated with programming LLMs to act as digital doubles of human agents?

icarlous commented 1 month ago

It seems self-attention cannot identify meaning separate from language interpretation. Considering Plato’s Forms and Durkheim’s collective representations, what are the limitations? Additionally, how do personal images or ideal projections in our consciousness differ from these concepts? While machines emulate behavior, how does this distinction affect our behavior compared to an LLM?