UChicago-Computational-Content-Analysis / Readings-Responses-2024-Winter


6. Large Language Models (LLMs) to Predict and Simulate Language - orienting #27

lkcao opened this issue 9 months ago

lkcao commented 9 months ago

Post questions here for this week's orienting readings: P. Vicinanza, A. Goldberg, S. Srivastava. 2022. “Quantifying Vision through Language Demonstrates that Visionary Ideas Come From the Periphery”. PNAS Nexus.

XiaotongCui commented 8 months ago

This research is fascinating! However, I have a question regarding their approach of predicting prescience at the sentence level. I believe that an article should be considered as a cohesive whole, naturally warranting a document-wide analysis for studying prescience. I'm curious about the choice of a sentence-level approach – is it driven by computational efficiency considerations, or are there specific advantages to adopting this approach over a document-wide analysis?

ana-yurt commented 8 months ago

I have two questions on this research paper: 1) For this study, what is the specific advantage of using BERT over word2vec? Does BERT capture contextualized meanings (hence context-sensitive novelty) much better than word2vec? 2) The ability to quantify the repercussion of vision (as novelty retained in time) offers fascinating prospects for historically minded research. I wonder if we could use such methodology to confirm or contradict the influential theses in social sciences that similarly dealt with the 'origins' of systems of thought—such as Weber's "spirit of capitalism," Thompson's "time and work-discipline," or the various strands of historiography on modern nationalism, to name a few. What difficulties might we encounter in attempting to do so?
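On the first point, here is a small illustration (my own sketch, not from the paper) of what "contextualized" buys you: with BERT the same surface word gets a different vector depending on its sentence, whereas a word2vec embedding is fixed, so only the former can support context-sensitive novelty.

```python
# Minimal sketch: BERT assigns different vectors to the same word in different
# contexts, unlike word2vec, which has one static vector per word type.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # locate the target word's position and return its contextual vector
    idx = enc["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = embed_word("the river bank was flooded", "bank")
v2 = embed_word("the bank raised interest rates", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # well below 1.0: context changes the vector
```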

bucketteOfIvy commented 8 months ago

This is a really fun study, and I have two quick questions about the method.

The first is just a clarification on the math. In short, the authors define Contextual Novelty as follows: $$CN(s) = \left(\prod\limits_{i=0}^NPP_i\right)^{\frac{1}{N}}$$ To my understanding, $N$ in this context is the number of all words in the entire corpus, which means that the above is the product of the perplexity $PP_i$ of every word in the corpus normalized by the size of the corpus, which feels odd. In the previous sentence, however, they mention that $CN(s)$ has an $n$th root taken to control for sentence length, which makes it feel like the above is meant to be the product of the perplexity of all $n$ words in the sentence taken to the $\frac{1}{n}$ power, which feels much more reasonable.

Is that the case (i.e. is the above equation a typo?) or is the use of $N$ correct in this context? If the latter, how should we think about this equation?
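For what it's worth, a minimal sketch of the sentence-level reading of that formula (my assumption, not the authors' code): mask each of the $n$ tokens of a sentence in turn, take $PP_i = 1/p(w_i \mid \text{context})$ from a masked language model, and average in log space, which is exactly the geometric mean $\left(\prod_i PP_i\right)^{1/n}$.

```python
# Sketch of sentence-level contextual novelty as a geometric mean of per-token
# masked-LM "perplexities" (assumed interpretation, not the paper's released code).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def contextual_novelty(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    log_pp = []
    for i in range(1, input_ids.size(0) - 1):          # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_prob = torch.log_softmax(logits, dim=-1)[input_ids[i]]
        log_pp.append(-log_prob.item())                 # log PP_i = -log p(w_i | context)
    # geometric mean over the n tokens of the sentence, computed in log space
    return float(torch.tensor(log_pp).mean().exp())
```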

The second is just a very quick question about BERT that might get answered in the fundamentals readings. In short, they seem to highlight BERT as useful as it's trained on what seems to be the CBOW task. While that's a theoretically justified choice in this context, does that make the model "worse" at predicting words than if it were optimizing the skipgram task (maybe even making it a "more generous" measure of prescience)?

sborislo commented 8 months ago

I love the approach, and it seemed quite thoughtful. That being said, are the analyses not biased to predict greater prescience from peripheral sources due to the corpora used? Even with the clever use of transformations (to account for text size), I have to imagine commonplace writings/speech are less likely to be summarized in roughly the same way (I believe the House and Senate do this to some extent, and firms definitely do it in their Q&A). This would cause the model to be more surprised by the text of peripheral sources purely due to their text not being summarized as often.

The validations (which I found compelling) assuage my concerns a good amount, but the findings regarding politicians' success as a result of being more prescient violate my prior notion quite a bit. While I imagine the most successful politicians are more prescient, I also have to imagine the least successful politicians (and to a greater extent) are also more prescient on average. In many domains, it feels like "going with the flow" is the key to success, albeit moderate success. "Going against the grain" seems like a risky way to win big or, more likely, lose out on moderate success.

yuzhouw313 commented 8 months ago

The concept of utilizing linguistic markers, identified as novel within specific contexts and timeframes, to capture their contextual novelty is fascinating to me. The method employed by Vicinanza et al., which validates the novelty and subsequent recognition of ideas through anecdotes and case studies across different periods, provides a compelling framework for understanding how prescient ideas emerge and gain acclaim.

So my question is: extending beyond solely identifying and validating the novelty and success of prescient ideas through historical data, could we leverage this concept towards predictive analysis? Specifically, is there potential to identify prescient ideas within the most current documents, thereby informing and guiding future research based on these insights? Considering we can still calculate perplexity scores of unconventional tokens or phrases, I would imagine this prediction task is feasible in terms of the probabilistic framework.

cty20010831 commented 7 months ago

This is a very innovative approach to examining prescient ideas using BERT! I personally think the advantages of using BERT (compared to previous methods, e.g., n-grams) are sensible, and I also find the formula backing the operationalization of "prescient ideas" convincing.

However, I do have a clarification question regarding the measurement of periphery. Specifically, although the paper gives the specific measure used to calculate peripherality, what data is it calculated from? Is it based on the same corpus or on other types of data?

alejandrosarria0296 commented 7 months ago

Based on the framework and the findings section, the paper seems to do interesting work in detecting the meaning spaces where potentially prescient ideas are created. I'm still unclear on using perplexity and decreases in contextual novelty as operationalizations for prescience. Why would these measures indicate the prescience of a word in a specific time period?

YucanLei commented 7 months ago

The study seems pretty interesting; however, is it not biased, particularly by data bias? The analysis relies heavily on textual data from transcripts, legal cases, and stock returns. It is essential to consider potential biases in the data, such as selection bias in the choice of cases or transcripts, which could impact the results and conclusions drawn.

ddlxdd commented 7 months ago

I believe BERT represents a groundbreaking tool for contextual analysis. However, I'm curious about the appropriate application of the BERT model. It's noted that researchers commonly utilize pre-trained BERT models (such as those from Google) and then fine-tune them for their unique datasets. I'm interested in understanding how this fine-tuning process operates. Additionally, in a research setting, how can we demonstrate that this fine-tuning is effective and that the pre-trained model is suitable for our specific requirements?
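As a very rough sketch (under my own assumptions about the setup, not the paper's exact pipeline), the usual pattern with Hugging Face transformers is to continue the masked-language-modeling objective on your own corpus. One common way to check that the fine-tuning "worked" is that masked-token loss/perplexity on a held-out split of your domain drops relative to the off-the-shelf model.

```python
# Domain-adaptive fine-tuning sketch: continue BERT's masked-LM objective on a
# custom corpus. "my_corpus.txt" is a hypothetical file, one document per line.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})
tokenized = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                        batched=True, remove_columns=["text"])

# the collator randomly masks 15% of tokens each batch, which is the MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"],
                  data_collator=collator)
trainer.train()
```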

Vindmn1234 commented 7 months ago

For the methodology part, the author employed perplexity as a key measure to evaluate the contextual novelty and, subsequently, the prescience of ideas within a corpus. Perplexity is commonly associated with the predictability of a sequence of words, with higher values indicating less predictability and potential novelty. However, in real-world data, especially across varied domains like politics, law, and business, many factors can contribute to high perplexity scores, including but not limited to actual novel idea expression. How did you ensure that the perplexity scores were reflective of genuine prescient ideas rather than other forms of linguistic anomalies or noise?

muhua-h commented 7 months ago

If prescient ideas more often come from the periphery, what does this imply about the structure and dynamics of organizations and fields? Should there be a conscious effort to foster and pay attention to peripheral voices within organizations, and how might this shift impact the balance between maintaining core values and encouraging innovation?

yunfeiavawang commented 7 months ago

This research adopted a deep learning model to detect the prescient ideas in areas including politics, law, and business, which is very interesting. I was wondering if the fine-tuned BERT model could be generalized to the area of arts, which requires more innovative thoughts. If so, how can we define the center and periphery of film/music/literature, and is there a possibility to combine multimodal content analysis into the analytic procedure?

yueqil2 commented 7 months ago

I am so impressed by the intuition and measurement here, which is pretty clear and makes sense to me. In the paper, the authors mention that they "split the corpus into time periods and fine-tune separate BERT models on each split of the corpus" to "examine how the contextual novelty of a sentence changes over time". So what's the difference between this approach and fine-tuning BERT on the entire corpus? I also want to know more about how to partition the data without tuning hyperparameters (it would be so attractive if I didn't need to pre-process the texts).

Audacity88 commented 7 months ago

A fascinating reading. My question is: is prescience conserved? By this I mean, the authors seem to assume that a prescient statement becomes more and more prescient as time goes on. But I wonder if there could be cases where a trend reverses itself, so that something that seemed prescient on a 10-year timescale becomes anti-prescient on a 50-year scale?

QIXIN-LIN commented 7 months ago

How can this approach, demonstrated on politics, law, and business, be applied across other fields? It seems challenging to determine whether a source's influence is central or peripheral in various areas; some might be harder and some might be easier. Is there any way to figure this out?

runlinw0525 commented 7 months ago

Given that innovative ideas often originate from the fringes of established industries, how should technology, finance, healthcare, and entertainment companies adjust their strategies to better recognize and nurture such concepts from smaller, less traditional sources? It would be worthwhile to assess these companies' responses through case studies and industry reports.

naivetoad commented 7 months ago

How does the model's definition of "periphery" vary across different domains, and what implications does this have for identifying underrecognized sources of innovation?

ethanjkoz commented 7 months ago

As a follow-up to Dan's question regarding the time-status of prescient ideas, as I understand it, prescience is always adapting within a field. What I am also questioning, however, is how long prescient ideas take to become mainstream. Adopters of novel, prescient ideas popularize new techniques, thereby creating paradigm shifts within specific domains; how do different domains compare when it comes to the time it takes to adopt new ideas from the periphery?

donatellafelice commented 7 months ago

This is so interesting! I was thinking about how to associate this back to conversation, as my area of interest. I have found that people often, in many ways, do not know what they are saying (or at least do not self-evaluate what they have said very well). We do not seem to fully grasp where our ideas of how we phrase things come from. Unfortunately, we do not have historic conversation data: where we have books, scripts, or newspapers, we rarely find direct transcriptions, and if we do, only in very formal settings like courthouses. We don't really know how people talked day to day.

So contextual novelty is very exciting, as it gives me hope for analyzing what has been said in a more nuanced way. I wonder, then, could this also be used to trace the influence of something as complex as a piece of political protest art? It was always explained to me in my previous study of the humanities that causal claims, or even connections, could not be made regarding the impact of a piece of artwork.

joylin0209 commented 7 months ago

The paper mentions that prescient ideas often emerge from the periphery of their respective fields. I'm curious about the criteria used to define 'the periphery' in this context and how researchers can identify individuals or ideas that fall into this category.

Caojie2001 commented 7 months ago

The strategy and methods utilized in this research are really inspiring. Also, I find it interesting that both the most and least prescient senators are mainly Democrats. However, as with many other content analysis studies, I wonder how the criteria used to evaluate the soundness of the research methodology came about. For example, why did the researchers decide to use the prescience of 'Gingrich senators' to validate the study's strategy?

anzhichen1999 commented 7 months ago

How can the identification of linguistically prescient ideas through deep learning models influence the evolution of policy-making and legal interpretations? Can this approach be used to predict and understand shifts in political ideologies or legal doctrines before they become mainstream?

volt-1 commented 7 months ago

The research relies on diverse corpora to quantify visionary ideas. However, is there a risk that the analyses might be biased towards predicting greater prescience from peripheral sources due to the nature of the corpora used?

icarlous commented 7 months ago

I am interested in how this method could be applied at the field level, for example to detect threads of innovation or the emergence of a new field.

chenyt16 commented 7 months ago

The paper introduces a new deep-learning model that examines how language is used across fields such as politics, law, and business. Are these models trained separately, or do they share some parameters? If they are trained separately, could sharing some parameters possibly improve performance?

Marugannwg commented 7 months ago

That's an intuitive and mind-opening approach to defining/studying concepts that are meaningfully innovative, i.e., prescient. As I understand it, this requires the corpora to have timestamps, so that we know how to split them to represent the novelty of a concept during each slice of time. We want to cherry-pick those concepts whose novelty is high but slowly decreases (i.e., they are slowly accepted and used by the majority).

I wonder what size of corpora it would need so that we can effectively compare the novelty and find the top several percent meaningfully. Would it capture a drift in language use through time? (or are we just looking at those which drift faster than others?)

beilrz commented 7 months ago

In this research, the authors build a framework for evaluating and detecting novel ideas that later become mainstream, so-called prescient ideas. I was wondering whether it would be possible to use this framework for prediction: for example, which novel ideas today are likely to become mainstream tomorrow? My intuition is to first use the framework to acquire labeled data, and then train a classifier on it (a rough sketch of this follows below).
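Sketching that intuition out (my own assumption of one possible pipeline, not anything from the paper): apply the contextual-novelty framework to an older, fully observed corpus to get labels, embed the sentences, and fit a simple classifier that can then score present-day text.

```python
# Rough sketch: train a classifier on historically labeled "prescient" sentences,
# then score new text. All sentences and labels below are hypothetical placeholders;
# in practice the labels would come from applying the framework to an older corpus.
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer

sentences = [
    "We should price carbon emissions directly.",     # hypothetical label: prescient (1)
    "The committee will reconvene next Tuesday.",     # hypothetical label: not prescient (0)
    "Remote work will reshape urban real estate.",    # 1
    "I thank the chairman for yielding the floor.",   # 0
]
labels = [1, 0, 1, 0]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(sentences)

clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score today's sentences: higher probability = more "prescient-like" language.
new = encoder.encode(["Decentralized finance will change banking regulation."])
print(clf.predict_proba(new)[:, 1])
```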

Twilight233333 commented 7 months ago

When measuring the prescience of politicians, the authors use the re-election and status of politicians. Does this really reflect prescience? Can you identify prescient ideas that are not viewed favorably by current society?

Brian-W00 commented 7 months ago

How does the model account for the influence of external socio-political events on the prescience of ideas, and could incorporating such data enhance the model's accuracy in predicting the emergence of prescient ideas?

HamsterradYC commented 7 months ago

Steps to mitigate bias seem to focus more on the technical and methodological aspects of model training than on broader cultural and linguistic variability in this paper. How do cultural and linguistic differences across domains affect the model's ability to accurately quantify prescience, and what steps can be taken to mitigate any biases this may introduce?

JessicaCaishanghai commented 7 months ago

This paper is very interesting. Model training also has cultural implications. How might organizations or institutions encourage and harness innovative thinking from diverse data sources, given, for example, English's hegemony over other languages? And how can this be integrated with traditional approaches?

Dededon commented 7 months ago

I don't quite agree with the hypotheses the authors raise. They need a better definition of the "core" and "periphery" of specific fields. I believe that the generation of political and legal concepts was more prone to be hierarchical, rather than decentralized, before the emergence of mass media.

erikaz1 commented 7 months ago

How might we be able to take into account shorter sentences that make it difficult for BERT to make predictions due to lack of context (10)? There may be many contexts to which we'd like to apply a similar type of prescience analysis, e.g. memes/popular trends, where we might want to analyze tweets (for instance). However, the text available under these circumstances may often not have 10 or more tokens.

floriatea commented 7 months ago

How does the model differentiate between ideas that are truly prescient and those that merely become popular or widely accepted due to external factors such as societal changes, technological advancements, or shifts in policy, rather than their intrinsic value or foresight?

Carolineyx commented 6 months ago

How could this phenomenon of periphery-originating prescient ideas impact strategies for innovation management in organizations? Specifically, how might organizations adjust their search for innovative ideas, or foster an environment that encourages contributions from individuals or groups traditionally considered peripheral?