Computational-Content-Analysis-2020 / Readings-Responses-Spring

Repository for organizing orienting, exemplary and fundamental readings, and posting responses.

Discovering higher-level Patterns (E3) - DeDeo et al 2018 #25

HyunkuKwon opened 4 years ago

HyunkuKwon commented 4 years ago

Post questions about the following exemplary reading here:

Barron, Alexander TJ, Jenny Huang, Rebecca L. Spang, and Simon DeDeo. 2018. “Individuals, institutions, and innovation in the debates of the French Revolution.” Proceedings of the National Academy of Sciences 115(18): 4607-4612

wanitchayap commented 4 years ago

This might be a dumb question, but why is the measure of resonance (R) needed? Per the Methods section, resonance is simply novelty (N) minus transience (T). In that case, wouldn't it be enough to use just novelty and transience in the model? For example, Table 1 could use four groups (low N/low T, low N/high T, high N/low T, and high N/high T) instead of combinations of N and R. Does this have to do with R being asymmetric?
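A minimal sketch of the window-averaged definitions, as I read them from the paper's Methods: each speech j is represented by its LDA topic distribution s(j), novelty averages the KLD of speech j against the w speeches before it, transience against the w speeches after it, and resonance is N - T. The function names, the base-2 log, and the toy two-topic sequences are my own assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence

Dist = Sequence[float]  # topic distribution of one speech; entries sum to 1

def kld(p: Dist, q: Dist) -> float:
    """KLD(p | q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def novelty(s: List[Dist], j: int, w: int) -> float:
    """Average surprise of speech j given each of the w speeches before it."""
    if j - w < 0:
        raise ValueError("window extends before the first speech")
    return sum(kld(s[j], s[j - d]) for d in range(1, w + 1)) / w

def transience(s: List[Dist], j: int, w: int) -> float:
    """Average surprise of speech j given each of the w speeches after it."""
    if j + w >= len(s):
        raise ValueError("window extends past the last speech")
    return sum(kld(s[j], s[j + d]) for d in range(1, w + 1)) / w

def resonance(s: List[Dist], j: int, w: int) -> float:
    """R = N - T: high when speech j departs from the past and later speeches stay close to it."""
    return novelty(s, j, w) - transience(s, j, w)

# Two hypothetical sequences of two-topic speeches; speech 2 is equally novel in both.
picked_up = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
ignored = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
print(resonance(picked_up, 2, 2), resonance(ignored, 2, 2))
```

Both sequences give speech 2 the same novelty (about 0.74 bits), but its resonance is about 0.74 when later speeches keep its topic mix and 0 when they revert. R packs that contrast into one number; since R is fully determined by N and T, keeping N and T as separate columns would carry the same information in a different form.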

timqzhang commented 4 years ago

It is quite an interesting paper, relating content analysis to history and politics. First, I want to briefly reply to @wanitchayap's question, since I will be presenting this paper. The paper focuses mainly on novelty and resonance, or on resonance alone, rather than on transience, because resonance is defined here to capture the influence of speeches. Personally, I think the role of transience is mainly to define resonance. Many of the conclusions also rest on resonance: both the innovation bias and the influence of particular speakers are established using it. It is also convenient to describe the influence of a given role with one pattern (resonance) instead of two (novelty and transience).

My concerns are mainly about the measurements and methods. Most of the concepts in the paper (novelty, transience, and resonance) are quite abstract, so I think it would be worth comparing alternative measurements, although the current one is already quite clever. The methods could also be assessed further, since there are many other approaches to topic modeling and to calculating distances within a corpus.

nwrim commented 4 years ago

Probably because of the very restricted space in PNAS, I do not think the authors clearly explained the logic of first applying topic modeling (LDA) and then applying their novelty/resonance metrics to the LDA output. What is the advantage of separating out topics and applying the metrics to them, rather than applying the metrics to the entire corpus and seeing whether there is an effect?

Also, I am not sure that KLD, or any divergence metric for that matter, is a good way to measure "novelty", since I don't think "different" necessarily means "novel". Is there another way to measure such abstract concepts?

Yilun0221 commented 4 years ago

I think the perspective of this paper is very inspiring. One thing I cannot understand is the exact metrics used for the measurements. I am puzzled by the use of KLD to measure surprise, especially by how well the sample speech used here generalizes.
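One standard way to read KLD(p | q) as "surprise" is information-theoretic: it is the number of extra bits needed, on average, to encode draws from p using a code built for q, i.e. cross-entropy minus entropy. The sketch below shows that identity and the fact that KLD is not symmetric. The three-topic mixtures are hypothetical, and using bits (base-2 log) is an assumption.

```python
import math

def entropy_bits(p):
    """H(p): average bits to encode draws from p with a code built for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """H(p, q): average bits to encode draws from p with a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kld_bits(p, q):
    """KLD(p | q) = H(p, q) - H(p): the extra bits, i.e. the surprise of p to a reader expecting q."""
    return cross_entropy_bits(p, q) - entropy_bits(p)

# Hypothetical three-topic mixtures
past = [0.8, 0.15, 0.05]    # earlier speech, mostly topic 0
later = [0.1, 0.1, 0.8]     # later speech, mostly the previously rare topic 2

print(round(kld_bits(later, past), 2))   # 2.84
print(round(kld_bits(past, later), 2))   # 2.29
```

The two directions give different values, so KLD measures how unexpected one distribution is given another rather than a symmetric distance. Whether extra bits amount to "novelty" in a historical sense remains an interpretive question.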

harryx113 commented 4 years ago

The choice of K = 100 topics seems rather arbitrary. Is there any rationale behind it?

tianyueniu commented 4 years ago

Similar to @nwrim's question: as I understand it, topic modeling is based solely on statistics (word co-occurrence) and doesn't actually evaluate the meaning of each word. In that case, is there a way to evaluate the accuracy of the paper's KL-divergence-based analysis of novelty?

jsgenan commented 4 years ago

My question is similar to the ones above: "Novelty at the smallest time scales (a speech compared with the one just previous) is measured by the KLD." Why not measure novelty based on the occurrence of certain word combinations? Also, the paper gives equal weight to every speech in a window of length w. Would it be better to give more weight to the most recent speeches?
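To make the weighting question concrete, here is a hypothetical variant, not something the paper does: with `decay=None` it reproduces the uniform 1/w average over the window, and with a decay factor in (0, 1] the speech at lag d gets weight proportional to `decay ** (d - 1)`, so the most recent speeches count most. The function name `weighted_novelty`, the `decay` parameter, and the toy data are all my own assumptions.

```python
import math

def kld(p, q):
    """KLD(p | q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def weighted_novelty(s, j, w, decay=None):
    """Hypothetical novelty of speech j against the w preceding speeches.

    decay=None: uniform weights (the 1/w average).
    decay in (0, 1]: lag d gets weight decay**(d - 1), normalized to sum to 1.
    """
    if j - w < 0:
        raise ValueError("window extends before the first speech")
    if decay is not None and not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    lags = range(1, w + 1)
    weights = [1.0 if decay is None else decay ** (d - 1) for d in lags]
    total = sum(weights)
    return sum(wt * kld(s[j], s[j - d]) for wt, d in zip(weights, lags)) / total

# Speech 3 matches speeches 0 and 1 but differs from speech 2, just before it.
s = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]
print(weighted_novelty(s, 3, 3), weighted_novelty(s, 3, 3, decay=0.5))
```

Because the only earlier speech that differs from speech 3 is the immediately preceding one, the decayed version scores it as more novel (about 0.42 bits versus about 0.25 with uniform weights). Which weighting is more faithful to how listeners experience a debate is an empirical question.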

lyl010 commented 4 years ago

I agree with @nwrim that the authors did not clearly explain the use of KLD, especially what s(j) means. I also think the measurement of "novelty" rests on the assumption that transcending one's history is novel. My question is: will it be easier or harder for later work to be deemed very novel? The measurements of transience and novelty are quite interesting, and I wonder whether they can capture the dynamics of a speech's characteristics. But is this measurement really stable, and how far can we go with it?
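For reference, here is my reading of the definitions. s(j) is the topic distribution that LDA assigns to speech j: s_i(j) is the share of speech j attributed to topic i, for i = 1, ..., K (K = 100 in the paper). The base-2 log is an assumption; any fixed base only rescales all three measures.

```latex
\mathrm{KLD}\big(s^{(j)} \,|\, s^{(k)}\big) = \sum_{i=1}^{K} s^{(j)}_i \log_2 \frac{s^{(j)}_i}{s^{(k)}_i}

N_w(j) = \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\big(s^{(j)} \,|\, s^{(j-d)}\big)

T_w(j) = \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\big(s^{(j)} \,|\, s^{(j+d)}\big)

R_w(j) = N_w(j) - T_w(j)
```

Novelty looks back over the previous w speeches, transience looks forward over the next w, and resonance is their difference.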

ihsiehchi commented 4 years ago

This paper was certainly interesting. My question is more applied: with the rise of populism, do people mimic past politicians less and the general population more when speaking in Congress or at other political events? And if so, how does the extent vary with the degree of publicity of the event at which they speak?