UChicago-Computational-Content-Analysis / Readings-Responses-2024-Winter


5. Machine Learning to Classify and Relate Meanings - [E3] Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo. #31

lkcao opened 6 months ago

lkcao commented 6 months ago

Post questions here for this week's exemplary readings:

  1. Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo. 2014. “The Civilizing Process in London’s Old Bailey.” PNAS 111(26): 9419–9424.

michplunkett commented 5 months ago

This paper looks at the separation of trials into clusters of violent and nonviolent accusations. Given that these records go back to 1760, how do the researchers ensure that they are getting accurate reads of the court records? Anecdotally, it is difficult to maintain the quality of written records that are even ten years old. I am also curious how these records were digitally transcribed: was it through some use of OCR, or were they manually typed out? If it was OCR, how confident can the researchers be that they are getting exactly what the original transcriber wrote?

ana-yurt commented 5 months ago

This paper reveals the gradual pace of change in the creation of new bureaucratic practices and distinctions associated with the control of violence. I wonder whether there are specific features of the legal-bureaucratic corpus that may contribute to the resulting trend in the analysis. For example, does it contain linguistic inertia that slows down change compared to cultural evolution in other areas?

erikaz1 commented 5 months ago

Klingenstein et al. (2014) analyze the semantic content of jury trials in order to demonstrate increasingly large differences between the treatment of violent vs. nonviolent cases. Their methodology for text analysis includes "coarse graining": grouping related words into broader topic categories rather than working with many thousands of individually important tokens. The paper repeats this phrase many times; it is clear that coarse graining is an essential component of the paper, and it ultimately helped produce significant results. My question is: how did they know that coarse graining would suffice to pick out the patterns between trial types? What would be different had they taken a "finer-grained" approach (other than it being more computationally laborious; the paper was written in 2014)?
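
For intuition, here is a minimal sketch (not the authors' actual pipeline) of what coarse graining followed by a distributional comparison might look like. The paper maps words into the broad semantic categories of Roget's Thesaurus and compares trial types with an information-theoretic divergence; the tiny token-to-category lexicon below is invented purely for illustration.

```python
from collections import Counter
import numpy as np

# Hypothetical miniature "thesaurus" for illustration only: maps individual
# tokens to broad semantic categories, in the spirit of the Roget's-Thesaurus
# coarse graining the paper describes (a real lexicon has ~1000 categories).
CATEGORY_OF = {
    "stab": "VIOLENCE", "strike": "VIOLENCE", "wound": "VIOLENCE",
    "steal": "THEFT", "purse": "THEFT", "shilling": "THEFT",
}

CATEGORIES = ["VIOLENCE", "THEFT"]

def coarse_grain(tokens):
    """Replace each token with its broad category; drop words not in the lexicon."""
    return [CATEGORY_OF[t] for t in tokens if t in CATEGORY_OF]

def category_distribution(tokens):
    """Normalized frequency of each category in a (toy) trial transcript."""
    counts = Counter(coarse_grain(tokens))
    total = sum(counts.values()) or 1
    return np.array([counts[c] / total for c in CATEGORIES])

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (in bits) between two category distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) terms contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

violent = category_distribution("he did stab and wound him".split())
nonviolent = category_distribution("he did steal a purse of one shilling".split())
print(jensen_shannon(violent, nonviolent))  # 1.0 bit: fully disjoint distributions
```

One plausible answer to the "finer grained" question beyond compute: per-token distributions over thousands of word types would be far sparser for short trial transcripts, so divergence estimates between trial types would be noisier, whereas pooling words into a few hundred categories trades lexical detail for more stable counts.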