Computational-Content-Analysis-2018 / 12-Jan-Quantitative-Analysis-of-Culture-Using-Millions-of-Digitized-Books

Michel, Jean-Baptiste et al. 2010. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science express, December 16.
https://github.com/Computational-Content-Analysis-2018

Distribution of the types of books in each year? #16

Open bdauzat opened 6 years ago

bdauzat commented 6 years ago

I am curious about the types of books Google scans for each year. If the mix of text types differs from year to year, then comparing across years to draw inferences about cultural change can be misleading. For example, suppose the older books that get scanned into Google Books tend to be the kinds of books that, for whatever reason, mention celebrities less (perhaps because they are more likely to be great works of fiction). Then it is not clear that we can compare celebrity mentions in those books with mentions today and draw the inferences the authors want: the observed difference could reflect a change in culture, but it could also arise because great works of fiction make up a smaller share of the recently scanned books. In other words, the authors want to infer that culture is changing, when it may just be the type of text that is changing across years. In fact, I think it is likely that the books we have preserved from 200 years ago are very different in type from the books available to scan from recent years, and not controlling for the 'type' of book seems to be a problem for the authors' analysis.
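
To make the concern concrete, here is a minimal Python sketch of the composition effect. Every number in it is invented for illustration and none comes from the paper or the Google Books corpus; the two book types, the per-type mention rates, and the corpus shares are all assumptions. Within each type the rate of celebrity mentions is held fixed over time, so any change in the corpus-wide rate comes only from the changing mix of book types.

```python
# Hypothetical illustration: all numbers are invented, not taken from
# Michel et al. (2010) or from the Google Books corpus.

# Assumed celebrity mentions per million words, held FIXED over time within
# each book type, so there is no within-type change at all.
RATE_PER_MILLION = {"literary_fiction": 5.0, "other": 20.0}

# Assumed share of the scanned corpus made up by each book type, per year.
CORPUS_MIX = {
    1810: {"literary_fiction": 0.60, "other": 0.40},
    2000: {"literary_fiction": 0.10, "other": 0.90},
}


def aggregate_rate(mix, rates):
    """Corpus-wide rate: the share-weighted average of the per-type rates."""
    return sum(share * rates[book_type] for book_type, share in mix.items())


# Naive comparison: each year is weighted by its own mix of book types.
naive = {year: aggregate_rate(mix, RATE_PER_MILLION)
         for year, mix in CORPUS_MIX.items()}

# Controlled comparison: every year is reweighted to one reference mix
# (here the 1810 mix), which is one simple way to control for book type.
reference_mix = CORPUS_MIX[1810]
controlled = {year: aggregate_rate(reference_mix, RATE_PER_MILLION)
              for year in CORPUS_MIX}

for year in sorted(CORPUS_MIX):
    print(f"{year}: naive={naive[year]:.1f}  controlled={controlled[year]:.1f}")
```

With these made-up inputs the naive rate rises from 11.0 to 18.5 mentions per million words even though nothing changes within either book type, while the reweighted rate stays flat. Doing this on the real corpus would require genre labels for the scanned books, which is exactly the information I am asking about.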