Computational-Content-Analysis-2018 / 12-Jan-Quantitative-Analysis-of-Culture-Using-Millions-of-Digitized-Books

Michel, Jean-Baptiste et al. 2010. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science Express, December 16.
https://github.com/Computational-Content-Analysis-2018

question about how n-gram frequency is defined #19


mbokanga commented 6 years ago

N-gram frequency for a given n-gram in a given year was computed as the number of instances of that n-gram in that year divided by the total number of words in the corpus for that year. This seems to me like mixing two different units: n-grams in the numerator and individual words in the denominator. Why wasn't n-gram frequency defined as the number of instances of an n-gram divided by the number of n-grams that year (which isn't necessarily equal to the number of words)? This may be an amateurish technical question, but it confused me.
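To make the question concrete, here is a minimal Python sketch comparing the two normalizations. It assumes n-grams are contiguous windows of tokens counted over a single tokenized text; the function names are hypothetical and this is not the paper's or Google's actual pipeline. Under that assumption, a text of W tokens contains W - n + 1 n-grams of length n, so the two denominators differ by n - 1 per text.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) in a token list (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_per_word(target, tokens):
    """Count of `target` divided by the total number of words (tokens),
    mirroring the definition described in the question."""
    n = len(target)
    return Counter(ngrams(tokens, n))[target] / len(tokens)

def frequency_per_ngram(target, tokens):
    """Alternative proposed in the question: divide by the number of
    n-grams of the same length as `target`."""
    n = len(target)
    grams = ngrams(tokens, n)
    return Counter(grams)[target] / len(grams)

tokens = "the cat sat on the mat the cat slept".split()  # 9 tokens, 8 bigrams
print(frequency_per_word(("the", "cat"), tokens))   # 2/9, about 0.222
print(frequency_per_ngram(("the", "cat"), tokens))  # 2/8 = 0.25
```

On this toy text the two definitions visibly differ, and for 1-grams they coincide. For a year with millions of tokens spread over long texts, though, losing n - 1 windows per text barely changes the denominator, so the two frequencies come out nearly equal. A per-word denominator also puts 1-grams through 5-grams on one shared scale; that is a plausible motivation, but it is an interpretation here, not something the paper is claimed to state.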