Are these digitized books representative? If not, does this matter?

This reading introduces possible ways to analyze culture using digitized books. However, do those 4% of all books ever printed constitute the full population of books printed between 1800 and 2000? If not, are they representative of that population, or are they biased toward particular categories? If they do, were they all actually published between 1800 and 2000, or do they include classics that were originally produced earlier? How can we separate real effects from confounding factors with this kind of data? And does representativeness matter for analyses based on it?

Computational-Content-Analysis-2018 / 12-Jan-Quantitative-Analysis-of-Culture-Using-Millions-of-Digitized-Books
