Closed LucasWashor closed 2 months ago
Datasets
Historical Children's Stories (18th to the 20th c.): https://content.lib.washington.edu/childrensweb/index.html
Contemporary Children's Stories: https://www.kaggle.com/datasets/edenbd/children-stories-text-corpus
Datasets
Historical Children's Stories (18th to the 20th c.): https://www.kaggle.com/datasets/mateibejan/15000-gutenberg-books/data
Contemporary Children's Stories: https://www.kaggle.com/datasets/rohinikrishnamoorthy/goodreads-childrens-book-dataset
1. How do these datasets differ in how they represent cultural objects or practices?
Historical Dataset: Focuses on raw text with basic metadata (author, title, etc.), maintaining the literary content but losing any physical, visual, or cultural representation associated with the original works.
Contemporary Dataset: Emphasizes user interaction data (ratings, reviews), reflecting modern cultural consumption practices rather than focusing on the content of the books themselves.
Comparison: The Gutenberg dataset preserves the text but not the cultural context, while the Goodreads dataset focuses on how contemporary readers engage with literature.
2. What kind of metadata or context accompanies the data?
Historical Dataset: Contains basic metadata (author, title, publication year) but lacks detailed context about the book’s historical, cultural, or physical attributes. What context there is serves analysis of the text itself.
Contemporary Dataset: Provides metadata like ratings, reviews, publication year, and author but lacks any historical or physical context. The focus is on reader reception rather than the cultural or historical significance of the works.
Comparison: Both datasets provide minimal context on cultural or historical origins. The Gutenberg dataset is oriented toward text for analysis, and the Goodreads dataset toward public reception.
3. Does the dataset reflect any power structures or biases?
Historical Dataset: Reflects the selection bias of the Gutenberg archive, which tends to favor Western and canonical literature, potentially marginalizing non-Western works and alternative narratives.
Contemporary Dataset: Reflects biases inherent in user-generated content, such as popularity biases, potentially marginalizing lesser-known or culturally specific works.
Comparison: Both datasets reflect biases in their content curation: Gutenberg's bias comes from historical preservation practices, while Goodreads reflects modern consumer trends.
4. Are there any notable gaps in the data?
Historical Dataset: Lacks physical attributes (cover designs, illustrations) and underrepresents non-Western narratives.
Contemporary Dataset: Lacks actual book content (focuses only on reviews/ratings) and cultural context behind the books.
Comparison: The Gutenberg dataset omits physical and cultural context, while the Goodreads dataset omits book content and deeper historical or cultural significance. Each omission is driven by technical limitations or a focus on usability.
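The metadata and gap observations above could be checked against the files themselves. What follows is only a minimal sketch: the two CSV samples and their column names (title, author, text, rating, review_count) are hypothetical stand-ins, not the actual schemas of the Kaggle downloads, and `column_report` is an illustrative helper rather than part of any library. It reports which columns each dataset has and what share of rows actually fill them in.

```python
import csv
import io


def column_report(csv_text):
    """Return {column: share of rows with a non-empty value} for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if (r[col] or "").strip()) / len(rows)
        for col in rows[0]
    }


# Hypothetical samples; replace with the contents of the real downloaded files.
historical_sample = (
    "title,author,text\n"
    "A Tale,Anon,Once upon a time\n"
    "Another Tale,,There was a child\n"
)
contemporary_sample = (
    "title,author,rating,review_count\n"
    "Book A,Jane Doe,4.2,120\n"
    "Book B,John Roe,3.9,\n"
)

historical = column_report(historical_sample)
contemporary = column_report(contemporary_sample)

# Columns both datasets share, and columns unique to each one.
shared = sorted(set(historical) & set(contemporary))
only_historical = sorted(set(historical) - set(contemporary))
only_contemporary = sorted(set(contemporary) - set(historical))

print("shared:", shared)
print("only historical:", only_historical)
print("only contemporary:", only_contemporary)
print("historical fill rates:", historical)
print("contemporary fill rates:", contemporary)
```

On the real data, columns that appear in only one dataset would point to the representational differences discussed in questions 1 and 2, and low fill rates would point to the gaps discussed in question 4.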