CultureAsData-UIUC / is310-fall-2024-group-6

Creative Commons Attribution 4.0 International

Computing Cultural Data: Method-Topic Modeling, Type-Literary Texts #19

Closed GloriaLiu930 closed 1 week ago

GloriaLiu930 commented 1 week ago

Define Topic Modeling: Topic modeling is a method that identifies clusters of co-occurring words in large text collections to reveal hidden themes or topics.
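To make the definition concrete, here is a minimal sketch of one widely used topic-modeling algorithm, Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling. The post does not name a specific algorithm, so LDA is an assumption here, and the function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the toy corpus are all hypothetical illustrations rather than anything from the project.

```python
import random


def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized docs (lists of words).

    Returns (topic_word, doc_topic): per-topic word counts and per-document
    topic counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    topic_word = [{w: 0 for w in vocab} for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1

                # Weight each topic by how much the document uses it and
                # how much the topic uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """List the n most frequent words assigned to each topic."""
    return [
        sorted((w for w in counts if counts[w] > 0), key=lambda w: -counts[w])[:n]
        for counts in topic_word
    ]


# Invented toy corpus, only for illustration.
toy_docs = [
    "rabbit garden carrot rabbit".split(),
    "garden carrot rabbit burrow".split(),
    "ship sea sailor storm".split(),
    "sea storm ship harbor".split(),
]
topic_word, doc_topic = lda_gibbs(toy_docs, n_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

The sampler is stochastic, so the word lists depend on the seed and on the corpus. For real texts, an established library implementation would be a more practical choice than this hand-rolled sketch.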

Why this is valuable for cultural data: Our project topic is Literature and Texts – Children's Literature, and topic modeling is commonly used in literary studies to analyze themes, trends, and even authorial influence.

Scholars: Omar, A. (2020). Towards computational models to theme analysis in literature. International Journal of Advanced Computer Science and Applications, 11(9), 93-99. https://doi-org.proxy2.library.illinois.edu/10.14569/IJACSA.2020.0110911

Review & Summarize Selected Scholarship:

  1. Bibliographic Information
     Author: Abdulfattah Omar
     Title: Towards Computational Models to Theme Analysis in Literature
     Publication Venue: International Journal of Advanced Computer Science and Applications
     Publication Date: 2020

  2. Computational Method or Cultural Data Type
     Method: The study uses Vector Space Clustering (VSC), a lexical clustering approach to the thematic classification of literary texts. Each text is represented in a Vector Space Model (VSM) as a vector of its lexical content, and texts with similar vectors are grouped into clusters.
     Transformation: The VSM and the clustering algorithms transform the literary texts into clusters that share similar themes, making it possible to categorize and analyze recurring themes objectively.

  3. Summary of Argument and Use of Computational Method
     Argument: Omar argues that computational theme analysis offers an objective and replicable way to categorize themes in literature, addressing limitations of traditional literary criticism such as subjectivity and limited scalability.
     Method Application: By clustering lexical data, the study groups Thomas Hardy’s novels and short stories according to thematic content. This approach enables structured thematic analysis of large volumes of text and reveals patterns that traditional methods may overlook.

  4. Code and Data Availability
     Data: The study’s dataset consists of Thomas Hardy’s novels and short stories, represented as lexical vectors.
     Code: The study provides no code, but its methods rest on established vector clustering techniques, so it should be replicable with standard clustering tools.

  5. Interest, Usefulness, and Disciplinary Approach
     Interest: The study’s approach to theme analysis is intriguing because it integrates computational and literary methods, attending to both cultural content and computational rigor.
     Disciplinary Approach: The study aligns well with Digital Humanities, using data science techniques to advance traditional literary criticism.

  6. Potential Use for Group Project
     This study provides a clear framework for using lexical clustering in thematic analysis, an approach closely related to topic modeling of literary texts. Its structured approach to theme identification will be valuable for analyzing recurring themes across the corpus we select for our project.
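
The vector-space clustering idea summarized in item 2 can be sketched roughly as follows. This is not Omar's implementation, since the study provides no code: it assumes raw term-frequency vectors, cosine similarity, and average-linkage agglomerative clustering as stand-ins, and the stopword list, function names, and sample sentences are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the study).
STOPWORDS = {"the", "a", "in", "of", "and"}


def tf_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_texts(texts, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    vectors = [tf_vector(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average-linkage: mean similarity over all cross pairs.
                sims = [
                    cosine(vectors[i], vectors[j])
                    for i in clusters[x]
                    for j in clusters[y]
                ]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Invented sample sentences, only for illustration.
sample_texts = [
    "the rabbit hopped through the garden",
    "a rabbit nibbled carrots in the garden",
    "the ship sailed across the stormy sea",
    "sailors steered the ship over the sea",
]
clusters = cluster_texts(sample_texts, 2)
print(clusters)  # → [[0, 1], [2, 3]]
```

On full-length novels, weighting schemes such as TF-IDF and a larger stopword list would likely matter much more than they do in this toy example.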