STAT325-S24 / Frankenstein

LDA Analysis #8

Closed jpapagelis24 closed 7 months ago

jpapagelis24 commented 7 months ago

For my LDA analysis, I will explore two approaches to topic modeling in Frankenstein.

  1. I will split the novel into sections by narrator (Walton, Frankenstein, or the Creature) and then perform LDA on each section to find its most prominent topics. I can then compare the topics across sections to see how the narratives are similar and different. If a section contains too much text for clear topics to emerge, I can instead perform LDA on each narrator's individual chapters rather than on the section as a whole.
  2. Additionally, I will use LDA on the novel as a whole to see if the analysis will be able to separate the novel into sections that differ by the narrator. This will take some playing around with the parameters to see what works and doesn't work.
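
The two document groupings described in this plan (one document per narrator, or one per chapter within a narrator) can be sketched as below. This is only a minimal illustration in Python using the standard library; it is not the code from this repository, and the toy `chapters` list, the `STOPWORDS` set, and the helper names `tokenize` and `counts_by` are all hypothetical.

```python
from collections import Counter
import re

# Hypothetical toy input: (chapter label, narrator, text). In a real analysis
# the narrator labels would come from a hand-built chapter-to-narrator mapping.
chapters = [
    ("Letter 1", "Walton", "I am already far north of London and the ice surrounds us."),
    ("Chapter 1", "Frankenstein", "My family is one of the most distinguished of Geneva."),
    ("Chapter 11", "Creature", "I felt cold and hungry in the forest near the cottage."),
    ("Chapter 12", "Creature", "The cottage family taught me words by the fire."),
]

# A tiny illustrative stopword list (an assumption, not a standard lexicon).
STOPWORDS = {"i", "am", "of", "the", "and", "is", "one", "in", "by",
             "me", "my", "us", "most", "near"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def counts_by(chapters, key):
    """Aggregate word counts per document; `key` decides the document id."""
    docs = {}
    for label, narrator, text in chapters:
        doc_id = key(label, narrator)
        docs.setdefault(doc_id, Counter()).update(tokenize(text))
    return docs

# Option 1: one document per narrator (coarse sections).
by_narrator = counts_by(chapters, key=lambda label, narrator: narrator)

# Fallback: one document per chapter, still tagged with its narrator.
by_chapter = counts_by(chapters, key=lambda label, narrator: f"{narrator}: {label}")

for doc_id, counts in by_narrator.items():
    print(doc_id, counts.most_common(3))
```

Either dictionary of word counts could then be converted to a document-term matrix and passed to whichever LDA implementation is used.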

nicholasjhorton commented 7 months ago

In class on Thursday I'll be asking everyone to briefly describe first steps and next steps for their analyses. Can you please:

  1. share any updates (commits) and
  2. describe what you plan to do over the coming week?

Thanks in advance, Nick

jpapagelis24 commented 7 months ago

  1. I completed the preliminary steps of the LDA analysis: I prepped/wrangled the data for LDA and then created a basic LDA model. I also worked on the LDA analysis by chapter. See 856667a853d49cb12f6c9b4e345c59c363c58444
  2. I will work further to refine the LDA analysis by narrator and see what else works. Specifically, I would like to check how the number of topics I choose affects the way the novel is split.

nicholasjhorton commented 7 months ago

Looks great: what value of k are you thinking about? Perhaps 12? 20? An arbitrary medium-sized prime?
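
One way to explore the effect of k is to fit the model for several candidate values and compare the top words of each topic. The sketch below is a tiny collapsed Gibbs sampler for LDA in plain Python, shown only to make the role of k concrete. It is not the code from this repository, the toy `docs` corpus is made up, and the names `fit_lda` and `top_words` and the default `alpha` and `beta` values are assumptions. The toy corpus is far too small for values such as 12 or 20, so small values of k are used here.

```python
import random

def fit_lda(docs, k, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after `iters` sweeps.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * k for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]
    topic_total = [0] * k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            t = rng.randrange(k)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = assignments[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

# Made-up toy corpus standing in for the novel's sections.
docs = [
    "ice ship voyage north ice sailors".split(),
    "science laboratory study science professor".split(),
    "cottage forest fire cottage hunger".split(),
    "ship ice letter sister voyage".split(),
]

# Fit one model per candidate k and inspect the topics side by side.
results = {k: fit_lda(docs, k) for k in (2, 3, 5)}
for k, (doc_topic, topic_word) in results.items():
    print(k, top_words(topic_word))
```

On a real corpus, the same loop could be run over the larger values suggested above, with the resulting topics judged by how interpretable they are and how well they line up with the narrators.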

nicholasjhorton commented 7 months ago

@jpapagelis24 can you please share updates on where things stand, and let me know if there's anything that you'd like me to look at or comment on over the weekend? As always, please let me know if you run into any issues or need assistance. Thanks in advance, Nick

nicholasjhorton commented 7 months ago

@jpapagelis24 do you have any updates on this front? Can you please share some preliminary analyses so that I can review them before class tomorrow?

Note that I moved the LDA.qmd file to vignette and tweaked DESCRIPTION so that the package build didn't complain (see https://github.com/STAT325-S24/Frankenstein/commit/1f7155ed924e93e3b6f7f04177c4679d2cd54424).

nicholasjhorton commented 7 months ago

As always, please let me know if you have questions or run into any issues.

jpapagelis24 commented 7 months ago

@nicholasjhorton Completed most of the analysis for the LDA part. I still have to finish writing up the actual results of what I found, but for the most part I have the skeleton of the code done. Still playing around with which k works best for the novel.

See commit: 01f75d7fa676c4b38fb17b23f408d03e71a4ade3 or pdf: https://github.com/STAT325-S24/Frankenstein/blob/main/vignette/LDA.pdf

jpapagelis24 commented 7 months ago

@nicholasjhorton Here is the commit for the completed LDA analysis: 36d5706ea8da31b3dceab1e429be5d0a2e3d3878 and the pdf: https://github.com/STAT325-S24/Frankenstein/blob/main/vignette/LDA.pdf

nicholasjhorton commented 7 months ago

This is shaping up really well: nicely done!

I'll share some comments in class.