feedback on report dated 2024-03-04 (but I suspect that this is what you wanted me to review)

nicholasjhorton commented 3 months ago

@jpapagelis24 are you all set with your proposed analyses? I see that the only open issue related to your book project is #9.

It would be helpful today to start to flesh out what tasks you are planning to complete over the next week as you finish up your work. As always, please don't hesitate to reach out with questions.

jpapagelis24 commented 3 months ago

@nicholasjhorton The final steps for LDA analysis are just reviewing the comments and making some final adjustments, which I hope to have done over the weekend.

nicholasjhorton commented 3 months ago

That sounds great.

In the interim, I look forward to having a chance to see some of this in class today.

nicholasjhorton commented 3 months ago

I had a chance to review your draft report. I really enjoyed it and think that it is shaping up well.

I suspect that your report would be useful for two audiences: those seeking to learn more about Frankenstein and those looking for an annotated example of text analytics.

For the first audience, you'll want to include more of the text, some sample paragraphs, and examples to complement Figures 1 and 2.

For the latter, you'll want to describe how things look at each stage of your wrangling, e.g., the tokenized text (for a sample paragraph), some betas and gammas for sample paragraphs, etc. Note that this will also be valuable for the first type of reader.

Is there a better way to order the 12 topics? Can you think of a chronological progression?

Here are some more specific comments:

date is wrong (2024-03-04)
please add a BibTeX ref for Frankenstein and the Blei et al. paper. Are there other citations that you want to incorporate? I suspect that the Tidytext book would be one and perhaps some of the packages used.
please add a table of contents
The introduction of the report feels very terse. Can you add a bit more background on why this is an important book to study?
Are there ways that you could incorporate some flavor or short excerpts in the paper for what I suspect will be section 1 (Introduction)?
I like your justification for k = 12
pesky comment: pipe directly into mutate() when you create narrators
reformat code so text doesn't flow into the right-hand margin
avoid the magic number (12) so that if it were to change, you wouldn't have to change it in multiple places.
6854 is another magic number: please also comment on where this came from
the use of as.numeric() always worries me, given how brittle it is. Would readr::parse_number() be appropriate to replace here?
I really like the labels for the 12 topics. But please put them in quotes so that it's clear that these are your names for them. (It may require some care to have quotes within quotes along with the apostrophes in topics 1, 2, 3, etc.: please let me know if you get stuck.)
I still wonder if a graphical display of the topics over the course of the book would be helpful. (I'm finding Figure 2 hard to parse, due to the large number of topics.)
Please mention the Shiny app somewhere in the report.
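
A minimal sketch of three of the comments above (a named constant in place of 12, readr::parse_number() in place of as.numeric(), and quoted topic labels that contain apostrophes). The variable names, the sample strings, and the label text are all hypothetical and are not taken from the report:

```r
library(readr)

# Hypothetical name: define the number of topics once and reuse it everywhere.
n_topics <- 12
topic_ids <- seq_len(n_topics)

# Hypothetical input: strings that mix text and digits.
chapter_labels <- c("Chapter 1", "Chapter 24")

# as.numeric() only handles clean numeric strings; these become NA (with a warning).
suppressWarnings(as.numeric(chapter_labels))   # NA NA

# parse_number() extracts the first number found in each string.
parse_number(chapter_labels)                   # 1 24

# Hypothetical labels: a double-quoted R string can hold an apostrophe as-is.
topic_labels <- c("Victor's studies", "The creature's tale")

# Wrap each label in literal double quotes by escaping them with a backslash.
quoted_labels <- paste0("\"", topic_labels, "\"")
writeLines(quoted_labels)
```

Whether parse_number() is the right replacement depends on what the strings in the report actually look like, so this is only an illustration of the difference in behavior.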

jpapagelis24 commented 3 months ago

@nicholasjhorton I worked on this a bit more and covered most of the comments you left. Does my Figure 2 visual look better? I tried to make it easier to see where each topic is located in the novel.

Still to complete: more examples for beta and gamma, mention the Shiny app, order the topics chronologically

Commit: 1d8bb1a53a89ba0d1a7b753194edc1bfdb89e933 pdf: https://github.com/STAT325-S24/Frankenstein/blob/main/vignette/LDA.pdf

feedback on report dated 2024-03-04 (but I suspect that this is what you wanted me to review) #11