STAT325-S24 / MobyDick

identify prior text analytics work for your book by the end of the day on Friday (and close this issue) #2

Closed nicholasjhorton closed 5 months ago

nicholasjhorton commented 5 months ago

I look forward to next steps with your chosen book.

Some of your selections have been used as examples and analyses in the past. Others may be less explored.

The goal for this first assignment is to identify some prior work that you might leverage as a way to identify some cool avenues to explore. Are there some posts or links that would be helpful for you and me to review? Is there a data package in R already set up?

For less common books, you might want to note that you couldn't find anything, and instead try to identify some articles in the literature that have analyzed your book.

The goal is that by the end of the day on Friday you will have done a first pass through this process and commented on this issue with those annotated links.

Questions? Please feel free to open a new issue on GitHub or DM me on Slack.

nicholasjhorton commented 5 months ago

Please let me know if you have any questions, run into any issues, or would like an extension. Warmly, Nick

arogers24 commented 5 months ago

I've found two pretty cool explorations with Moby Dick:

The first is more a study of Herman Melville than of Moby Dick specifically, but it is interesting nonetheless. The work is inspired by Melville's Marginalia, a published collection of Herman Melville's marginal notes on famous Shakespeare plays. Melville was inspired by Shakespeare's writing and spent much time reading Henry VIII while writing Moby Dick. Using word clouds and bigram analysis, the author of this analysis finds similar word usage and phrasing between the play and the novel.
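
For anyone curious, the bigram comparison mentioned above could be sketched roughly like this. It is a minimal Python sketch and not the linked author's code: the function names `bigram_counts` and `shared_bigrams` are made up for illustration, and the two short strings are toy stand-ins rather than real quotations from the novel or the play.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in a text."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Pair each word with the word that follows it.
    return Counter(zip(words, words[1:]))


def shared_bigrams(text_a, text_b):
    """Return bigrams present in both texts, keeping the smaller count."""
    # Counter intersection (&) keeps the minimum count for each shared key.
    return bigram_counts(text_a) & bigram_counts(text_b)


# Toy strings standing in for the novel and the play (hypothetical data).
novel_sample = "the white whale swam on and the white whale dived"
play_sample = "a white whale is no king but the white rose is"

common = shared_bigrams(novel_sample, play_sample)
print(common.most_common())
```

With the full texts loaded in place of the toy strings, the most frequent shared bigrams would be one starting point for comparing phrasing, though stop words would probably need to be filtered out first.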

The second work is a fun exploratory project into LLMs. The author aimed to fine-tune an LLM on Moby Dick to generate a novel of his own. The GitHub page does not include much detail, but I believe they start with a "news-focused" LLM (I'm guessing one that writes newspaper articles) and fine-tune it with the text of Moby Dick to generate the novel. As you can see, even after 100 epochs, there are still hints of news in the work.

nicholasjhorton commented 5 months ago

These are interesting. We should check in about what might be an interesting avenue to explore here.

See https://github.com/STAT325-S24/MobyDick/commit/9aa3477b89d20e2ba6aaa35baac40605abcfa3de and https://github.com/STAT325-S24/MobyDick/tree/main/resources for a downloaded version of those resources.

See also #3 which might be helpful.

I also found https://ds4world.cs.miami.edu/text-analysis which may be worth reviewing.