sfatihi24 commented 10 months ago

Text Analytics Across Harry Potter Books and Movies

We decided not to make our final project an extension of our Shiny App. Instead, we want to center it on text analytics of the Harry Potter series. We found two data sets: one containing every line of each book, and the other containing all the dialogue in each movie. Using this data, we want to analyze the language used by the book and screenplay authors and explore how it correlates with the feelings conveyed in their work.

Data:

https://www.kaggle.com/code/cihancetin/transcripts-of-the-harry-potter-movies
https://raw.githubusercontent.com/gastonstat/harry-potter-data/main/csv-data-file/harry_potter_books.csv

Representations:

1. Sonification of Interactions Between Harry and Voldemort

We can create a piece of music in which a pitch sounds at each point that corresponds to a mention of Harry’s or Voldemort’s name in the text. We can choose opposite pitch collections (either opposite on the circle of fifths, or diminished chord collections that are a half step apart) to represent Harry and Voldemort, as they embody the opposing forces of good and evil. The piece will then coalesce where the two are referenced together and will have moments where only Harry or only Voldemort is heard.
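A minimal sketch of the mapping from name mentions to notes, written in Python purely for illustration. The pitch collections (a C major triad for Harry, an F-sharp major triad a tritone away for Voldemort), the function name `mentions_to_notes`, and the `piece_seconds` parameter are all assumptions, not decisions we have made; the output is a list of note events that could later be fed to whatever tool actually plays the music.

```python
# Hypothetical pitch collections as MIDI note numbers (an assumption, not a final choice).
HARRY_PITCHES = [60, 64, 67]      # C major triad: C4, E4, G4
VOLDEMORT_PITCHES = [66, 70, 73]  # F-sharp major triad, opposite C on the circle of fifths

NAMES = {"harry": HARRY_PITCHES, "voldemort": VOLDEMORT_PITCHES}


def mentions_to_notes(text, piece_seconds=60.0):
    """Return (onset_seconds, name, midi_pitch) for every name mention.

    The onset is the word's relative position in the text scaled to the
    length of the piece, so clusters of mentions become clusters of notes.
    """
    # Crude tokenization: split on whitespace and strip surrounding punctuation.
    # Possessives such as "Harry's" are not handled in this sketch.
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    counters = {name: 0 for name in NAMES}
    notes = []
    for i, word in enumerate(words):
        if word in NAMES:
            pitches = NAMES[word]
            # Cycle through the character's pitch collection on each mention.
            pitch = pitches[counters[word] % len(pitches)]
            counters[word] += 1
            onset = round(piece_seconds * i / len(words), 3)
            notes.append((onset, word, pitch))
    return notes


# Invented sample sentence, not a quote from the books.
sample = "Harry stared at Voldemort. Voldemort raised his wand and Harry ran."
for onset, name, pitch in mentions_to_notes(sample, piece_seconds=11.0):
    print(onset, name, pitch)
```

Because the two triads are always written to the same timeline, passages that mention both characters naturally overlap, while passages about only one of them stay in a single key.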

2. Visualization of Sentimental Progression of Chapters For Each Book

This visualization will consist of seven rows (one for each book) of colored vertical lines, with each vertical line corresponding to a collection of sentences in that book. It will accompany the piece of music, so that as the music plays, the user can follow along and see the sentiment of the book changing. If it works as we intend, the effect will show how JK Rowling uses words that traditionally score lower in sentiment when describing Voldemort, and words that score higher when describing Harry.
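A rough sketch of how each collection of sentences could be scored and turned into a color, again in Python for illustration only. The tiny lexicon below has made-up scores and stands in for a real sentiment lexicon; the chunk size, the color names, and the function names are likewise assumptions.

```python
import re

# Toy lexicon with made-up scores, a placeholder for a real sentiment lexicon.
TOY_LEXICON = {"brave": 2, "friend": 2, "happy": 3, "dark": -2, "kill": -3, "fear": -2}


def chunk_sentiment(text, sentences_per_chunk=2):
    """Split text into sentences, group them into chunks, and sum word scores per chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = []
    for start in range(0, len(sentences), sentences_per_chunk):
        chunk = " ".join(sentences[start:start + sentences_per_chunk])
        words = re.findall(r"[a-z']+", chunk.lower())
        # Words missing from the lexicon contribute 0.
        scores.append(sum(TOY_LEXICON.get(w, 0) for w in words))
    return scores


def score_to_color(score):
    """Map a chunk score to the color of its vertical line (hypothetical palette)."""
    if score > 0:
        return "gold"
    if score < 0:
        return "darkgreen"
    return "grey"


# Invented sample text, not a quote from the books.
sample_text = "Harry was brave. His friend was happy. The dark lord wanted to kill. Fear spread."
scores = chunk_sentiment(sample_text, sentences_per_chunk=2)
print(scores, [score_to_color(s) for s in scores])
```

Each book would produce one list of scores, and each score becomes one vertical line in that book's row.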

3. Various text analysis visualizations across the series

We can create a Shiny app that allows users to choose which text-based data they would like displayed. Data visualization options include bar charts, word clouds, and line graphs. These visualizations could answer:

What are the most common words (excluding stop words) used in the books? How does this compare to the most commonly used words in the movies? Instead of displaying this data by book, we can display it by character, which should give insight into who or what is most relevant to different characters. This can also be filtered to show only character names, which will show how frequently one character mentions another (this could potentially be a directed network instead of part of the Shiny app).
Which characters have the highest and lowest line frequencies? How does this compare to their importance in the series? (This second question would require a comparison to name frequency, which could again be displayed as a network.)
Throughout the series, which spells are the most common, and how does their frequency change? What does this indicate about the sentiment of the books? How do we see the series darken with the introduction of forbidden spells?
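As a sketch of the first question, the counting behind the word-frequency charts and the who-mentions-whom network could look roughly like this (Python for illustration). The stop-word list is a tiny placeholder, the dialogue lines are invented, and the function names are hypothetical.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list; a real analysis would use a full one.
STOP_WORDS = {"the", "a", "and", "to", "of", "is", "you", "i", "it", "in"}


def top_words(lines, n=3):
    """Return the n most common non-stop words across the given lines."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


def mention_edges(dialogue, characters):
    """Count directed (speaker, mentioned) pairs, at most once per line of dialogue."""
    edges = Counter()
    for speaker, line in dialogue:
        words = set(re.findall(r"[a-z']+", line.lower()))
        for name in characters:
            if name != speaker and name.lower() in words:
                edges[(speaker, name)] += 1
    return edges


# Invented lines for illustration, not taken from the movie transcripts.
dialogue = [
    ("Harry", "Ron, have you seen Hermione?"),
    ("Ron", "Hermione is in the library, Harry."),
    ("Hermione", "Harry, Ron, you are late!"),
]
print(top_words([line for _, line in dialogue]))
print(mention_edges(dialogue, ["Harry", "Ron", "Hermione"]))
```

The edge counts are already in the shape a directed network needs: one weighted edge per (speaker, mentioned character) pair.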

Schedule:

Data Wrangling Check-in: November 20
Visualizations Check-in: December 1
Blog Draft with Visualizations Added: December 6
Finished Visualizations: December 10
Finished Blog: December 11

Important dates:

Status update 1: 11/16
Status update 2: 11/30
In-class presentations: 12/7
Final blog due: 12/13

katcorr commented 10 months ago

This plan sounds amazing! I love the creativity. I have not heard of "sonification" before (how is this done?). All 3 representations are great ideas, and go well together, but it also sounds like a lot -- perhaps too much to do all three? I think the first two would be enough for the blog project, but I also really like the third idea so if you have time to incorporate, that would be cool. (but not required) Alternatively, just idea 3 would also suffice.

Schedule sounds good.

You may want to check out this R package for Harry Potter color palettes 😄

Blog plan: 10/10

johnjoire commented 10 months ago

We are on track so far! Our initial schedule has us having wrangling done by the 20th, so both of us are going to work on it a bit at the beginning of break. As for our three "visualizations," Sarah is planning on doing the third idea, while I will be doing the first idea (and the second idea if I have time).

I haven't fully figured out my process for sonification, but this is my plan for that:

Wrangle so that the data set has each instance of "Harry," "Potter," and "The Boy Who Lived" and the proportion out of the entire text that the words appear in
I will also wrangle another data set with each instance of "Tom," "Riddle," "Voldemort," "He Who Must Not Be Named," and "You Know Who"
Then, I will use these proportions to either write a program that makes the piece of music for me or input the notes manually in Sibelius (music notation software)
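A rough sketch of the wrangling step, in Python for illustration. It reads "proportion" as how far through the text each mention falls, which is an assumption about the plan above; the function name `mention_positions` is hypothetical, and the alias lists are the ones named in the steps.

```python
import re

HARRY_ALIASES = ["Harry", "Potter", "The Boy Who Lived"]
VOLDEMORT_ALIASES = ["Tom", "Riddle", "Voldemort", "He Who Must Not Be Named", "You Know Who"]


def alias_pattern(alias):
    """Build a regex for one alias, allowing spaces or hyphens between its words."""
    return r"[\s-]+".join(re.escape(part) for part in alias.split())


def mention_positions(text, aliases):
    """Return (matched_text, position) pairs, with position as a 0-1 proportion of the text."""
    # Longest aliases first so multi-word names are tried before shorter ones.
    ordered = sorted(aliases, key=len, reverse=True)
    regex = re.compile(
        r"\b(?:" + "|".join(alias_pattern(a) for a in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return [(m.group(0), round(m.start() / len(text), 3)) for m in regex.finditer(text)]


# Invented sample sentence, not a quote from the books.
sample = "Harry faced Voldemort."
print(mention_positions(sample, HARRY_ALIASES))
print(mention_positions(sample, VOLDEMORT_ALIASES))
```

Note that "Harry Potter" counts as two mentions here, since "Harry" and "Potter" are separate aliases in the plan; whether that double count is wanted is a choice still to be made.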

katcorr commented 10 months ago

Sounds great!

Status Update 1: 5/5

johnjoire commented 9 months ago

Still on track! We finished our wrangling and plan on finishing visualizations in the next few days in preparation for the presentations in a week.

I will say I have personally been having trouble working out how to create the sonification I had originally planned. I saw your issue describing the rmusic package and will look into that further. If it reaches a point of diminishing returns, I will shift gears and make a new type of visualization that shows character name mentions over time, using a stacked area chart.

katcorr commented 9 months ago

Great!

Another idea (to take or leave) if you can't get the music to play (I agree you shouldn't spend too much time trying to figure that out; it could be a rabbit hole or dead end): you could use geom_icon or geom_emoji to plot a musical note as points and then use different colors for different notes (instead of position?) and different sizes for different lengths? or something like that. It can be abstract-ish :) Or, the gganimate package could be fun to explore, which can animate a plot.

Status Update 2: 5/5

acstat231-f23 / blog-potterwatch

Blog Plan #1