austincap / AlwaysSunny

Various tools for analyzing Always Sunny

Unsupervised learning #1

Open ewellinger opened 8 years ago

ewellinger commented 8 years ago

Hey there!

I actually had the same idea of analyzing Always Sunny episodes, but stalled out almost immediately in the data collection process. One thing I thought would've been really cool was tracking the "allegiances" of the gang to see who paired up with whom and who was most likely to double-cross the others. That proved really hard to keep track of, because what they were aligning themselves over kept changing over the course of an episode.

Have you thought about trying any unsupervised learning on your data, such as k-means or NMF? It could be cool to see how the episodes group together based on what is happening. Anyway, I'm glad someone had the initiative to get past the data collection part and make some visualizations.
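To make the clustering idea concrete, here is a minimal sketch using only the standard library. It assumes each episode is recorded as a list of teams, turns that into a 0/1 vector over every pair of gang members (1 if the pair shared a team), and runs a naive k-means on those vectors. The episode names and team splits are made-up placeholders, not real data from the repo, and `pair_vector` and `kmeans` are hypothetical helpers written for illustration.

```python
from itertools import combinations

GANG = ["Charlie", "Dee", "Dennis", "Frank", "Mac"]  # kept in sorted order
PAIRS = list(combinations(GANG, 2))  # the 10 unordered pairs of gang members


def pair_vector(teams):
    """Encode one episode as a 0/1 vector: 1 where a pair shares a team."""
    together = set()
    for team in teams:
        together.update(combinations(sorted(team), 2))
    return [1.0 if pair in together else 0.0 for pair in PAIRS]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Naive k-means. Assumption: the first k vectors seed the centroids,
    which keeps the sketch deterministic; real use would want random restarts."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each episode goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder episodes (hypothetical team splits, for illustration only).
episodes = {
    "ep_a": [{"Charlie", "Dee"}, {"Mac", "Dennis", "Frank"}],
    "ep_b": [{"Charlie"}, {"Dee"}, {"Mac"}, {"Dennis"}, {"Frank"}],
    "ep_c": [{"Charlie", "Dee"}, {"Mac", "Dennis"}, {"Frank"}],
    "ep_d": [{"Charlie"}, {"Dee"}, {"Mac", "Frank"}, {"Dennis"}],
}

vectors = [pair_vector(teams) for teams in episodes.values()]
labels = dict(zip(episodes, kmeans(vectors, k=2)))
print(labels)
```

With only pairing features, the clusters mostly separate "everyone teams up" episodes from "everyone for themselves" episodes, so extra features would be needed before the groups say much about plot.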

I also saw that you hand-coded a neural network. Have you looked into something like Keras? It would run a lot faster, though the dataset is still woefully small for a neural net.

Best, Erich

dbelling commented 8 years ago

It would be particularly challenging to do that allegiance visualization for episodes like "The Gang Gets Held Hostage" or "The Gang Gives Frank an Intervention".

Great job with this project!

ewellinger commented 8 years ago

Yeah, I tried to record that information for only 2 episodes before realizing that it would be way too big of a pain.

austincap commented 8 years ago

I appreciate the kind words, guys! I really am a sucker for kudos from internet strangers.

When creating the data for this I relied a lot on the plot summaries on the wiki. Typically they'd say something like "Charlie & Dee team up, meanwhile Mac, Dennis & Frank do something else", so for many episodes the teams are fairly definitive. For certain episodes like "Frank Retires", where alliances are made and broken frequently, I tended to just make everyone separate. "The Gang Gets Held Hostage" has everyone in their own head, so I made everyone separate there too. Unless I'm misunderstanding you, "The Gang Gives Frank an Intervention" was pretty clear-cut. Season 1 was by far the hardest to codify because of the less clear-cut teams and the lack of Frank. As the seasons went on there seemed to be more meta-jokes explicitly referencing "teaming up" and "winning", so it got way easier to codify. The way I got through the data collection process itself was drinking beer and watching Always Sunny while telling myself I was being productive.

If I ran some clustering algorithms on this I'd need a lot more features in my data because, with what I have now, I don't think they'd tell me much more than that episodes with similar groupings are similar. Eventually I plan to add plot keywords from IMDb, recurring guest characters, writers, directors, and year, so clustering would be a good idea then. I think that'll help me get better results from the neural net too. I didn't know about Keras, thanks for the tip! Any other suggestions for how to analyze this using unsupervised learning algorithms? I'm also trying to figure out a way to understand this data with network analysis, so any advice on how to do that would be welcome too. Exploratory data analysis is great, but next season I want to try predicting episode quality based on plot summaries and previews.
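For the network analysis angle, one possible starting point is a weighted "team-up" graph: each character is a node, and the weight of an edge is the number of episodes in which the two characters were on the same team. The sketch below is only an illustration under that assumption. The episode list is made up, and `team_up_counts` and `strength` are hypothetical helper names, not anything from the repo.

```python
from collections import Counter
from itertools import combinations


def team_up_counts(episodes):
    """Count, for each pair of characters, how many teams they shared."""
    edges = Counter()
    for teams in episodes:
        for team in teams:
            # Sort names so each pair always maps to the same edge key.
            for pair in combinations(sorted(team), 2):
                edges[pair] += 1
    return edges


def strength(edges):
    """Weighted degree: the total team-up weight touching each character."""
    totals = Counter()
    for (a, b), weight in edges.items():
        totals[a] += weight
        totals[b] += weight
    return totals


# Placeholder data (hypothetical team splits, for illustration only).
episodes = [
    [{"Charlie", "Dee"}, {"Mac", "Dennis", "Frank"}],
    [{"Charlie", "Mac"}, {"Dennis", "Dee"}, {"Frank"}],
    [{"Charlie", "Dee"}, {"Mac", "Dennis"}, {"Frank"}],
]

edges = team_up_counts(episodes)
totals = strength(edges)

for pair, weight in edges.most_common():
    print(pair, weight)
print(totals.most_common())
```

The edge list could then be handed to a graph library for layout or community detection, and comparing the weights season by season would show whether particular pairings become more or less common over time.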