anqi-lu / TMDB-keywords

Movie plot keywords visualization.
MIT License

TMDB Movie Plot Keywords Visualization

A project that visualizes the most frequent plot keywords in movies from the TMDB 5000 Movie Dataset on Kaggle.

Deployed on GitHub Pages at

The TMDB 5000 Movie Dataset contains information on about 5000 movies from The Movie Database (TMDB), a crowd-sourced movie information database. The movie information includes genres, country, actors, directors, plot keywords, gross profit, and much more.


The first visualization I made on this dataset was a donut chart illustrating the number of movies by country, showing which countries the movies come from. Because TMDB is built entirely by its users, there could be a lot of bias in the data (most movies are from Western countries). Therefore, I decided not to do much with the “country” attribute. Instead, I decided to focus on “plot_keywords”, “genres”, and “year”.

The questions about this dataset became:

Based on these questions, I listed the following tasks:

  1. Word cloud of top 20 keywords for all genres
  2. Genre filter: display top 20 keywords for each genre
  3. Line chart of count by year
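
The first two tasks both come down to counting keyword frequency, with an optional genre filter. Below is a minimal sketch of that counting step in plain JavaScript, not the project's actual code. It assumes each movie row has already been parsed into an object whose `genres` and `plot_keywords` fields are arrays of strings; the function name `topKeywords` and the sample rows are hypothetical.

```javascript
// Hypothetical helper: returns the `limit` most frequent keywords,
// optionally restricted to movies that belong to `genre`.
// Assumes movie.genres and movie.plot_keywords are arrays of strings.
function topKeywords(movies, { genre = null, limit = 20 } = {}) {
  const counts = new Map();
  for (const movie of movies) {
    // Skip movies outside the selected genre (null means "all genres").
    if (genre !== null && !movie.genres.includes(genre)) continue;
    for (const keyword of movie.plot_keywords) {
      counts.set(keyword, (counts.get(keyword) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([keyword, count]) => ({ keyword, count }))
    // Most frequent first; ties broken alphabetically for a stable order.
    .sort((a, b) => b.count - a.count || a.keyword.localeCompare(b.keyword))
    .slice(0, limit);
}

// Made-up sample rows, for illustration only.
const sampleMovies = [
  { genres: ["Action"], plot_keywords: ["hero", "alien"], year: 2009 },
  { genres: ["Action", "Drama"], plot_keywords: ["hero"], year: 2010 },
  { genres: ["Drama"], plot_keywords: ["love", "hero"], year: 2010 },
];

console.log(topKeywords(sampleMovies));                      // all genres
console.log(topKeywords(sampleMovies, { genre: "Action" })); // genre filter
```

The resulting `{ keyword, count }` pairs could then be fed to a word cloud layout, with `count` driving the font size.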



Source of Inspiration: 60 years of french first names and stream-graph explorer

This sketch adds the following tasks:

  1. Select multiple keywords and assign a different color to each line in the line chart
  2. Hover to select a year and show the corresponding tooltip
  3. Zoom by brushing on the year
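
The data behind these interactions can be sketched as one series per selected keyword, each with its own color and optionally limited to a brushed year range. This is an illustrative sketch in plain JavaScript, not the project's implementation. It assumes each movie object has a `plot_keywords` array and a numeric `year`; the names `keywordSeries` and `PALETTE`, the color values, and the sample rows are hypothetical.

```javascript
// Hypothetical palette; any list of visually distinct colors would do.
const PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"];

// Builds one line-chart series per selected keyword.
// yearRange is an optional [minYear, maxYear] pair (inclusive),
// standing in for the extent selected by brushing.
function keywordSeries(movies, selectedKeywords, yearRange = null) {
  return selectedKeywords.map((keyword, i) => {
    const countsByYear = new Map();
    for (const movie of movies) {
      if (!movie.plot_keywords.includes(keyword)) continue;
      if (yearRange && (movie.year < yearRange[0] || movie.year > yearRange[1])) continue;
      countsByYear.set(movie.year, (countsByYear.get(movie.year) || 0) + 1);
    }
    const points = [...countsByYear.entries()]
      .map(([year, count]) => ({ year, count }))
      .sort((a, b) => a.year - b.year); // chronological order for the line
    // Colors cycle if more keywords are selected than the palette holds.
    return { keyword, color: PALETTE[i % PALETTE.length], points };
  });
}

// Made-up sample rows, for illustration only.
const demoMovies = [
  { plot_keywords: ["hero", "alien"], year: 2009 },
  { plot_keywords: ["hero"], year: 2010 },
  { plot_keywords: ["love", "hero"], year: 2010 },
  { plot_keywords: ["love"], year: 2011 },
];

console.log(JSON.stringify(keywordSeries(demoMovies, ["hero", "love"], [2010, 2011])));
```

A tooltip for a hovered year would then only need to look up that year's `count` in each series' `points`.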


[Screenshots: word cloud, genre filter, tooltip]

Future Work


This D3.js web project is forked from curran's template project.