Humorloos / IE683


Investigate whether movie datasets satisfy requirements #5

Closed Humorloos closed 3 years ago

ashishrana160796 commented 3 years ago

Brief Description Of Use case: [Build a generic evaluation framework that identifies a similar underlying integration framework across various use cases.] [This assumes the relationship definitions of the different use-case datasets are equivalent, for example that the driving tables share the same granularity, i.e. the level at which an entity (or row) is expressed. Like, (streaming service name, ) ]

Brief Description Of Use-case:

  1. Summarize streaming-service data across different regions. Details drill down from (Streaming Movies/TV-Series until , The Movie Database API for creating a movie-level database, the MovieLens dataset for movie reviews, a Kaggle dataset for cast and crew information, [Extension: Twitter movie-level hashtags or DBpedia data, UCI Machine Learning Repository]).

Relevant Links & APIs:

  1. Streaming Data: https://rapidapi.com/gox-ai-gox-ai-default/api/ott-details/ | https://rapidapi.com/meteoric-llc-meteoric-llc-default/api/watchmode/
  2. Movie Level Data: https://www.themoviedb.org/documentation/api, https://rapidapi.com/amrelrafie/api/movies-tvshows-data-imdb/
  3. Additional available datasets: https://www.kaggle.com/rounakbanik/the-movies-dataset?select=ratings_small.csv, https://grouplens.org/datasets/movielens/, https://archive.ics.uci.edu/ml/datasets/Movie

[Other use cases follow the same pattern of building datasets directly from an API, namely e-commerce, music, and anime. These topics are also feasible to pursue because datasets are easy to obtain, but we would have to stick to the same design principles as in the movie use case.]

Problem Constraint Justifications:

  1. Yes, there are at least 3 datasets that can be integrated into one dataset. The design can also be extended to include more information.
  2. The 2500-entity baseline condition is satisfied. There will be N_str(=10)*200 titles of movies + series. (Our main driving table)
  3. Yes, each of the datasets listed above contains 1000+ entries.
  4. Yes, we will have 20+ attributes (streaming service, country, region, movie, cast, director, producer, ratings, a few reviews, etc.)
  5. At least 5 attributes are shared between two different datasets, which lets us implement fusion voting. An example would be preparing the baseline datasets from The Movie DB and the IMDb movie API. Fusing two highly similar datasets is an easy approach.
  6. Possible list attributes: producers, cast, ratings from different platforms, reviews (x=10), crew members (sparse possibility).
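
Points 1, 3, 4 and 5 above could be checked mechanically once the schemas are known. The following is only a rough sketch: the thresholds are read off the answers in the list, while the function names and the idea of passing schemas as plain column-name lists are assumptions for illustration, not part of the project code.

```python
from itertools import combinations
from typing import Dict, List

# Thresholds as read from the constraint list above (assumed, not official).
MIN_DATASETS = 3
MIN_ENTRIES_PER_DATASET = 1000
MIN_TOTAL_ATTRIBUTES = 20
MIN_COMMON_ATTRIBUTES = 5


def common_attributes(a: List[str], b: List[str]) -> List[str]:
    """Return the attribute names shared by two datasets, sorted."""
    return sorted(set(a) & set(b))


def check_constraints(
    schemas: Dict[str, List[str]], row_counts: Dict[str, int]
) -> Dict[str, bool]:
    """Check the dataset-level constraints for a set of candidate datasets.

    `schemas` maps a (hypothetical) dataset name to its column names and
    `row_counts` maps the same names to the number of entries.
    """
    # Union of all attribute names across the datasets.
    all_attributes = set()
    for columns in schemas.values():
        all_attributes.update(columns)

    # Largest attribute overlap found between any pair of datasets.
    best_overlap = 0
    for left, right in combinations(schemas, 2):
        overlap = len(common_attributes(schemas[left], schemas[right]))
        best_overlap = max(best_overlap, overlap)

    return {
        "enough_datasets": len(schemas) >= MIN_DATASETS,
        "enough_entries": all(
            row_counts[name] >= MIN_ENTRIES_PER_DATASET for name in schemas
        ),
        "enough_attributes": len(all_attributes) >= MIN_TOTAL_ATTRIBUTES,
        "enough_common_attributes": best_overlap >= MIN_COMMON_ATTRIBUTES,
    }
```

Calling `check_constraints` with the column lists and row counts of the candidate datasets returns one boolean per constraint, which makes it easy to see which requirement a given combination of datasets would miss.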
Humorloos commented 3 years ago
ashishrana160796 commented 3 years ago

Here is an alternative that removes our dependency on APIs. Based on the newly found datasets, there are two use cases we can pursue, listed below. @Humorloos, you can look into these use cases; after that, I can continue with the tasks in #1. Thanks!

  1. [Same] Building an OTT streaming movie dataset.

  2. [New] Using Netflix user interactions to provide an enhanced experience for customers in the UK.

ashishrana160796 commented 3 years ago

Driving dataset links for future reference:

  1. OTT TV shows list: https://www.kaggle.com/ruchi798/tv-shows-on-netflix-prime-video-hulu-and-disney [Not of Interest]
  2. Amazon Prime movies only: https://www.kaggle.com/padhmam/amazon-prime-movies [Not of Interest]
  3. Movies on OTT Platform: https://www.kaggle.com/ruchi798/movies-on-netflix-prime-video-hulu-and-disney/version/2 [Interest]
  4. Netflix movie detailed data: https://www.kaggle.com/ashishgup/netflix-rotten-tomatoes-metacritic-imdb
  5. User interaction data, Netflix UK: https://www.kaggle.com/vodclickstream/netflix-audience-behaviour-uk-movies
ashishrana160796 commented 3 years ago

Use case Description

Customer feedback plays a very important role in providing excellent service. However, customer feedback and inputs are susceptible to biases and measurement errors. Modern business processes are aware of these limitations, so processes for enhancing customer experience are instead designed around data-driven insights. Netflix was one of the first online content streaming services to exploit insights from user-level data to provide great recommendations. In our project we will transform the data on user interactions with movies on Netflix and supplement it with additional movie-related information such as revenues, actors, directors, ratings, synopsis, and genre. This integrated dataset will help provide Netflix users with a better movie recommendation experience. It will also help generate user-level insights for different movies and de-confound the reasons behind successful movie streaming numbers.
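
As a rough sketch of the integration step described above, interaction rows could be left-joined with movie metadata. The field names (`title`, `user_id`, `imdb_rating`, ...) and matching on a normalized title are assumptions for illustration only; the real datasets may need a stronger key (e.g. title plus release year) and proper identity resolution.

```python
from typing import Any, Dict, List


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse whitespace so near-identical spellings match."""
    return " ".join(title.lower().split())


def enrich_interactions(
    interactions: List[Dict[str, Any]], movies: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Left-join interaction rows with movie metadata on the normalized title.

    Interactions without a matching movie are kept unchanged, so no user
    data is lost when the metadata is incomplete.
    """
    # Index the metadata once so each lookup is a dictionary access.
    by_title = {normalize_title(movie["title"]): movie for movie in movies}

    enriched = []
    for row in interactions:
        metadata = by_title.get(normalize_title(row["title"]), {})
        merged = dict(row)
        for key, value in metadata.items():
            if key != "title":  # keep the title as spelled in the interaction row
                merged[key] = value
        enriched.append(merged)
    return enriched
```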

For example, we can more accurately determine whether a user's decision to stream a movie on Netflix depends on movie revenue, Rotten Tomatoes or IMDb ratings, availability on other streaming platforms, etc. Higher movie revenue might mean that many users have already seen the movie in the theater, so it is probably not wise to make it available on the platform immediately by paying for expensive streaming rights. Also, with insights from movie ratings and the number of reviews on platforms like IMDb and Rotten Tomatoes, we can find highly regarded movies that might not have been widely viewed. Insights at user-level or movie-level granularity can thus help explain how to enhance both user experience and business revenue.
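
A first, very simple look at such dependencies could be a per-attribute Pearson correlation against view counts on the integrated movie-level table. This is only a sketch: the column names (`views`, `revenue`, `imdb`) are hypothetical, and a plain correlation does not de-confound anything by itself; it would just be a starting point before any more careful analysis.

```python
import math
from typing import Any, Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_with_target(
    rows: List[Dict[str, Any]], target: str, attributes: List[str]
) -> Dict[str, float]:
    """Correlate each numeric attribute with the target column, one value per attribute."""
    target_values = [row[target] for row in rows]
    return {
        attribute: pearson([row[attribute] for row in rows], target_values)
        for attribute in attributes
    }
```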