Humorloos / IE683


Rewrite use case based on new dataset selection #13

Closed Humorloos closed 2 years ago

subashp93 commented 2 years ago
Aggregating Netflix movie dataset information

Customer feedback plays an important role in providing excellent service, but it is susceptible to biases and measurement errors. Modern businesses are aware of these limitations and therefore design their customer experience processes around data-driven insights instead. Netflix was one of the first online content streaming services to exploit user-level data insights to provide strong recommendations. In our project, we will aggregate and transform the information in the Netflix movies dataset and supplement it with additional movie-related information such as revenue, directors, synopsis, genre, language, and runtime. This integrated dataset will help provide Netflix users with a better movie recommendation experience. It will also help generate user-level insights for different movies and help de-confound the reasons behind successful movie streaming numbers.

For example, we can more accurately determine whether a user's decision to stream a movie on Netflix depends on the movie's revenue, its Rotten Tomatoes rating, its availability on other streaming platforms, and so on. Higher movie revenue might mean that many users have already seen the movie in the theater, so it is probably not wise to make it available on the platform immediately by paying for expensive streaming rights. Also, with insights from movie release duration and reviews, we can find highly coveted movies that might not have been widely viewed. Insights at user-level or movie-level granularity can thus help explain how to improve both the user experience and business revenue.

To integrate this user data with movie attributes, we follow the relational joining process shown in Figure~\ref{fig1}. All three major datasets that we use are available in homogeneous \textit{csv} format and were collected on \textit{Kaggle} with the help of various API libraries. In the section below, we first describe the schema and basic profile of each dataset in Table~\ref{tab:datasets}. We then summarize the integrated schema and its overlap with the input schemata in Table~\ref{tab:integrated_schema}.
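
The relational joining of the three CSV datasets could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the column names (`title`, `release_year`, `revenue`, `runtime`, `rotten_tomatoes`), the (title, year) join key, the helper names, and the tiny in-memory datasets are all hypothetical placeholders and are not taken from the actual schemata or from Figure~\ref{fig1}.

```python
import csv
import io

# Hypothetical miniature stand-ins for the three CSV datasets. The column
# names, values, and join key are assumptions made for illustration only.
NETFLIX_CSV = """title,release_year,listed_in
Movie A,2010,Sci-Fi
Movie B,2018,Drama
"""

REVENUE_CSV = """title,release_year,revenue,runtime
movie a,2010,1000,120
"""

RATINGS_CSV = """title,release_year,rotten_tomatoes
Movie A ,2010,87
Movie B,2018,96
"""


def join_key(row):
    """Build a normalized (title, year) key so case and whitespace differences still match."""
    return (row["title"].strip().lower(), row["release_year"].strip())


def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left_rows, right_rows):
    """Left-join right_rows onto left_rows; unmatched left rows are kept unchanged."""
    lookup = {join_key(r): r for r in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(join_key(row), {})
        # Only add attributes the left row does not already have.
        extra = {k: v for k, v in match.items() if k not in row}
        joined.append({**row, **extra})
    return joined


def integrate():
    """Join the Netflix catalogue with the revenue and rating datasets."""
    movies = read_rows(NETFLIX_CSV)
    movies = left_join(movies, read_rows(REVENUE_CSV))
    movies = left_join(movies, read_rows(RATINGS_CSV))
    return movies


if __name__ == "__main__":
    for movie in integrate():
        print(movie)
```

A left join keeps every Netflix movie even when one of the supplementary datasets has no matching record, so missing attributes simply stay absent. Whether exact matching on a normalized title and year is sufficient for the real data is an open assumption.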

ashishrana160796 commented 2 years ago

We have to remove the user-related aspects from the discussion in the use case description, @subashp93. Hence, I have rephrased the use case description even further to meet our requirements. Please look into it during the proofreading stage and improve it further if required.