Santostang / box-office-prediction

Cornell ORIE 4741 Course Project: Machine Learning with Big Messy Data

Midterm_report_jx266 #11

Open jx266 opened 6 years ago

jx266 commented 6 years ago

Hi guys, your topic is very interesting. The goal of your project is to predict a movie’s gross from its first-week reviews after the release date. People often assume that a movie will perform well at the box office if its reviews are positive. However, that does not hold in every situation, so finding the key factors that lead to a high gross is both tough and meaningful.
The first thing you did was data modification. You found that some of the data were obtained under different conditions, so you decided to use the first-week reviews together with the first-week gross after wide release. You then cleaned the data to remove meaningless entries. This is a clear and useful way to handle such a large dataset. However, I wonder whether treating movies with limited releases in the same way affects the accuracy of the model.
The second part covers variable definition and data analysis. I love this part for its well-done graphs. From your work, I understand that you fit a linear regression model to the dataset and try different features to uncover the latent relationship between the gross and those features. The results show that the opening gross is the most important feature, and I can see a positive linear relationship between the opening gross and the total gross. What is amazing is that you combine the opening Metascore and the opening gross: it is easy to see that when a movie has a score of around 70 and a high opening gross, its total gross is remarkably high.
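To make my limited-release question concrete, here is a minimal sketch of one way to check it. It is not your pipeline: the data are synthetic, every number is made up, and the names (`opening`, `metascore`, `limited`, `holdout_rmse`) are hypothetical. It assumes limited releases relate opening gross to total gross differently, then compares a pooled linear regression against one that adds a limited-release flag and its interaction with opening gross.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the project's dataset); all values are made up.
n = 400
opening = rng.uniform(1, 100, n)      # opening gross in $M (hypothetical)
metascore = rng.uniform(20, 95, n)    # opening Metascore (hypothetical)
limited = rng.random(n) < 0.25        # True = movie had a limited release

# Assumption for illustration: limited releases have a steeper slope.
slope = np.where(limited, 6.0, 2.5)
total = slope * opening + 0.3 * metascore + rng.normal(0, 10, n)


def design(with_flag):
    """Build the regression design matrix, optionally with release-type terms."""
    cols = [np.ones(n), opening, metascore]
    if with_flag:
        # Flag shifts the intercept; the interaction shifts the slope.
        cols += [limited.astype(float), limited * opening]
    return np.column_stack(cols)


def holdout_rmse(with_flag, n_train=300):
    """Fit least squares on the first n_train rows, score on the rest."""
    X = design(with_flag)
    coef, *_ = np.linalg.lstsq(X[:n_train], total[:n_train], rcond=None)
    resid = total[n_train:] - X[n_train:] @ coef
    is_limited = limited[n_train:]

    def rmse(r):
        return float(np.sqrt(np.mean(r ** 2)))

    # Overall error, then error split by release type.
    return rmse(resid), rmse(resid[is_limited]), rmse(resid[~is_limited])


pooled = holdout_rmse(with_flag=False)
flagged = holdout_rmse(with_flag=True)
print("pooled  RMSE (all, limited, wide):", pooled)
print("flagged RMSE (all, limited, wide):", flagged)
```

On your real data, a large gap between the limited and wide errors in the pooled model would suggest that limited releases need separate treatment; if the two errors are similar, using the same method for both is probably fine.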
In general, I like your topic, and I can see that you are focusing on expanding the dataset to provide more data for prediction. I will follow your project and look forward to the further results.