EDA Checkpoint Feedback

Score (out of 5 pts)

Score = 5

	Quality	Reasons
EDA Relevance	P
EDA Analysis and Description	D	In the section `How accurate are our predictions if we compare it with the actual rating a movie received`, I think the model is incorrect. Have you treated the average rating as a categorical variable? Both the model output and the plot suggest so. Further into the model exploration, no description is provided. You should guide the viewer as to what is to be learned.
EDA Figures	D	Your plots could use some help. Plotting all of the top 100 actor-director pairs gives me no information and confuses the reader, since the plot is very hard to read. Additionally, the box plot of avg ratings of actors vs. rating is incorrect.

Comments

Regrade Feedback

Rubric

	Unsatisfactory	Developing	Proficient	Excellent
EDA relevance	EDA is mostly neither relevant to the question nor helpful in figuring out how to address the question. Or the EDA does address the question, but many obviously relevant variables / analyses / figures were not included. EDA does not explore distributions of single variables, relationships between variables, or both.	EDA is partly irrelevant/unhelpful. Or some obviously relevant variables / analyses / figures were not included. EDA does not include a few distributions of single variables or relationships between variables.	EDA is almost all relevant / helpful in addressing the question. No obviously relevant variables / analyses / figures were omitted.	Thorough EDA addressed all aspects that are relevant to the question.
EDA analysis and description	Many of the analyses are poor choices (e.g., using means instead of medians for obviously skewed data), or are poorly described in the text, or do not aid understanding the data	Some of the analyses are poor choices, or are poorly described in the text, or do not aid understanding the data	All analyses are correct choices. Only one or two have minor issues in the text descriptions supporting them. Mostly they fit well with other elements of the EDA and support understanding the data	All analyses are correct choices with clear text descriptions supporting them. The figures fit well with the other elements of the EDA, producing a clear understanding of the data.
EDA figures	Many of the figures are poor plot choices (e.g., using a bar plot to represent a time series where it would be better to use a line plot) or have poor aesthetics (including colormap, data point shape/color, axis labels, titles, annotations, text legibility) or do not aid understanding the data	Some of the figures are poor plot choices or have poor aesthetics. Some figures do not aid understanding the data	All figures are correct plot choices. Only one or two have minor questionable aesthetic choices. The figures mostly fit well with the other elements of the EDA and support understanding the data	All figures are correct plot choices with beautiful aesthetics. The figures fit well with the other elements of the EDA, producing a clear understanding of the data.

Grading Rules

Scoring: Out of 5 points

Each Developing => -1 pt; each Unsatisfactory => -2 pts, until the score reaches 0
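
As an illustration only, the deduction rule above can be sketched in Python. The function name `checkpoint_score` and the single-letter codes U and E are assumptions made for this sketch (the feedback table itself uses only P and D), and Proficient and Excellent are assumed to carry no deduction.

```python
# Deduction per rubric level. U/E are assumed abbreviations for
# Unsatisfactory/Excellent; P and E are assumed to deduct nothing.
DEDUCTIONS = {"U": 2, "D": 1, "P": 0, "E": 0}


def checkpoint_score(qualities, max_points=5):
    """Return the checkpoint score for a list of quality letters."""
    total_deduction = sum(DEDUCTIONS[q] for q in qualities)
    # The score is floored at 0, as the rule states.
    return max(0, max_points - total_deduction)


print(checkpoint_score(["D", "U", "P"]))  # 5 - 1 - 2 = 2
print(checkpoint_score(["U", "U", "U"]))  # 5 - 6 floors at 0
```

This sketch covers only the initial deduction; points earned back in a later checkpoint are not modeled.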

If students address the detailed feedback in a future checkpoint, they will earn these points back.

DETAILED FEEDBACK should be left in the data section AND anywhere the student addressed proposal feedback but did not do it to your satisfaction

Hi Kunal,

Thank you for your feedback on our EDA checkpoint! We took your advice into consideration and made several changes to our analysis. Specifically, we focused the analysis on the relationship between the average rating of the actors in a movie and the movie's rating, and we retained the scatter plot of this relationship because it bears most directly on our research question. We introduced a new section that groups actors' ratings based on user input. It includes a method for calculating the actors' weighted average score that accounts for movies in which multiple actors from the input list appear. We also explored exponentially weighted scores, since linear weights have little effect when the input list is large. Finally, we added explanatory lines and comments to better guide the viewer as to what is to be learned.
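
A rough sketch of one way such a weighted score could be computed follows. It is not necessarily the method used in the notebook: the function name `group_score`, the `(cast, rating)` data layout, the choice of weight (the number of listed actors in a movie, or `base` raised to that number), and the sample ratings are all assumptions made for illustration.

```python
def group_score(movies, actors, weight="linear", base=2.0):
    """Weighted mean rating over movies featuring at least one listed actor.

    movies: iterable of (cast, rating) pairs, where cast is a collection of names.
    actors: the user-supplied list of actor names.
    A movie's weight grows with the number of listed actors in its cast:
    k for "linear" weighting, base**k for "exponential" weighting.
    """
    wanted = set(actors)
    weighted_sum = 0.0
    weight_total = 0.0
    for cast, rating in movies:
        k = len(wanted & set(cast))  # how many listed actors appear together
        if k == 0:
            continue  # movie has none of the listed actors
        w = k if weight == "linear" else base ** k
        weighted_sum += w * rating
        weight_total += w
    return weighted_sum / weight_total if weight_total else None


# Hypothetical data: one movie with all three actors, two solo movies.
movies = [({"A", "B", "C"}, 9.0), ({"A"}, 6.0), ({"B"}, 6.0)]
actors = ["A", "B", "C"]
print(group_score(movies, actors, weight="linear"))       # weights 3, 1, 1 -> 7.8
print(group_score(movies, actors, weight="exponential"))  # weights 8, 2, 2 -> 8.0
```

In this toy case the exponential weights pull the score further toward the movie in which all listed actors appear together.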

In response to your feedback on our data visualization, we refined our approach to provide a more insightful analysis of Actor-Director partnerships. We tried to make it clearer that the analysis of Actor-Director pairs is a crucial part of our exploration of potential confounding variables that could influence a movie's rating. By examining the average ratings associated with specific Actor-Director collaborations, we can discern patterns and assess the impact of these partnerships on a film's success. This analysis helps us understand the dynamics of these relationships and their relevance to our research question. To make our visualizations more readable and informative, we kept our focus on a sample of 100 Actor-Director pairs but introduced three more readable figures: a bar chart of the top 100 Actor-Director pairs by average rating, a scatter plot of the top 10 pairs by collaboration frequency versus average rating, and a bar chart of the top 50 Actor-Director pairs by average rating.
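
The aggregation behind figures of this kind might look like the sketch below, which computes each pair's collaboration count and average rating and then ranks the pairs. The names `pair_stats` and `top_pairs`, the `(actor, director, rating)` row layout, and the sample rows are hypothetical; the notebook's actual code and data may differ.

```python
from collections import defaultdict


def pair_stats(rows):
    """Map each (actor, director) pair to (collaboration count, mean rating).

    rows: iterable of (actor, director, rating) tuples, one per movie credit.
    """
    ratings = defaultdict(list)
    for actor, director, rating in rows:
        ratings[(actor, director)].append(rating)
    return {pair: (len(r), sum(r) / len(r)) for pair, r in ratings.items()}


def top_pairs(stats, n, by="count"):
    """Return the n pairs with the highest count (default) or mean rating."""
    index = 0 if by == "count" else 1
    ranked = sorted(stats.items(), key=lambda item: item[1][index], reverse=True)
    return ranked[:n]


# Hypothetical rows for illustration.
rows = [("A", "X", 8.0), ("A", "X", 6.0), ("B", "Y", 9.0)]
stats = pair_stats(rows)
print(top_pairs(stats, 1, by="count"))   # the most frequent pair
print(top_pairs(stats, 1, by="rating"))  # the highest-rated pair
```

Ranking by count first and then plotting count against mean rating corresponds to the top-10 scatter plot described above; ranking by mean rating corresponds to the bar charts.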

We strongly believe that these changes have improved the clarity and relevance of our EDA, aligning it better with our research objectives. Thank you :)

COGS108 / Group080_WI24