Proposal peer review - GitHub Issues

The following is the peer review of the project proposal by [name of team completing peer review]. The team members who participated in this review are:

Remi Hendershott - @remisublette
Mohammad Farmani - @mfarmani95
Praveen Kumar Pappala - @Praveen-Kumar-Pappala
Deema Albluwi - @Dee-koudz
Priom Mahmud - @Priom1996
Omid Zandi - @zandi-omid
Describe the goal of the project. The goal of the project is to examine the correlation between in-game metrics such as shots on goal, fouls, and cards, and the outcomes of soccer matches in the English Premier League during the 2021–2022 season. It seeks to understand the dynamics of football performance and develop predictive models for match results based on these statistics.
Describe the data used or collected. The data utilized in this project includes comprehensive match-day statistics for the 2021–2022 English Premier League season. Sourced from Evan Gower's work on Kaggle, it captures detailed information from 380 matches, including team names, match dates, in-game actions like goals, shots, fouls, and cards, as well as halftime and full-time results. This dataset serves as a foundation for analyzing the factors that influence team performance and match outcomes.
Describe how the research question will be answered, e.g. what approaches/methods will be used. For their first research question, they will use regression techniques, including logistic regression or decision trees, to show the relationship between in-game performance and match outcomes. For their second research question, they aim to compare simulated scenarios computationally with the actual data, setting half-time scores against full-time results, to gain insight into current league team rankings and to see whether a team's second-half performance is significant to its success.
Is there anything that is unclear from the proposal? The approach for their second research question is unclear. They mention that they will be computing, but not how, and they mention doing a recalibration but do not go into detail on what methods they will use to do so. Their concept for question 2 is clear, but the methods to answer that question are not.
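To illustrate the level of detail that would help here, the following is a minimal sketch of one possible way to compare half-time and full-time standings. It is not the team's method. The column names (`HomeTeam`, `AwayTeam`, `HTR`, `FTR`, with `H`/`D`/`A` result codes), the helper `points_table`, and the three sample matches are all assumptions made up for illustration; the only fixed ingredient is the usual 3/1/0 league points scheme.

```python
from collections import defaultdict

# Hypothetical sample rows; column names and values are assumed for illustration.
matches = [
    {"HomeTeam": "Team A", "AwayTeam": "Team B", "HTR": "H", "FTR": "D"},
    {"HomeTeam": "Team B", "AwayTeam": "Team C", "HTR": "D", "FTR": "H"},
    {"HomeTeam": "Team C", "AwayTeam": "Team A", "HTR": "A", "FTR": "A"},
]


def points_table(rows, result_col):
    """Sum league points (3 win, 1 draw, 0 loss) per team from one result column."""
    table = defaultdict(int)
    for row in rows:
        home, away, result = row["HomeTeam"], row["AwayTeam"], row[result_col]
        if result == "H":      # home win
            table[home] += 3
            table[away] += 0   # ensures the losing team still appears in the table
        elif result == "A":    # away win
            table[away] += 3
            table[home] += 0
        else:                  # draw
            table[home] += 1
            table[away] += 1
    return dict(table)


# Table as if every match had ended at half-time, and the actual table.
half_time = points_table(matches, "HTR")
full_time = points_table(matches, "FTR")

# Points gained (or lost) between half-time and full-time for each team.
second_half_gain = {team: full_time[team] - half_time[team] for team in full_time}
```

A positive value in `second_half_gain` would suggest a team tends to improve its result after the break, which is the kind of explicit definition we would like to see for the "recalibration" step.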
Provide constructive feedback on how the team might be able to improve their project. They could provide more detail on the methods/approach they will use to answer their question 2. They could benefit from defining their lowest and highest variables. They could also consider stating whether there are any gaps or missing values in their data and, if so, how many.
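As a rough sketch of the missing-data check suggested above, the snippet below counts empty cells per column using only the Python standard library. The CSV excerpt and its column names (`HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`, `HF`) are hypothetical stand-ins; in practice the team's actual data file would be read instead.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV excerpt; in practice this would be the team's data file.
sample_csv = io.StringIO(
    "HomeTeam,AwayTeam,FTHG,FTAG,HF\n"
    "Team A,Team B,2,1,10\n"
    "Team B,Team C,,0,\n"
    "Team C,Team A,1,1,12\n"
)

missing = Counter()
total_rows = 0
for row in csv.DictReader(sample_csv):
    total_rows += 1
    for column, value in row.items():
        # Treat absent or whitespace-only cells as missing.
        if value is None or value.strip() == "":
            missing[column] += 1

# Report only the columns that have at least one missing value.
for column, count in missing.items():
    print(f"{column}: {count} missing of {total_rows} rows")
```

Reporting these counts in the data description, even if they are all zero, would answer the question directly.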
What aspect of this project are you most interested in and would like to see highlighted in the presentation? We are interested to see the results of their research and whether their predictive models accurately predict match outcomes.
Provide constructive feedback on any issues with file and/or code organization. The print of the first data frame is redundant because all the info on the data set is provided in the following full preview of their data. It would be worth considering taking that printed data frame out. They could also remove the unused packages at the beginning: they imported NumPy and Seaborn but did not use them, so those imports are an extra part that isn't needed.
(Optional) Any further comments or feedback? Overall, their idea is good, and they have a good introduction and data set description, with a lot of clear descriptions of the variables being used and background info for context. The analysis plan for question 1 is also good. We are looking forward to seeing their end results.

INFO523-S24 / project-01-TAAAG-team

Proposal peer review #1