The goal of the project is to see if there is a way to predict who will win a soccer match.
I like the choices of linear regression and decision trees. Intuitively, linear models make some sense, since a team's strength is roughly the sum of the strengths of its individual players (plus some unknown factor we can term teamwork or team spirit). Similarly, decision trees make sense, since a cutoff in performance values can define the border between a team being good and a team being unstoppable (for example).
Here are some of my concerns and suggestions:
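Your report seems to imply that guessing each outcome with equal probability yields 1% accuracy. To me this seems remarkably low: with three possible outcomes (home win, draw, away win), uniform guessing should be right about a third of the time. It would add clarity if you stated the accuracy of random guessing explicitly before stating the improvement over it.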
Furthermore, I think a better baseline would be to always guess the most frequent result, i.e., "home team wins." This baseline is likely to be considerably more accurate than random guessing, and it would provide a more meaningful comparison for your models.
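To make the two baselines concrete, here is a rough sketch of how I would compute them. The function names and the toy label list are my own inventions for illustration, not taken from your report or data; swap in your actual match results.

```python
from collections import Counter

def majority_baseline_accuracy(results):
    """Accuracy of always predicting the most frequent label in `results`."""
    label, count = Counter(results).most_common(1)[0]
    return label, count / len(results)

def uniform_guess_accuracy(results):
    """Expected accuracy of guessing uniformly among the distinct labels.

    With k distinct labels this is 1/k, regardless of how often each occurs.
    """
    return 1 / len(set(results))

# Made-up toy labels for illustration only, NOT the project's data.
toy_results = ["home", "home", "draw", "away", "home", "away", "home", "draw"]

label, accuracy = majority_baseline_accuracy(toy_results)
print(f"always guess {label!r}: {accuracy:.3f}")
print(f"uniform random guess: {uniform_guess_accuracy(toy_results):.3f}")
```

Reporting both numbers next to your model accuracies would make the size of the improvement much easier to judge.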
For both models, the ordering of the columns matters, because each column is treated as a distinct feature. I'm not sure whether you already did this, but for consistency I think you should assign a fixed position to each column, e.g., columns h_0 and a_0 are both goalkeepers, h_1 to h_5 and a_1 to a_5 are forwards, etc.
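One possible way to enforce a consistent column order is sketched below. The position codes, the slot order, and the (position, rating) tuple layout are all assumptions on my part, since I don't know how your player data is stored; the point is only that the same lineup should always produce the same feature vector.

```python
# Hypothetical canonical slot order; adjust to whatever positions your data uses.
POSITION_ORDER = {"GK": 0, "DEF": 1, "MID": 2, "FWD": 3}

def ordered_features(players):
    """Return player ratings in a canonical column order.

    `players` is a list of (position, rating) tuples (an assumed layout).
    Players are sorted by position slot, then by descending rating within
    a position, so the result does not depend on the input order.
    """
    ranked = sorted(players, key=lambda p: (POSITION_ORDER[p[0]], -p[1]))
    return [rating for _, rating in ranked]

# Made-up ratings: the same lineup listed in two different orders.
lineup_a = [("FWD", 80), ("GK", 75), ("MID", 70), ("DEF", 65), ("FWD", 85)]
lineup_b = [("DEF", 65), ("FWD", 85), ("GK", 75), ("FWD", 80), ("MID", 70)]

print(ordered_features(lineup_a))
print(ordered_features(lineup_b))
```

Applying the same function to the home and away lineups would make h_i and a_i refer to the same role in every row.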
When discussing overfitting and underfitting, I think reporting metrics would be more convincing than simply saying "we think we have enough data." For example, show that a tree depth of 6 yields better held-out results than a depth of 7 or 8, etc.
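A minimal sketch of the kind of comparison I have in mind follows. It assumes you can write a function `evaluate(depth)` that trains a tree of that depth and returns (training accuracy, validation accuracy); that function, the helper names, and the scores below are hypothetical placeholders, not results from your project.

```python
def sweep_depths(evaluate, depths):
    """Tabulate training/validation accuracy and their gap for each depth.

    `evaluate(depth)` is assumed to return (train_accuracy, val_accuracy).
    """
    rows = []
    for depth in depths:
        train_acc, val_acc = evaluate(depth)
        rows.append({"depth": depth, "train": train_acc,
                     "val": val_acc, "gap": train_acc - val_acc})
    return rows

def best_depth(rows):
    """Pick the depth with the highest validation accuracy (shallower wins ties)."""
    return min(rows, key=lambda r: (-r["val"], r["depth"]))["depth"]

# Placeholder scores for illustration only; replace the lambda with real training.
placeholder_scores = {5: (0.55, 0.50), 6: (0.60, 0.53),
                      7: (0.68, 0.52), 8: (0.77, 0.49)}
rows = sweep_depths(lambda d: placeholder_scores[d], [5, 6, 7, 8])

for r in rows:
    print(f"depth={r['depth']} train={r['train']:.2f} "
          f"val={r['val']:.2f} gap={r['gap']:.2f}")
print("selected depth:", best_depth(rows))
```

A table like this shows overfitting directly: training accuracy keeps rising with depth while validation accuracy levels off or falls, and the gap between them widens.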
Good work so far! I am excited to see where you guys take this work.