Midterm Review (sz244)

The goal of the project is to predict the outcome of soccer matches using data such as player ratings and betting information.

Here are a few things I like about the current method:

I like that you've run linear regression, so you know what a baseline for this problem looks like. Now you have a good estimate of how difficult the problem is.
I like that you're already thinking about overfitting and underfitting.
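To make the baseline and the over/underfitting check concrete, here is a minimal sketch, not your actual pipeline. It assumes numeric features and a numeric target. The helpers `fit_ols` and `train_val_report` are hypothetical names, and the data is synthetic, standing in for your real columns. It fits ordinary least squares on a training split and compares training and validation error.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    A bias column is prepended, so w[0] is the intercept."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(d):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X]

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def train_val_report(X, y, train_frac=0.8, seed=0):
    """Fit on a random training split; return (weights, train MSE, validation MSE)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, va = idx[:cut], idx[cut:]
    w = fit_ols([X[i] for i in tr], [y[i] for i in tr])
    train_err = mse(predict(w, [X[i] for i in tr]), [y[i] for i in tr])
    val_err = mse(predict(w, [X[i] for i in va]), [y[i] for i in va])
    return w, train_err, val_err

# Hypothetical synthetic data: two made-up features (e.g. a rating gap and an
# odds gap) plus Gaussian noise; replace with the project's real columns.
rng = random.Random(42)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
y = [0.5 * a + 1.2 * b + rng.gauss(0, 0.3) for a, b in X]

w, train_err, val_err = train_val_report(X, y)
print(f"weights={w}, train MSE={train_err:.3f}, validation MSE={val_err:.3f}")
```

A validation error much larger than the training error suggests overfitting. If both errors are high relative to what you expect, that suggests underfitting.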

Here are a few suggestions I can think of:

Instead of using the discrete values {-1, 1} to represent the betting odds, consider using the actual values. A 0.51 vs. 0.49 split is very different from 0.9 vs. 0.1, even though both can map to the same discrete value.
You may consider including weather information for the day of the match, such as the temperature and whether it's windy or sunny. Weather is also an important factor in soccer games. Including players' country or continent of origin might also help, since people from different regions tend to have different tolerances for heat and cold.
It's interesting that the coefficients differ quite a lot across players. Maybe you could take a few subsamples, rerun the linear regression on each, and see how the coefficients vary.
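For the betting-odds suggestion, here is a minimal sketch. It assumes the odds are in decimal format (total payout per unit stake) for home/draw/away, and that the current {-1, 1} feature is +1 when the home side is favoured. Both function names are hypothetical. Taking 1/odds gives raw implied probabilities. These sum to slightly more than 1 because of the bookmaker's margin, so the sketch normalizes them.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds (e.g. home, draw, away) to probabilities summing to 1.
    Raw 1/odds values over-count because of the bookmaker's margin,
    so each one is divided by their total."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]

def sign_encoding(p_home, p_away):
    """Hypothetical version of the discrete feature: +1 if home is favoured."""
    return 1 if p_home > p_away else -1

probs = implied_probabilities([1.5, 4.0, 6.0])
print([round(p, 3) for p in probs])  # → [0.615, 0.231, 0.154]

# Both matchups collapse to the same discrete value, losing the margin.
print(sign_encoding(0.51, 0.49), sign_encoding(0.9, 0.1))  # → 1 1
```

Feeding the continuous probabilities (or their difference) into the regression keeps the information that the sign encoding throws away.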
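The subsampling idea can be sketched as repeated random subsampling without replacement. This is only one way to do it; a bootstrap would draw with replacement instead. The names `fit_ols` and `subsample_coefficients`, the three "player" features, and the synthetic data are all hypothetical placeholders for your real feature matrix.

```python
import random
import statistics

def fit_ols(X, y):
    """Ordinary least squares via the normal equations, with an intercept in w[0]."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(d):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]

def subsample_coefficients(X, y, frac=0.5, reps=20, seed=0):
    """Refit OLS on `reps` random subsamples drawn without replacement and
    return (mean, standard deviation) of each coefficient across the fits."""
    rng = random.Random(seed)
    k = int(frac * len(y))
    fits = []
    for _ in range(reps):
        idx = rng.sample(range(len(y)), k)
        fits.append(fit_ols([X[i] for i in idx], [y[i] for i in idx]))
    # zip(*fits) groups the j-th coefficient of every fit together.
    return [(statistics.mean(c), statistics.stdev(c)) for c in zip(*fits)]

# Hypothetical synthetic data with three "player" features.
rng = random.Random(7)
X = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(300)]
y = [1.0 * a + 0.5 * b + 0.0 * c + rng.gauss(0, 0.5) for a, b, c in X]

stats = subsample_coefficients(X, y)
for j, (m, s) in enumerate(stats):
    name = "intercept" if j == 0 else f"player_{j}"
    print(f"{name}: mean={m:+.3f}, std={s:.3f}")
```

If a player's coefficient changes size or even sign across subsamples, the apparent difference between players may be noise rather than a real effect.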

ben1605 / Soccer-match-outcome-prediction