In this project the team attempted to build a model that could predict the outcome of soccer matches. This seems like an interesting topic and, given the presence of soccer match betting, potentially a profitable one. However, I have some problems with the work that was actually done.
No effort seems to have been made to account for time-varying behavior on the part of the teams. It seems obvious that as a team's lineup, experience, coach, etc. change over time, that team's abilities will change as well. Yet the selected features largely assume that a given team's prowess is static and unchanging over time.
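As one illustration of a time-aware feature, here is a minimal Python sketch that computes each team's recent form (mean points over its previous few matches) using only matches played before the one being predicted. The tuple layout, the `rolling_form` and `points` names, the window size, and the 3/1/0 points scheme are all assumptions for illustration, not anything taken from the project.

```python
from collections import defaultdict, deque

def points(goals_for, goals_against):
    # Hypothetical scoring: 3 for a win, 1 for a draw, 0 for a loss
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0

def rolling_form(matches, window=5):
    """For each match, return (home_form, away_form): mean points over each side's
    previous `window` matches, or None if the team has no history yet.

    `matches` must be sorted chronologically as (home, away, home_goals, away_goals).
    """
    history = defaultdict(lambda: deque(maxlen=window))
    features = []
    for home, away, home_goals, away_goals in matches:
        # Compute features BEFORE recording this match, so its result cannot leak in
        form = []
        for team in (home, away):
            past = history[team]
            form.append(sum(past) / len(past) if past else None)
        features.append(tuple(form))
        history[home].append(points(home_goals, away_goals))
        history[away].append(points(away_goals, home_goals))
    return features

matches = [("A", "B", 2, 0), ("A", "C", 1, 1), ("B", "A", 0, 0)]
print(rolling_form(matches))  # [(None, None), (3.0, None), (0.0, 2.0)]
```

The bounded deque means old results simply drop out of the window, so a team's feature value can rise or fall as its recent performance changes rather than being fixed for the whole dataset.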
No explanation is given for how FIFA ratings are created or what they encompass. At best, if this feature turns out to be highly predictive, your model is essentially piggybacking on someone else's. At worst, if the ratings turn out to be useless, you're contaminating your real data with a lot of subjective judgement.
Betting odds are already the market's judgement of how the game will play out. Again, you're piggybacking your model on the output of someone else's model, in this case the aggregate model of the soccer betting community. The odds are not a real feature of the game in question; they're just a preexisting prediction of how it will turn out.
It's really not clear why you chose to combine the players' ratings by taking the kth root of their product (that is, their geometric mean). Is there any particular reason why this is the right metric? I'd have liked to see some justification for this choice, and possibly an exploration of how different aggregation choices affect model quality.
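To make the concern concrete: the geometric mean behaves differently from a simple average when ratings are uneven. A minimal Python sketch with made-up ratings for two four-player units that have the same total (the numbers and the `summarize` helper are purely illustrative):

```python
from statistics import fmean, geometric_mean

def summarize(ratings):
    # Three candidate ways to collapse a lineup's ratings into one number
    return {
        "arithmetic": fmean(ratings),
        "geometric": geometric_mean(ratings),  # kth root of the product of k ratings
        "minimum": min(ratings),
    }

# Hypothetical ratings: same sum, different spread
balanced = [75, 75, 75, 75]
top_heavy = [90, 90, 60, 60]

for name, ratings in [("balanced", balanced), ("top_heavy", top_heavy)]:
    stats = summarize(ratings)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Both units have an arithmetic mean of 75, but the geometric mean scores the top-heavy unit lower (about 73.48), so choosing it implicitly asserts that weak links matter more than stars. That assumption may or may not hold for soccer, which is exactly why it deserves justification or a comparison against alternatives like these.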
After all of this, the result is a model with 56% accuracy. Since you're already using the betting odds as a feature, what I'd really like to see is a comparison against simply using the odds directly. How well does your model do relative to the market? If your model does worse than just assuming the betting odds are always right, then we know you're doing something wrong. If, on the other hand, you can outperform the market, then congratulations! You stand to make some money.
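A market baseline of this kind is cheap to build. The following minimal Python sketch assumes decimal odds given in (home, draw, away) order and simply predicts the favourite; normalizing the inverse odds proportionally to strip the bookmaker's margin is one simple choice among several. The odds, results, and model predictions below are invented for illustration.

```python
OUTCOMES = ("H", "D", "A")  # home win, draw, away win

def implied_probabilities(odds):
    """Convert decimal odds (home, draw, away) to probabilities summing to 1."""
    raw = [1.0 / o for o in odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin (overround)
    return [p / total for p in raw]

def market_pick(odds):
    # Baseline prediction: the outcome the market considers most likely
    probs = implied_probabilities(odds)
    return OUTCOMES[probs.index(max(probs))]

def accuracy(predictions, results):
    return sum(p == r for p, r in zip(predictions, results)) / len(results)

# Hypothetical held-out matches
odds = [(1.8, 3.6, 4.5), (2.9, 3.2, 2.5), (1.5, 4.0, 6.5), (3.8, 3.4, 2.0)]
results = ["H", "D", "H", "H"]
model_preds = ["H", "H", "H", "A"]  # stand-in for the project's model output

market_preds = [market_pick(o) for o in odds]
print("market:", accuracy(market_preds, results))
print("model: ", accuracy(model_preds, results))
```

The important part is that both accuracies are computed on the same held-out matches; a 56% figure only means something once it sits next to this number.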