Closed hedonistrh closed 2 years ago
One possible explanation: when I check the model's parameters after fitting on those 4 weeks, Rho is -10105887.801111476, which affects the tau calculation significantly. 🤔 But I am still not sure what causes that.
I think the issue is that some of the teams have inconsistent naming, leaving orphan teams:
import collections
collections.Counter(
[m['away_team_name'] for m in previous_matches] +
[m['home_team_name'] for m in previous_matches]
)
Counter({'Bayern München': 4,
'SC Freiburg': 3,
'TSG Hoffenheim': 4,
'Bayer Leverkusen': 4,
'Greuther Fürth': 4,
'VfL Bochum': 3,
'Eintracht Frankfurt': 4,
'RB Leipzig': 4,
'Hertha BSC': 4,
'VfB Stuttgart': 4,
'Mainz 05': 4,
'FC Augsburg': 3,
'Borussia Dortmund': 4,
'Arminia Bielefeld': 4,
'Wolfsburg': 4,
"Borussia M'Gladbach": 4,
'Union Berlin': 4,
'1. FC Köln': 4,
'Bochum': 1,
'Freiburg': 1,
'Augsburg': 1})
I think it should work okay once the team names are consistent across this dataset.
This does raise the question of how the model should handle non-identifiable datasets, since failing silently like this is not helpful - do you have any ideas?
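One way to catch this kind of problem before fitting is a quick pre-check on the match list. The following is only a rough sketch, not part of mezzala: `find_orphan_teams` is a hypothetical helper, and it assumes every team should have played the same number of matches, which will not hold for every dataset.

```python
import collections


def find_orphan_teams(matches, home_key="home_team_name", away_key="away_team_name"):
    """Flag team names that appear less often than the most frequent name.

    Hypothetical helper: assumes all teams have played the same number of
    matches, so a lower count hints at an inconsistent spelling.
    """
    counts = collections.Counter()
    for match in matches:
        counts[match[home_key]] += 1
        counts[match[away_key]] += 1

    expected = max(counts.values())
    short = {team: n for team, n in counts.items() if n < expected}

    # For each under-represented name, list other names that contain it,
    # e.g. 'Bochum' is contained in 'VfL Bochum'.
    aliases = {
        team: [other for other in counts if other != team and team.lower() in other.lower()]
        for team in short
    }
    return short, aliases


# Tiny illustrative dataset (not the real fixtures)
matches = [
    {"home_team_name": "VfL Bochum", "away_team_name": "Wolfsburg"},
    {"home_team_name": "Wolfsburg", "away_team_name": "VfL Bochum"},
    {"home_team_name": "Bochum", "away_team_name": "Wolfsburg"},
]
short, aliases = find_orphan_teams(matches)
print(short)
print(aliases)
```

The substring match is deliberately crude; it would pair 'Bochum' with 'VfL Bochum' but would miss pairs such as 'Mainz 05' and 'Mainz', where the shorter name is the one with the full count.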
Thanks @Torvaney. I was using a combination of two datasets, which explains why those names are inconsistent. Thanks a lot for spotting that issue. It is also helpful for other parts of my project. 💯
On the other hand, this did not solve the "nan" issue. I am sharing reproducible code with consistent names:
import mezzala
adapter = mezzala.KeyAdapter(
home_team='home_team_name',
away_team='away_team_name',
home_goals='home_score',
away_goals='away_score',
)
# The following is the first 4 weeks of the Bundesliga 2021-2022 season
previous_matches = [{'home_team_name': 'B. Monchengladbach', 'away_team_name': 'Bayern Munich', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'Freiburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'Augsburg', 'away_team_name': 'Hoffenheim', 'home_score': 0, 'away_score': 4}, {'home_team_name': 'Union Berlin', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Stuttgart', 'away_team_name': 'Greuther Furth', 'home_score': 5, 'away_score': 1}, {'home_team_name': 'Wolfsburg', 'away_team_name': 'Bochum', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'Dortmund', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 5, 'away_score': 2}, {'home_team_name': 'Mainz', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'FC Koln', 'away_team_name': 'Hertha Berlin', 'home_score': 3, 'away_score': 1}, {'home_team_name': 'RB Leipzig', 'away_team_name': 'Stuttgart', 'home_score': 4, 'away_score': 0}, {'home_team_name': 'Bochum', 'away_team_name': 'Mainz', 'home_score': 2, 'away_score': 0}, {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'Augsburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'Freiburg', 'away_team_name': 'Dortmund', 'home_score': 2, 'away_score': 1}, {'home_team_name': 'Greuther Furth', 'away_team_name': 'Arminia Bielefeld', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Hertha Berlin', 'away_team_name': 'Wolfsburg', 'home_score': 1, 'away_score': 2}, {'home_team_name': 'Bayer Leverkusen', 'away_team_name': 'B. Monchengladbach', 'home_score': 4, 'away_score': 0},
    {'home_team_name': 'Hoffenheim', 'away_team_name': 'Union Berlin', 'home_score': 2, 'away_score': 2}, {'home_team_name': 'Bayern Munich', 'away_team_name': 'FC Koln', 'home_score': 3, 'away_score': 2}, {'home_team_name': 'Dortmund', 'away_team_name': 'Hoffenheim', 'home_score': 3, 'away_score': 2}, {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Augsburg', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 4}, {'home_team_name': 'FC Koln', 'away_team_name': 'Bochum', 'home_score': 2, 'away_score': 1}, {'home_team_name': 'Mainz', 'away_team_name': 'Greuther Furth', 'home_score': 3, 'away_score': 0}, {'home_team_name': 'Stuttgart', 'away_team_name': 'Freiburg', 'home_score': 2, 'away_score': 3}, {'home_team_name': 'Bayern Munich', 'away_team_name': 'Hertha Berlin', 'home_score': 5, 'away_score': 0}, {'home_team_name': 'Union Berlin', 'away_team_name': 'B. Monchengladbach', 'home_score': 2, 'away_score': 1},
    {'home_team_name': 'Wolfsburg', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'Bayer Leverkusen', 'away_team_name': 'Dortmund', 'home_score': 3, 'away_score': 4}, {'home_team_name': 'Freiburg', 'away_team_name': 'FC Koln', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Greuther Furth', 'away_team_name': 'Wolfsburg', 'home_score': 0, 'away_score': 2}, {'home_team_name': 'Hoffenheim', 'away_team_name': 'Mainz', 'home_score': 0, 'away_score': 2}, {'home_team_name': 'Union Berlin', 'away_team_name': 'Augsburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'RB Leipzig', 'away_team_name': 'Bayern Munich', 'home_score': 1, 'away_score': 4}, {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'Stuttgart', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Bochum', 'away_team_name': 'Hertha Berlin', 'home_score': 1, 'away_score': 3}, {'home_team_name': 'B. Monchengladbach', 'away_team_name': 'Arminia Bielefeld', 'home_score': 3, 'away_score': 1}]
model = mezzala.DixonColes(adapter=adapter)
model.fit(previous_matches)
match_to_predict = {'home_team_name': 'Wolfsburg', 'away_team_name': 'Eintracht Frankfurt'}
scorelines = model.predict_one(match_to_predict, 6)
print(scorelines)
We can still see the following "nan" entry:
ScorelinePrediction(home_goals=0, away_goals=1, probability=nan)
Here is the counter for that data as well:
Counter({'Bayern Munich': 4,
'Freiburg': 4,
'Hoffenheim': 4,
'Bayer Leverkusen': 4,
'Greuther Furth': 4,
'Bochum': 4,
'Eintracht Frankfurt': 4,
'RB Leipzig': 4,
'Hertha Berlin': 4,
'Stuttgart': 4,
'Mainz': 4,
'Augsburg': 4,
'Dortmund': 4,
'Arminia Bielefeld': 4,
'Wolfsburg': 4,
'B. Monchengladbach': 4,
'Union Berlin': 4,
'FC Koln': 4})
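To find entries like the nan 0-1 scoreline above without reading the whole printed list, a small filter can help. This is a minimal sketch: the `ScorelinePrediction` namedtuple here is only a stand-in with the same field names as the printed output, and `invalid_scorelines` is a hypothetical helper, not a mezzala function.

```python
import math
from collections import namedtuple

# Stand-in with the same fields as the printed prediction (assumption,
# used so that the example runs without mezzala installed).
ScorelinePrediction = namedtuple(
    "ScorelinePrediction", ["home_goals", "away_goals", "probability"]
)


def invalid_scorelines(scorelines):
    """Return predictions whose probability is nan, infinite, or negative."""
    return [
        s for s in scorelines
        if not math.isfinite(s.probability) or s.probability < 0
    ]


# Illustrative values only, not real model output
example = [
    ScorelinePrediction(0, 0, 0.10),
    ScorelinePrediction(0, 1, float("nan")),
    ScorelinePrediction(1, 0, 0.20),
]
bad = invalid_scorelines(example)
print(bad)
```

If the real prediction objects expose `probability` the same way, the filter should work on the output of `model.predict_one` unchanged.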
Ah yes, thanks for the clarification.
I think the core issue is that the model is actually underspecified. There is a constraint on Rho that isn't implemented (usually the optimisation proceeds fine without it), and without it the model can produce invalid probability estimates. In this case, the fact that Wolfsburg have conceded only 1 goal leads to their defence parameter being extremely low (about 0.00000004). At that point the Rho adjustment is larger than the estimated probability of observing a 0-1 scoreline. This takes the probability negative, which is impossible, thus resulting in a nan probability.
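To illustrate the kind of constraint meant here, the sketch below uses the textbook low-score adjustment from the Dixon and Coles (1997) paper. It is not taken from mezzala's source, whose internal parameterisation may differ; `tau` and `rho_bounds` are hypothetical names and the scoring rates are made-up numbers.

```python
def tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Textbook Dixon-Coles adjustment factor for low-scoring results."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def rho_bounds(home_rate, away_rate):
    """Range of rho for which all four adjusted factors stay non-negative."""
    lower = max(-1 / home_rate, -1 / away_rate)
    upper = min(1 / (home_rate * away_rate), 1.0)
    return lower, upper


# Illustrative expected-goals rates (assumed, not fitted values)
home_rate, away_rate = 1.4, 1.1
lower, upper = rho_bounds(home_rate, away_rate)

# A rho below the lower bound makes the 0-1 factor negative, so the
# adjusted probability is negative and its logarithm is undefined.
bad_factor = tau(0, 1, home_rate, away_rate, rho=-1.0)
print(lower, upper, bad_factor)
```

Since the bounds depend on the scoring rates of each fixture, a fitted Rho would need to satisfy them for every pair of teams, which is presumably why an unconstrained optimiser can drift outside the valid range on a small sample.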
As a workaround, I think the easiest way to amend the issue for now would be to manually reset Rho after the fact, perhaps to a value fit over a larger sample.
model.params[mezzala.RHO_KEY] = some_reasonable_value
Pretty hacky, I know.
Thanks a lot for your answer. I applied the hacky workaround you mentioned and it is working now. Also, as you suggested, when we use more data we no longer end up with "nan" probabilities. 💯 I am sharing the latest code with the hacky solution:
import mezzala
adapter = mezzala.KeyAdapter(
home_team='home_team_name',
away_team='away_team_name',
home_goals='home_score',
away_goals='away_score',
)
# The following is the first 4 weeks of the Bundesliga 2021-2022 season
previous_matches = [{'home_team_name': 'B. Monchengladbach', 'away_team_name': 'Bayern Munich', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'Freiburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'Augsburg', 'away_team_name': 'Hoffenheim', 'home_score': 0, 'away_score': 4}, {'home_team_name': 'Union Berlin', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Stuttgart', 'away_team_name': 'Greuther Furth', 'home_score': 5, 'away_score': 1}, {'home_team_name': 'Wolfsburg', 'away_team_name': 'Bochum', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'Dortmund', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 5, 'away_score': 2}, {'home_team_name': 'Mainz', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'FC Koln', 'away_team_name': 'Hertha Berlin', 'home_score': 3, 'away_score': 1}, {'home_team_name': 'RB Leipzig', 'away_team_name': 'Stuttgart', 'home_score': 4, 'away_score': 0}, {'home_team_name': 'Bochum', 'away_team_name': 'Mainz', 'home_score': 2, 'away_score': 0}, {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'Augsburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'Freiburg', 'away_team_name': 'Dortmund', 'home_score': 2, 'away_score': 1}, {'home_team_name': 'Greuther Furth', 'away_team_name': 'Arminia Bielefeld', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Hertha Berlin', 'away_team_name': 'Wolfsburg', 'home_score': 1, 'away_score': 2}, {'home_team_name': 'Bayer Leverkusen', 'away_team_name': 'B. Monchengladbach', 'home_score': 4, 'away_score': 0},
    {'home_team_name': 'Hoffenheim', 'away_team_name': 'Union Berlin', 'home_score': 2, 'away_score': 2}, {'home_team_name': 'Bayern Munich', 'away_team_name': 'FC Koln', 'home_score': 3, 'away_score': 2}, {'home_team_name': 'Dortmund', 'away_team_name': 'Hoffenheim', 'home_score': 3, 'away_score': 2}, {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Augsburg', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 4}, {'home_team_name': 'FC Koln', 'away_team_name': 'Bochum', 'home_score': 2, 'away_score': 1}, {'home_team_name': 'Mainz', 'away_team_name': 'Greuther Furth', 'home_score': 3, 'away_score': 0}, {'home_team_name': 'Stuttgart', 'away_team_name': 'Freiburg', 'home_score': 2, 'away_score': 3}, {'home_team_name': 'Bayern Munich', 'away_team_name': 'Hertha Berlin', 'home_score': 5, 'away_score': 0}, {'home_team_name': 'Union Berlin', 'away_team_name': 'B. Monchengladbach', 'home_score': 2, 'away_score': 1},
    {'home_team_name': 'Wolfsburg', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0}, {'home_team_name': 'Bayer Leverkusen', 'away_team_name': 'Dortmund', 'home_score': 3, 'away_score': 4}, {'home_team_name': 'Freiburg', 'away_team_name': 'FC Koln', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Greuther Furth', 'away_team_name': 'Wolfsburg', 'home_score': 0, 'away_score': 2}, {'home_team_name': 'Hoffenheim', 'away_team_name': 'Mainz', 'home_score': 0, 'away_score': 2}, {'home_team_name': 'Union Berlin', 'away_team_name': 'Augsburg', 'home_score': 0, 'away_score': 0}, {'home_team_name': 'RB Leipzig', 'away_team_name': 'Bayern Munich', 'home_score': 1, 'away_score': 4}, {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'Stuttgart', 'home_score': 1, 'away_score': 1}, {'home_team_name': 'Bochum', 'away_team_name': 'Hertha Berlin', 'home_score': 1, 'away_score': 3}, {'home_team_name': 'B. Monchengladbach', 'away_team_name': 'Arminia Bielefeld', 'home_score': 3, 'away_score': 1}]
model = mezzala.DixonColes(adapter=adapter)
model.fit(previous_matches)
model.params[mezzala.RHO_KEY] = 0.25
match_to_predict = {'home_team_name': 'Wolfsburg', 'away_team_name': 'Eintracht Frankfurt'}
scorelines = model.predict_one(match_to_predict, 6)
print(scorelines)
P.S. I really liked your Statsbomb conference. Thanks for preparing that and putting it online. 🙏🏼
Thanks, @hedonistrh! I'm going to close this issue for now. I have raised a new one (#3) for the missing constraint.
Hey @Torvaney, thanks a lot for the great repo. I was using the Dixon-Coles model for some calculations, but realized that in some cases we can end up with 'nan' for some probabilities. I tried some debugging but was not able to find the reason. I am sharing reproducible code as follows:
When we check scorelines, we can see the following: