Torvaney / regista

An R package for soccer modelling
https://torvaney.github.io/regista/
GNU General Public License v3.0

Can't create table of scoreline probabilities without dixoncoles class object #31

Closed: hrmantovani closed this issue 4 years ago

hrmantovani commented 4 years ago

I'm new to R and have only been learning it for a week. augment.dixoncoles() creates a large table of scoreline probabilities, which I can then pass to sample() to generate a result for the game. Is there a way to create this table using only each team's expected goals? For example, at the 2018 World Cup, France were expected to score 2.644 goals and concede 0.681 against Australia. I have these numbers for every World Cup game as a test set. Is there a way to transform this data so I can use it as described above? Am I missing something? Sorry to post this here; I didn't want to bother you on Twitter.
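For reference, the simplest version of such a table treats each team's goals as independent Poisson counts with the given expected goals as the rate. The sketch below is written in Python rather than R and does not use regista. It assumes independent Poisson goals, truncates the table at a hypothetical `max_goals = 10`, and then draws one scoreline weighted by its probability, in the same spirit as using sample() in R:

```python
import math
import random


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson(rate) distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def scoreline_table(home_xg: float, away_xg: float, max_goals: int = 10) -> dict:
    """Map (home_goals, away_goals) -> probability, assuming independent Poisson goals.

    max_goals is a hypothetical truncation point; the mass beyond it is tiny
    for typical expected-goals values.
    """
    return {
        (h, a): poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def sample_scoreline(table: dict, rng: random.Random) -> tuple:
    """Draw one scoreline, weighting each outcome by its probability."""
    scorelines = list(table)
    weights = [table[s] for s in scorelines]
    return rng.choices(scorelines, weights=weights, k=1)[0]


# France v Australia, using the expected goals quoted above.
table = scoreline_table(2.644, 0.681)
rng = random.Random(2018)
print(sample_scoreline(table, rng))
```

This plain Poisson table is the baseline. The Dixon-Coles model adjusts only the four lowest-scoring cells of it.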

Torvaney commented 4 years ago

Hi there,

Under the hood, augment uses a Dixon-Coles model to make predictions for a given match, based on the model's parameters (i.e. team strength estimates). These predictions are quite different from the common practice in soccer analytics of resimulating a match from each team's shots (and the xG of each of those shots) in that match.
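To make "based on the model's parameters" concrete: in the multiplicative form from Dixon and Coles' original paper, each side's goal rate is the product of its attack strength, the opponent's defensive weakness, and (for the home side) a home-advantage factor. The Python sketch below assumes that form; regista's internal parameterisation may differ (e.g. it may work on the log scale), and all parameter values are hypothetical:

```python
def dixon_coles_rates(
    home_attack: float,
    home_defence: float,
    away_attack: float,
    away_defence: float,
    home_advantage: float,
) -> tuple:
    """Expected goals for each side in the multiplicative Dixon-Coles form.

    home goals ~ Poisson(alpha_home * beta_away * gamma)
    away goals ~ Poisson(alpha_away * beta_home)

    Here a larger "defence" value means a weaker defence (the team concedes
    more), matching the beta parameters in the original paper.
    """
    home_rate = home_attack * away_defence * home_advantage
    away_rate = away_attack * home_defence
    return home_rate, away_rate


# Hypothetical team-strength estimates, not fitted to any real data.
home_rate, away_rate = dixon_coles_rates(
    home_attack=1.4, home_defence=0.8,
    away_attack=1.1, away_defence=1.2,
    home_advantage=1.3,
)
print(home_rate, away_rate)
```

The point is that the model's inputs are team-level strengths, not the shots taken in a particular match.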

As I understand it, you are looking to do the latter (generate probabilities from the shots of a match that has already taken place). The easiest way that I have found to do this is to use the poisbinom package, as shown in the second code chunk here: http://www.statsandsnakeoil.com/2018/06/22/dixon-coles-and-xg-together-at-last/

However, this method requires you to have the individual shot data for the match in question (i.e. the xG for each shot, not just the aggregated value).
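The shot-based approach treats each shot as an independent Bernoulli trial whose success probability is its xG, so a team's goal count follows a Poisson binomial distribution. The Python sketch below computes that distribution with a simple dynamic-programming recurrence instead of calling poisbinom, and it assumes shots are independent of each other. The shot lists are hypothetical:

```python
def goal_distribution(shot_xgs: list) -> list:
    """Return [P(0 goals), P(1 goal), ..., P(n goals)] for n independent shots.

    Each shot scores with probability equal to its xG (Poisson binomial).
    """
    probs = [1.0]  # before any shots, P(0 goals) = 1
    for p in shot_xgs:
        nxt = [0.0] * (len(probs) + 1)
        for goals, prob in enumerate(probs):
            nxt[goals] += prob * (1 - p)  # this shot misses
            nxt[goals + 1] += prob * p    # this shot scores
        probs = nxt
    return probs


def shot_scoreline_table(home_shots: list, away_shots: list) -> dict:
    """Map (home_goals, away_goals) -> probability, assuming the teams are independent."""
    home = goal_distribution(home_shots)
    away = goal_distribution(away_shots)
    return {
        (h, a): ph * pa
        for h, ph in enumerate(home)
        for a, pa in enumerate(away)
    }


# Hypothetical per-shot xG values for a single match.
home_shots = [0.76, 0.35, 0.12, 0.08, 0.05]
away_shots = [0.30, 0.07, 0.04]
table = shot_scoreline_table(home_shots, away_shots)
print(max(table, key=table.get))  # most likely scoreline
```

Summing the shot xGs alone throws away the information this needs: a single 0.9 xG chance and nine 0.1 xG chances have the same total but very different goal distributions.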

Does this answer your question? Let me know if I have misunderstood you.

hrmantovani commented 4 years ago

It doesn't fully answer it, but it makes me think your R package has the answer. I analysed international football matches going back to the very first ones and came up with a formula that predicts expected goals from both teams' Elo ratings. I could then use a simple Poisson model to predict goals, but the Dixon-Coles tau apparently produces more realistic scorelines because of the correction it applies to low-scoring results. So my question is: given the expected goals for both teams in a match (already accounting for home advantage and so on), is it possible to build the same data frame that the augment function does?
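One way to do what this comment describes, outside regista, is to plug the two expected-goals values straight into the Dixon-Coles formula: start from the independent Poisson grid and multiply the 0-0, 0-1, 1-0 and 1-1 cells by the tau correction. The Python sketch below assumes the expected goals play the role of the model's home and away rates, and uses a hypothetical `rho = -0.1`. In practice rho is estimated from data, and it must stay within bounds that keep every corrected probability non-negative:

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson(rate) distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(home_goals: int, away_goals: int, home_xg: float, away_xg: float, rho: float) -> float:
    """Dixon-Coles low-score correction; equals 1 for every other scoreline."""
    if home_goals == 0 and away_goals == 0:
        return 1 - home_xg * away_xg * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + home_xg * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + away_xg * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def dixon_coles_table(home_xg: float, away_xg: float, rho: float, max_goals: int = 10) -> dict:
    """Map (home_goals, away_goals) -> tau-adjusted probability."""
    return {
        (h, a): tau(h, a, home_xg, away_xg, rho)
        * poisson_pmf(h, home_xg)
        * poisson_pmf(a, away_xg)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


# France v Australia expected goals, with a hypothetical rho.
table = dixon_coles_table(2.644, 0.681, rho=-0.1)
for score in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(score, round(table[score], 4))
```

The corrections cancel within the first row and the first column of the grid, so the marginal distribution of each team's goals (and the table's total probability) is unchanged. Only how probability is shared among the low scorelines shifts. With a negative rho, 0-0 and 1-1 become more likely and 1-0 and 0-1 less likely. The resulting table can be sampled like any weighted table, e.g. with random.choices.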

Torvaney commented 4 years ago

Hey, did you manage to fix your issue?

I'm not 100% sure what exactly you are trying to do. Would you be able to post an example of your desired inputs and outputs?