ML-KULeuven / socceraction

Convert soccer event stream data to SPADL and value player actions using VAEP or xT
MIT License

SPADL Definition #35

Closed JeroenClijmans1 closed 3 years ago

JeroenClijmans1 commented 3 years ago

For my thesis I'm defining a sequence of ball possession of a team as a specific sequence of SPADL actions that occurs within a larger sequence of SPADL actions (the precise definition is not important for this issue). For this, I'm relying on Table 3.1 in Tom Decroos's PhD thesis (https://tomdecroos.github.io/reports/thesis_tomdecroos.pdf). This table defines all SPADL actions and the attribute values each action can have. However, this definition does not seem to be up to date with the actual implementation in socceraction. I encountered the following differences:

A precise definition of the SPADL data format is necessary to correctly define a sequence of ball possession in terms of SPADL actions. It's important to spell out the trickier cases, e.g. that a failed interception by the other team does not affect a team's ball possession sequence. The thesis, for example, states that failed interceptions cannot occur, even though they can.

I therefore propose keeping an up-to-date definition somewhere that precisely defines SPADL and which action-attribute pairs are valid. This would make it possible to build definitions in terms of SPADL actions or SPADL action sequences.

When building this definition, I also think that the following things in the original definition in the thesis deserve some attention:

probberechts commented 3 years ago

Thanks for your detailed feedback, Jeroen. It does indeed seem that the table in Tom's PhD thesis and the VAEP paper does not entirely agree with the implementation of SPADL. I agree with your suggestion to add a precise and up-to-date definition of SPADL to the repo.

I created a separate issue to address the problems with the keeper action types (#37). In my opinion, the other points that you mention are errors in the table.

Finally, to come back to your specific problem, I do not think that SPADL actions are ideal for defining ball possession sequences. A new possession should be triggered once a team demonstrates that it has established control of the ball. This information is not included in the SPADL actions. If you use StatsBomb data, it might be more accurate to use their labeling of possession in the events. You can then map this back to the SPADL actions (feel free to tackle #7 to make this possible 😉).

JeroenClijmans1 commented 3 years ago

I see about the tackling result. I did not look thoroughly enough into the code to see how this tackle/card result was actually converted; I just looked at the table in the thesis.

The way I'm defining ball possession sequences at the moment is as a consecutive series of SPADL actions of one team, involving at least three ball-moving actions (pass, cross, dribble, shot, goalkick) of this team and in which no (successful) actions of the other team occur to break this possession. The requirement of at least three ball-moving actions is there to capture the notion that the team should have established control of the ball. Note that for my particular problem, I'm not interested in which actions actually occur in this possession. I only use some definition of 'possession sequence' to check where this spell of possession started, i.e. where and how (set piece, throw-in, keeper ball or open play) teams regain possession.
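A minimal sketch of the definition above in plain Python, for illustration only. The `Action` class and its field names (`team`, `type`, `success`) are hypothetical and are not the socceraction schema. It also assumes that failed ball-moving actions of the possessing team still count toward the threshold of three, which the description above leaves open.

```python
from dataclasses import dataclass

# Ball-moving action types as listed in the description above.
BALL_MOVING = {"pass", "cross", "dribble", "shot", "goalkick"}


@dataclass
class Action:
    # Hypothetical, simplified stand-in for one SPADL action.
    team: str
    type: str
    success: bool


def possession_sequences(actions, min_ball_moving=3):
    """Return (start, end, team) index triples, with end inclusive."""
    sequences = []
    team, start, end, moves = None, 0, 0, 0
    for i, action in enumerate(actions):
        if action.team != team:
            if team is not None and not action.success:
                # A failed action of the other team does not break possession.
                continue
            # A successful action of the other team (or the very first
            # action) closes the current spell and opens a new one.
            if moves >= min_ball_moving:
                sequences.append((start, end, team))
            team, start, moves = action.team, i, 0
        end = i
        if action.type in BALL_MOVING:
            moves += 1
    # Close the spell that is still open at the end of the list.
    if moves >= min_ball_moving:
        sequences.append((start, end, team))
    return sequences


example = [
    Action("A", "pass", True),
    Action("A", "dribble", True),
    Action("B", "interception", False),  # failed: A keeps possession
    Action("A", "pass", True),
    Action("B", "tackle", True),         # successful: breaks A's spell
    Action("B", "pass", True),
    Action("B", "pass", False),
    Action("A", "interception", True),
]
print(possession_sequences(example))
```

The start index of each returned triple points at the action with which the spell began, which is the piece of information needed to see where and how possession was regained.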