SPADL Definition - Githubissues

JeroenClijmans1 commented 3 years ago

For my thesis I'm defining a sequence of ball possession of a team as a specific sequence of SPADL actions that occurs within a larger sequence of SPADL actions (the precise definition is not important for this issue). For this, I'm relying on Table 3.1 in Tom Decroos's PhD thesis (https://tomdecroos.github.io/reports/thesis_tomdecroos.pdf). This table defines all SPADL actions and which attribute values each action can have. However, this definition does not seem to be up to date with the actual implementation in socceraction. I encountered the following differences:

The goalkick action is not described in the definition of SPADL in the thesis
The definition of SPADL in the thesis describes a keeper pick-up action. However, from inspecting the code, the Statsbomb to SPADL converter never converts an action to this type.
The definition of SPADL in the thesis states that fouls will always have 'success' as a result, while the converter will always give 'fail' as the result of this action
The definition of SPADL in the thesis states that interceptions will always have 'success' as a result, while the converter will attribute 'success' to some interceptions (in case they succeed) and 'fail' to other interceptions (in case the Statsbomb data say that the ball was intercepted but knocked to opposition or the ball was intercepted but went out of bounds by doing so)
The definition of SPADL in the thesis states that keeper saves will always have 'success' as a result, but the Statsbomb action-attribute pair 'Shot Saved (In Play Danger)' would lead to a conversion to a failed keeper save action. The same goes for the Statsbomb action-attribute pair 'Punch (In Play Danger)', although I'm not sure whether that combination can actually occur in the data.

A precise definition of the SPADL data format is necessary to correctly define a sequence of ball possession in terms of SPADL actions. It's important to state the trickier points, e.g. that a failed interception by the other team does not interrupt a team's ball possession sequence. The thesis, for example, rules out failed interceptions altogether.

I therefore propose keeping an up-to-date definition somewhere that precisely defines SPADL and which action-attribute pairs are valid. This would make it possible to build definitions in terms of SPADL actions or SPADL action sequences.
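To illustrate what such a definition could enable, here is a minimal sketch of a validity check for action-result pairs. The `ALLOWED_RESULTS` table and the `invalid_actions` helper are hypothetical and only encode a few points raised in this issue; they are not the official SPADL definition, and the `type_name` / `result_name` keys are assumed field names.

```python
# Hypothetical, partial specification: action type -> allowed results.
# These entries only reflect points discussed in this issue; they are an
# assumption for illustration, not the official SPADL definition.
ALLOWED_RESULTS = {
    "interception": {"success", "fail"},
    "keeper_save": {"success", "fail"},
    "corner_crossed": {"success", "fail"},  # "offside" deliberately excluded
    "shot_penalty": {"success", "fail"},    # "owngoal" deliberately excluded
}


def invalid_actions(actions, spec=ALLOWED_RESULTS):
    """Return (index, type, result) for every action the spec does not allow.

    Action types missing from the spec are reported as invalid as well.
    """
    problems = []
    for i, action in enumerate(actions):
        allowed = spec.get(action["type_name"])
        if allowed is None or action["result_name"] not in allowed:
            problems.append((i, action["type_name"], action["result_name"]))
    return problems


example = [
    {"type_name": "interception", "result_name": "fail"},
    {"type_name": "corner_crossed", "result_name": "offside"},
]
problems = invalid_actions(example)
```

With a complete table, the same check could be run over the converter output to find every place where the implementation and the written definition disagree.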

When building this definition, I also think that the following things in the original definition in the thesis deserve some attention:

According to the original definition, all corner actions can have offside as a special result. This cannot occur in practice.
According to the original definition, tackles can have a yellow or red card as a special result. However, I think that in this case it should be classified as a foul. Maybe the converter should be built in such a way that it converts a failed tackle with a card as result (from the original third-party data) into two SPADL actions, in which the first is a failed tackle and the second is a foul with the card as a result.
According to the original definition, penalty shots and free kick shots can have an owngoal as a special result. However, unless some truly mafioso things are going on, this can never be the case in practice. However, in football you never know of course... :)

probberechts commented 3 years ago

Thanks for your detailed feedback, Jeroen. It seems indeed that the table in Tom's PhD thesis and the VAEP paper does not entirely agree with the implementation of SPADL. I agree with your suggestion to add a precise and up-to-date definition of SPADL to the repo.

I created a separate issue to address the problems with the keeper action types (#37). In my opinion, the other points that you mention are errors in the table.

The goalkick should be a separate action type.
There should be a difference between interceptions that regain possession and interceptions that only break the attack.
Correct me if I am wrong, but I think that tackles that result in a foul are already converted to two events in which the first is a failed tackle and the second is a foul (with optionally the card as a result of the foul).

Finally, to come back to your specific problem, I do not think that SPADL actions are ideal for defining ball possession sequences. New possessions should be triggered after a team demonstrates that it has established control of the ball. This information is not included in the SPADL actions. If you use StatsBomb data, it might be more accurate to use their labeling of possession in the events. You can then map this back to the SPADL actions (feel free to tackle #7 to make this possible 😉).
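A rough sketch of that mapping, assuming each converted action keeps a reference to the event it came from. The field names `original_event_id`, `event_id` and `possession`, and the `add_possession` helper, are assumptions for illustration; they are not necessarily what the converter provides today.

```python
def add_possession(actions, events):
    """Attach the provider's possession number to each converted action.

    Assumes (hypothetically) that every event has "event_id" and
    "possession" keys and every action has an "original_event_id" key.
    Actions without a matching source event get possession None.
    """
    possession_by_event = {e["event_id"]: e["possession"] for e in events}
    return [
        {**a, "possession": possession_by_event.get(a["original_event_id"])}
        for a in actions
    ]


events = [
    {"event_id": "a", "possession": 1},
    {"event_id": "b", "possession": 2},
]
actions = [
    {"original_event_id": "a", "type_name": "pass"},
    {"original_event_id": None, "type_name": "dribble"},
    {"original_event_id": "b", "type_name": "pass"},
]
labeled = add_possession(actions, events)
```

Actions that have no source event would still need a rule, for example inheriting the possession of the preceding action.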

JeroenClijmans1 commented 3 years ago

I see about the tackling result. I did not look thoroughly enough into the code to see how this tackle/card result was actually converted; I just looked at the table in the thesis.

The way I'm defining ball possession sequences at the moment is as a consecutive series of SPADL actions of one team, involving at least three ball-moving actions (pass, cross, dribble, shot, goalkick) of this team and in which no (successful) actions of the other team occur to break this possession. The requirement to have at least three ball-moving actions is there to capture some kind of notion that they should have established control of the ball. Note that for my particular problem, I'm not interested in what actions actually occur in this possession. I only use some kind of definition for 'possession sequence' to check where this spell of possession started, i.e. where and how (set piece, throw-in, keeper ball or open play) teams regain possession.
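The definition above can be sketched roughly as follows. This is a simplified illustration, not the thesis implementation: the `team_id`, `type_name` and `result_name` keys are assumed field names, every ball-moving action counts regardless of its result, and any successful action by the other team is taken to end the current possession.

```python
BALL_MOVING = {"pass", "cross", "dribble", "shot", "goalkick"}


def possession_sequences(actions, min_ball_moving=3):
    """Return (team_id, start_index, end_index) triples, end inclusive."""
    sequences = []
    team, start, end, moves = None, None, None, 0
    for i, a in enumerate(actions):
        if a["team_id"] == team:
            # Same team: extend the current candidate sequence.
            end = i
            moves += int(a["type_name"] in BALL_MOVING)
        elif team is None or a["result_name"] == "success":
            # A successful action by the other team breaks the possession.
            if team is not None and moves >= min_ball_moving:
                sequences.append((team, start, end))
            team, start, end = a["team_id"], i, i
            moves = int(a["type_name"] in BALL_MOVING)
        # Otherwise: a failed action by the other team, which is ignored.
    if team is not None and moves >= min_ball_moving:
        sequences.append((team, start, end))
    return sequences


example = [
    {"team_id": "A", "type_name": "pass", "result_name": "success"},
    {"team_id": "A", "type_name": "dribble", "result_name": "success"},
    {"team_id": "B", "type_name": "interception", "result_name": "fail"},
    {"team_id": "A", "type_name": "pass", "result_name": "success"},
    {"team_id": "B", "type_name": "tackle", "result_name": "success"},
    {"team_id": "B", "type_name": "pass", "result_name": "success"},
    {"team_id": "A", "type_name": "interception", "result_name": "success"},
]
found = possession_sequences(example)
```

The start index of each triple is where the spell of possession began, which is the only thing needed to check how the team regained the ball.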

ML-KULeuven / socceraction

SPADL Definition #35