PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

A few issues and discussion points wrt #242 #257

Open probberechts opened 11 months ago

probberechts commented 11 months ago

Although a bit late, I see a few issues wrt the recently merged PR #242 by @DriesDeprest.

First, the PR incorporates the Opta "Challenge" event in the kloppy DuelEvent. This changes the definition of a DuelEvent that we agreed upon in #135. Previously, duels corresponded to events that require an intervention. In contrast, the main use of Opta's "Challenge" event is to describe the player who gets dribbled past when a dribbler takes them on. It means that the player who gets dribbled past either did nothing at all or was not able to touch the ball; otherwise, the event would have been labeled as a "Tackle". Therefore, the definition of a DuelEvent in the Opta serializer is no longer consistent with the definition in the StatsBomb and Wyscout serializers.

I am not per se against adding the "Challenge" event, but then the StatsBomb and Wyscout serializers should be adapted accordingly and there should be a distinction between -- in Opta terminology -- a tackle and a challenge. A tackle is an intervention, while a challenge is an opportunity to tackle. To draw a parallel, giving the same label to a tackle and a challenge would be like labeling a big chance as a shot. This also sabotages my effort to integrate kloppy and socceraction, because Challenges are not seen as actions in SPADL.

Second, the PR introduces a DuelType.TACKLE qualifier, which is equivalent to DuelType.GROUND + ~DuelType.LOOSE_BALL. Adding this was previously suggested by @MKlaasman in https://github.com/PySport/kloppy/issues/135#issuecomment-1570161586. Although I don't like redundant qualifiers, I am not strongly against it, but it should be added to the StatsBomb and Wyscout parsers too and be documented to avoid confusion regarding the difference between a ground duel and a tackle.
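
To make the claimed equivalence concrete, here is a minimal sketch. The `DuelType` enum is a stand-in defined only for this example (it is not kloppy's own class), and `is_tackle` is a hypothetical helper that encodes the rule exactly as stated above.

```python
from enum import Enum
from typing import Set


class DuelType(Enum):
    # Stand-in enum for illustration; only the members named in this thread.
    GROUND = "GROUND"
    LOOSE_BALL = "LOOSE_BALL"
    TACKLE = "TACKLE"


def is_tackle(duel_types: Set[DuelType]) -> bool:
    """Hypothetical helper: a ground duel that is not a loose-ball duel."""
    return (
        DuelType.GROUND in duel_types
        and DuelType.LOOSE_BALL not in duel_types
    )
```

If this rule always holds, an explicit TACKLE qualifier carries no extra information, which is why it is described as redundant.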

JanVanHaaren commented 11 months ago

Thank you for raising the issues and starting the discussion, @probberechts. I must admit that I didn't think much about the potential implications while reviewing #242. I also completely forgot about the discussion in #135.

As I recently mentioned in #240, it would be helpful if we had formal definitions for each of the kloppy event types and qualifiers.

DriesDeprest commented 11 months ago

Thanks for your input on this, @probberechts.

Regarding 1/ the difference between Challenge & Tackle: To make sure I understand - are you saying that Opta's definition of a Challenge ("player unsuccessfully attempts to tackle an opponent as the opponent dribbles past them") does not mean an unsuccessful tackle but rather an unsuccessful attempt to even make the tackle? If so, I guess I misinterpreted the definition and agree that the current implementation is undesirable.

I think we should add a DribbledPast event, which would be identified as follows for

Regarding 2/ whether we want a Tackle qualifier or not. I forgot about the discussion in https://github.com/PySport/kloppy/issues/135. Moving forward, I don't have a strong opinion on whether we have an explicit Tackle qualifier or whether it is up to the user to recognize tackle events as Ground DuelEvents that are not LooseBall. Thus, happy to follow what is decided.

@koenvo do you have an opinion on this?

probberechts commented 11 months ago

Yes, exactly. The Opta "Challenge" event matches StatsBomb's "Dribbled Past" event in roughly 90% of cases.

You can find some examples of "Challenge" events in BEL - POR at Euro2020 at

There are 27 Opta "Challenge" and 22 StatsBomb "Dribbled Past" events in this game.

Regarding "I think we should add a DribbledPast event":

I like how StatsBomb defines a duel: "Duel events describe when a defender challenges an attacker in some way". In a "Dribbled Past"/"Challenge" event, there is always some degree of contact / pressing between the player that dribbles and the one that gets dribbled past. Hence, intuitively, it is some kind of duel and could thus be incorporated in the DuelEvent.

One option would be to define a "Dribbled Past" event as DuelEvent with DuelType.GROUND qualifier and DuelResult.LOST outcome. A StatsBomb "Duel" with type "Tackle" and an Opta "Tackle" event would be mapped to a DuelEvent with DuelType.GROUND and DuelType.TACKLE qualifiers with DuelResult.WON or DuelResult.LOST outcome (depending on whether possession is regained).
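
The mapping in this option could be sketched roughly as follows. The enums, the `Duel` dataclass, and `map_provider_event` are all hypothetical stand-ins written for this example; they are not kloppy's deserializer code, and the `kind` strings are invented labels rather than provider identifiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class DuelType(Enum):
    GROUND = "GROUND"
    TACKLE = "TACKLE"


class DuelResult(Enum):
    WON = "WON"
    LOST = "LOST"


@dataclass(frozen=True)
class Duel:
    # Simplified stand-in for a DuelEvent: qualifiers plus outcome.
    duel_types: FrozenSet[DuelType]
    result: DuelResult


def map_provider_event(kind: str, possession_regained: bool = False) -> Duel:
    """Hypothetical mapping; `kind` is "dribbled_past" or "tackle"."""
    if kind == "dribbled_past":
        # The defender was beaten, so the duel is always lost.
        return Duel(frozenset({DuelType.GROUND}), DuelResult.LOST)
    if kind == "tackle":
        # Won only if possession is regained, as described above.
        result = DuelResult.WON if possession_regained else DuelResult.LOST
        return Duel(frozenset({DuelType.GROUND, DuelType.TACKLE}), result)
    raise ValueError(f"unsupported event kind: {kind!r}")
```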

I like this proposal because:

DriesDeprest commented 11 months ago

Thanks for sharing, I like your proposal.

So would you agree with the following next actions:

probberechts commented 11 months ago

I only have a minor remark regarding

Wyscout: Recognize Dribble Past events as DuelEvents with a DuelType.GROUND qualifier and a DuelResult.LOST / WON outcome, depending on the provider's event success

The Wyscout docs give the following examples of a won dribble past attempt:

  1. defending player dispossesses the attacker
  2. defending player kicks the ball out
  3. the attacker stays with the ball, but the defender forces him to go back

According to the current implementation of the StatsBomb deserializer, a team has to regain possession after a duel for it to be considered successful. Hence, only the first one would yield a DuelResult.WON outcome.

I am not sure what the best solution would be here. You could certainly argue that the second and third examples are also successful, albeit to a lesser degree.
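
A small sketch of how the three Wyscout examples fare under the "possession must be regained" rule. The possession flags are an interpretation of the comment above (only the first example counts as regaining possession), and `strict_result` is a hypothetical helper, not the StatsBomb deserializer's code.

```python
from enum import Enum


class DuelResult(Enum):
    WON = "WON"
    LOST = "LOST"


# Wyscout example -> does the defending team regain possession?
# Flags are assumptions based on the reading in the comment above.
WYSCOUT_EXAMPLES = {
    "defender dispossesses the attacker": True,
    "defender kicks the ball out": False,
    "attacker keeps the ball but is forced back": False,
}


def strict_result(possession_regained: bool) -> DuelResult:
    """Hypothetical rule: a duel is won only if possession is regained."""
    return DuelResult.WON if possession_regained else DuelResult.LOST


results = {
    description: strict_result(regained)
    for description, regained in WYSCOUT_EXAMPLES.items()
}
```

Under this rule, two situations that Wyscout counts as won end up as DuelResult.LOST, which is the mismatch being raised.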

DriesDeprest commented 11 months ago

Okay, I understand. For Wyscout v3 I would then apply the logic shown in the screenshot to determine the DuelResult:

[Screenshot: proposed DuelResult logic for Wyscout v3]

The stoppedProgress and recoveredPossession fields can be read from Wyscout v3's raw data.

Do you agree with this approach?

DriesDeprest commented 10 months ago

@probberechts I've created an overview of how kloppy currently parses the duel events of the different providers, along with two suggestions on how to change that so dribbled past events are captured properly.

I've added an explicit and implicit suggestion on how to adjust the kloppy duel type definitions to be able to support dribbled past events. In the explicit version, we would add a DribbledPast duel type to explicitly label dribbled past events. In the implicit version, a dribbled past event could be recognized as a duel event with a qualifier Ground and no qualifier Tackle or LooseBall.
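
The difference between the two variants can be sketched as follows. The enum is a stand-in for this example only, and the member name `DRIBBLED_PAST` is an assumed spelling for the explicit duel type; both helper functions are hypothetical.

```python
from enum import Enum
from typing import Set


class DuelType(Enum):
    GROUND = "GROUND"
    LOOSE_BALL = "LOOSE_BALL"
    TACKLE = "TACKLE"
    DRIBBLED_PAST = "DRIBBLED_PAST"  # assumed name; explicit variant only


def is_dribbled_past_explicit(duel_types: Set[DuelType]) -> bool:
    """Explicit variant: the event carries its own qualifier."""
    return DuelType.DRIBBLED_PAST in duel_types


def is_dribbled_past_implicit(duel_types: Set[DuelType]) -> bool:
    """Implicit variant: GROUND without TACKLE or LOOSE_BALL."""
    excluded = {DuelType.TACKLE, DuelType.LOOSE_BALL}
    return DuelType.GROUND in duel_types and not (excluded & duel_types)
```

The implicit variant keeps the set of qualifiers smaller, but every user has to know the exclusion rule; the explicit variant trades one extra qualifier for a direct lookup.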

I've done this exercise for Opta, StatsBomb & Wyscout v3. I'm not planning to adjust the Wyscout v3 deserialization logic in the near future, but I wanted to do the thought exercise already to make sure our decision is future-proof in case we update the Wyscout v3 deserializer.

Do you like either of the two proposals? Which has your preference? Or am I still missing something in my suggestion?

probberechts commented 10 months ago

Thanks @DriesDeprest.

I am happy with the assignment of the DuelType qualifiers, and I don't have a strong preference between the explicit and implicit DuelType qualifiers.

Determining the DuelOutcome is more challenging. I guess the first question is whether we stick to the existing outcomes (WON, LOST, NEUTRAL) or whether we add additional gradations of being successful. Looking at your analysis, I think the following criteria are used for a duel to be successful or unsuccessful by the data providers:

One idea would be to work with qualifiers (i.e., a DuelOutcome qualifier) for each of these criteria. It should be possible to derive each of them from the data. Then you can derive a default (WON, LOST, NEUTRAL) outcome by combining qualifiers, and users can modify this definition if they do not agree. It will require quite a lot of work to implement this though.
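
A rough sketch of what such a design might look like. The two criteria are placeholders borrowed from the Wyscout v3 field names mentioned earlier in this thread, not the actual list of criteria; `DuelCriteria`, `default_rule`, `lenient_rule`, and `duel_result` are hypothetical names, and the default rule itself is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class DuelResult(Enum):
    WON = "WON"
    LOST = "LOST"
    NEUTRAL = "NEUTRAL"


@dataclass(frozen=True)
class DuelCriteria:
    # Placeholder criteria; each would be stored as a qualifier on the event.
    recovered_possession: bool
    stopped_progress: bool


OutcomeRule = Callable[[DuelCriteria], DuelResult]


def default_rule(criteria: DuelCriteria) -> DuelResult:
    """Assumed default: possession wins, stopping progress is neutral."""
    if criteria.recovered_possession:
        return DuelResult.WON
    if criteria.stopped_progress:
        return DuelResult.NEUTRAL
    return DuelResult.LOST


def lenient_rule(criteria: DuelCriteria) -> DuelResult:
    """Alternative a user could supply: either criterion counts as a win."""
    if criteria.recovered_possession or criteria.stopped_progress:
        return DuelResult.WON
    return DuelResult.LOST


def duel_result(
    criteria: DuelCriteria, rule: OutcomeRule = default_rule
) -> DuelResult:
    """Derive the outcome from the criteria using a swappable rule."""
    return rule(criteria)
```

A user who disagrees with the default would pass their own rule, for example `duel_result(criteria, rule=lenient_rule)`, without touching the deserializers.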

The alternative would be to mostly rely on the provider's definitions as in your proposal. Trying to summarize this and stating what's still unclear:

DriesDeprest commented 10 months ago

Thanks for reviewing and sharing your insights @probberechts.

On the explicit vs implicit DuelType qualifiers, my preference would go to the explicit suggestion.

Regarding the DuelResult, I agree that your solution of adding the listed qualifiers that would allow calculating the DuelResult will result in more predictable and standardized behaviour across data providers. However, I don't feel comfortable committing to develop this logic, as it indeed seems like quite a lot of work.

Therefore, I would suggest that in the short run I refine our current implementation by also recognizing dribbled past events and for now use the providers' outcome labels to determine our result. Thus, I'll follow the logic which we'll agree upon here.

@JanVanHaaren @koenvo any thoughts on this? I'd like to start implementing this, but want to make sure you guys agree with the plan.