PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

PassType.ASSIST is being used both as a shot and goal assist interchangeably #239

Closed DriesDeprest closed 6 months ago

DriesDeprest commented 10 months ago

In our Opta deserializer, we add the PassType.ASSIST qualifier to a pass that has Opta qualifier type 210; see: https://github.com/PySport/kloppy/blob/master/kloppy/infra/serializers/event/opta/deserializer.py#L543-L544. Qualifier type 210 means a shot assist, as you can see below:

image

In our Statsbomb deserializer, we add the PassType.ASSIST qualifier to a pass with a "goal_assist" tag, see: https://github.com/PySport/kloppy/blob/master/kloppy/infra/serializers/event/statsbomb/deserializer.py#L291-L293

I would suggest moving from PassType.ASSIST to PassType.SHOT_ASSIST and PassType.GOAL_ASSIST. A goal assist would have both qualifiers. This way we are explicit about what we mean, and we can handle both shot and goal assists.
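
A minimal sketch of the proposed split, using a standalone enum rather than kloppy's actual PassType. The member names follow the proposal above; the `AssistType` enum and the `assist_qualifiers` helper are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import List


class AssistType(Enum):
    # Stand-in for the proposed PassType members; not kloppy's real enum.
    SHOT_ASSIST = "SHOT_ASSIST"
    GOAL_ASSIST = "GOAL_ASSIST"


def assist_qualifiers(led_to_shot: bool, led_to_goal: bool) -> List[AssistType]:
    """Return the assist qualifiers for a pass under the proposed scheme."""
    qualifiers = []
    # A goal assist carries both qualifiers, so SHOT_ASSIST is added for goals too.
    if led_to_shot or led_to_goal:
        qualifiers.append(AssistType.SHOT_ASSIST)
    if led_to_goal:
        qualifiers.append(AssistType.GOAL_ASSIST)
    return qualifiers
```

With this shape, filtering on SHOT_ASSIST returns every pass that led to a shot, and filtering on GOAL_ASSIST narrows that to the passes that led to a goal.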

@koenvo agree?

koenvo commented 10 months ago

Correct me if I'm wrong but this information doesn't need to be loaded from the data itself but can be derived?

pseudo code:

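from kloppy.domain import EventType, PassQualifier, PassType, ShotResult

# PassType.ASSIST_GOAL / PassType.ASSIST_SHOT are proposed members, not existing ones.
if event.event_type == EventType.PASS:
  next_event = event.next()
  # The last event has no successor, so guard against None before dereferencing.
  if next_event is not None and next_event.event_type == EventType.SHOT:
    event.qualifiers.append(
        PassQualifier(
            PassType.ASSIST_GOAL if next_event.result == ShotResult.GOAL else PassType.ASSIST_SHOT
        )
    )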

This way it works for all vendors. Should this work?

Curious what you think about this @JanVanHaaren

DriesDeprest commented 10 months ago

I think defining an assist as a pass that is directly followed by a shot event is too narrow.

If you look at the raw data of vendors (e.g. StatsBomb) you see that in between the pass, which gets annotated as an assist, and the shot, there often are carry or duel events, maybe even others.

Therefore, I would use the qualifiers in the raw data where possible. For vendors which don't support shot or goal assists, you can always still try to calculate it.

JanVanHaaren commented 10 months ago

I agree that we should make the use of the PassType.ASSIST qualifier consistent across the deserializers for the different data providers. This example nicely illustrates that we need more formal definitions for our events and qualifiers to avoid ambiguity as data providers sometimes use the same terms for different concepts.

I'm also in favor of deriving qualifiers, and in some cases even events, from the raw data as much as possible. I believe that this approach would help us to arrive at more uniform and predictable behavior across our deserializers, especially for rather well-defined concepts such as shot assists or key passes and goal assists. However, in other cases, the qualifiers that data providers add to events might not be reconstitutable from the context.

I'm mostly familiar with the StatsBomb event data. They add a goal_assist field to passes leading to a goal and a shot_assist field to passes leading to a shot that was not a goal. However, a case could be made for assigning both qualifiers to passes that are goal assists. In my opinion, it all comes down to properly defining the qualifiers in the first place.
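
To illustrate how much depends on the definition, here is a hedged sketch that maps StatsBomb-style `goal_assist` and `shot_assist` pass fields to labels. The function name, the label strings, and the `goal_implies_shot` switch are assumptions for illustration; this is not kloppy's deserializer code.

```python
from typing import Dict, Set


def assist_labels(pass_attrs: Dict[str, bool], goal_implies_shot: bool = True) -> Set[str]:
    """Map StatsBomb-style pass fields to assist labels.

    pass_attrs mimics the pass attributes of a raw event; the two fields
    are assumed to be absent or false when they do not apply.
    """
    labels = set()
    if pass_attrs.get("shot_assist"):
        labels.add("SHOT_ASSIST")
    if pass_attrs.get("goal_assist"):
        labels.add("GOAL_ASSIST")
        # Optional definition: treat every goal assist as a shot assist too.
        if goal_implies_shot:
            labels.add("SHOT_ASSIST")
    return labels
```

Setting `goal_implies_shot=False` keeps the two labels mutually exclusive, which matches the raw fields as described above; the default assigns both labels to goal assists.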

DriesDeprest commented 9 months ago

> I'm also in favor of deriving qualifiers, and in some cases even events, from the raw data as much as possible. I believe that this approach would help us to arrive at more uniform and predictable behavior across our deserializers, especially for rather well-defined concepts such as shot assists or key passes and goal assists. However, in other cases, the qualifiers that data providers add to events might not be reconstitutable from the context.

I don't fully see how you could properly derive shot or goal assists from the raw data. As I said, I think Koen's approach

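from kloppy.domain import EventType, PassQualifier, PassType, ShotResult

# PassType.ASSIST_GOAL / PassType.ASSIST_SHOT are proposed members, not existing ones.
if event.event_type == EventType.PASS:
  next_event = event.next()
  # The last event has no successor, so guard against None before dereferencing.
  if next_event is not None and next_event.event_type == EventType.SHOT:
    event.qualifiers.append(
        PassQualifier(
            PassType.ASSIST_GOAL if next_event.result == ShotResult.GOAL else PassType.ASSIST_SHOT
        )
    )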

would result in too narrow a definition of shot and goal assists, because between the pass that gets annotated as an assist and the shot there are often carry or duel events, and maybe even others.

How would you suggest that we properly and accurately derive shot or goal assists from the raw data?

JanVanHaaren commented 9 months ago

Automatically deriving assist qualifiers would indeed require slightly more sophisticated business logic, but all required information should be available in the data feeds to implement the most common assist definitions.
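
One possible shape for that logic, as a sketch only: walk forward from the pass, skip carries and duels, and stop at the first shot by the passing team or at anything else that breaks the sequence. The `Event` dataclass, the string event types, and the choice of "bridging" event types are assumptions for illustration; they are not kloppy classes and do not encode any league's official definition.

```python
from dataclasses import dataclass
from typing import List, Optional

# Event types allowed between the pass and the shot (an assumption).
BRIDGING_TYPES = {"CARRY", "DUEL"}


@dataclass
class Event:
    event_type: str        # e.g. "PASS", "CARRY", "DUEL", "SHOT"
    team: str
    is_goal: bool = False  # only meaningful for shots


def derive_assist(events: List[Event], pass_index: int) -> Optional[str]:
    """Return "GOAL_ASSIST", "SHOT_ASSIST", or None for the pass at pass_index."""
    passing_team = events[pass_index].team
    for event in events[pass_index + 1:]:
        if event.event_type in BRIDGING_TYPES:
            continue  # carries and duels do not end the sequence
        if event.event_type == "SHOT" and event.team == passing_team:
            return "GOAL_ASSIST" if event.is_goal else "SHOT_ASSIST"
        return None  # another pass, an opponent action, etc. breaks the link
    return None  # the sequence ended without a shot


sequence = [
    Event("PASS", "home"),
    Event("CARRY", "home"),
    Event("DUEL", "away"),
    Event("SHOT", "home", is_goal=True),
]
label = derive_assist(sequence, 0)  # the pass is linked to the goal despite the carry and duel
```

Changing `BRIDGING_TYPES` or the stop conditions yields a different assist definition without touching the deserializers.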

Moreover, automatically deriving assist qualifiers would even allow us to support multiple definitions as different leagues use slightly different definitions. For example, the Belgian Pro League uses the following definition to award goal assists to players.

image

DriesDeprest commented 8 months ago

I'm still struggling with the following two subjective attributes in the Pro League's definition:

Other aspects I think we should consider when building our own custom definition of an assist:

I'd also propose that I first make our assist definitions consistent across providers in the current implementation, where we use the data providers' qualifiers to derive them. This way, we would at least not be using shot and goal assists interchangeably.

I will thus make a PR that supports both SHOT_ASSIST & GOAL_ASSIST.

probberechts commented 8 months ago

I've looked up a few definitions from different leagues, and they all seem to have subjective components: the assist must (1) be intentional and (2) have a direct influence on the goal being scored. Moreover, each league uses a slightly different definition, and these definitions change continuously.

I believe that most people would want the assists in kloppy to be identical to the official stats. And since I do not believe that you can derive "official assists" automatically from the data, I would rely on the data provider's judgment. Also, I wonder whether data providers (always) use their own definition or adopt the league's official stats in practice. For example, I know that Opta will sometimes correct an own-goal to a goal (or vice-versa) post-game.

I don't know what your use case is, but if you need something that is consistent across data providers, you could add another assist category named KLOPPY_ASSIST (I don't have a good name for it immediately) that is automatically derived. What would also be relevant to add is Opta's "Fantasy Goal Assist" which is a very broad definition of assists that is used in fantasy football and Football Manager and that can be automatically derived.

DriesDeprest commented 8 months ago

I'm also in favour of adopting the data provider's judgement to label passes as shot or goal assists (this should be resolved with: https://github.com/PySport/kloppy/pull/281). Having a KLOPPY_ASSIST, which is automatically derived from the raw data, is then indeed a good solution for also having a consistent metric to compare data collected by different providers or from different leagues.
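
A rough sketch of how both labels could coexist on a pass: the provider's judgement is kept as-is, and a derived flag is stored next to it for cross-provider comparison. `PassRecord`, its field names, and the `annotate` helper are hypothetical, and `KLOPPY_ASSIST` is only the placeholder name suggested above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PassRecord:
    # Label taken from the vendor feed, e.g. "SHOT_ASSIST" or "GOAL_ASSIST".
    provider_assist: Optional[str] = None
    qualifiers: Set[str] = field(default_factory=set)


def annotate(record: PassRecord, derived_assist: bool) -> PassRecord:
    """Attach provider and derived assist labels without mixing them."""
    if record.provider_assist is not None:
        record.qualifiers.add(record.provider_assist)
    if derived_assist:
        # Derived label, independent of what the provider reported.
        record.qualifiers.add("KLOPPY_ASSIST")
    return record
```

Keeping the two labels separate means users who want official-looking stats can filter on the provider label, while cross-provider analyses can filter on the derived one.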