PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License
371 stars 61 forks source link

Standardize position types #331

Open DriesDeprest opened 4 months ago

DriesDeprest commented 4 months ago

In a step to further standardize the output of kloppy's Dataset, I propose to create a PositionType Enum class, containing the different possible position types on a football pitch, commonly used in the industry. In each of the serializers, we would then map the possible positions ids / labels used by the provider to the equivalent kloppy PositionType.

A proposition of the values in the class is the following:

class PositionType(Enum):
    GK = "GK"
    LB = "LB"
    LCB = "LCB"
    CB = "CB"
    RCB = "RCB"
    RB = "RB"
    LW = "LW"
    LM = "LM"
    LDM = "LDM"
    LCM = "LCM"
    LAM = "LAM"
    CDM = "CDM"
    CM = "CM"
    CAM = "CAM"
    RW = "RW"
    RM = "RM"
    RDM = "RDM"
    RCM = "RCM"
    RAM = "RAM"
    LF = "LF"
    ST = "ST"
    RF = "RF"

The Position class would then be updated as followed:

@dataclass(frozen=True)
class Position:
    position_type: PositionType
    coordinates: Optional[Point] = None

    def __str__(self):
        return self.name

    @classmethod
    def unknown(cls) -> "Position":
        return cls(position_id="", name="Unknown")

Not sure what to do with the attributes position_id or position_name. We could keep them as provider_position_id and/or provider_position_name, but not sure whether that is required.

probberechts commented 4 months ago

I would use a hierarchical position mapping (e.g., "Midfielder" > "AM" > "CAM") for two reasons.

  1. Not all data providers provide positions that are as fine-grained as the ones defined above. If a provider uses "AM" instead of "CAM" it would not be possible to map it to one of the standardized types.
  2. It makes it more straightforward to select position groups (e.g., all midfielders) without having to specify a list of positions.
DriesDeprest commented 4 months ago

Thanks for the feedback. Do you think the hierarchical position groups should be MECE? And how would the pseudocode look like for the PositionType with the hierarchical mapping?

UnravelSports commented 4 months ago

@probberechts very interesting comment indeed!

When you say "Midfielder" > "AM" > "CAM" would it make sense (although it might be overkill) to break up a positional into different "axes". Instead of going Midfielder, AM, CAM, why not describe the position as Midfielder, Central and Attacking.

Just some idea, which already breaks for LCM and LB, but something along these lines:

Not quite sure if it's desirable though, but worth considering maybe

probberechts commented 4 months ago

@DriesDeprest I think they should be MECE yes. I can't really think about anything that would overlap actually. I haven't thought about the implementation yet.

@UnravelSports Ultimately it is not a big difference but for me, there is intuitively a natural ordering of these axes as "line" > "style" > "area" which leads to e.g., "Midfielder" > "DM" > "LDM". And "wide" does not seem to add anything. A wide "LDM" would just be a "LM", right? I am not sure whether other people also have the same mental map of positions or whether it is just me :smiley:

DriesDeprest commented 4 months ago

@probberechts, do you think something like this could work?

from enum import Enum
from dataclasses import dataclass
from typing import Optional, Union

class PositionType:
    class Goalkeeper(Enum):
        GK = "Goalkeeper"

    class Defender:
        class General(Enum):
            DEF = "Defender"

        class FullBack(Enum):
            LB = "Left Back"
            RB = "Right Back"

        class CenterBack(Enum):
            LCB = "Left Center Back"
            CB = "Center Back"
            RCB = "Right Center Back"

    class Midfielder:
        class General(Enum):
            MID = "Midfielder"

        class Defensive(Enum):
            DM = "Defensive Midfield"
            LDM = "Left Defensive Midfield"
            CDM = "Center Defensive Midfield"
            RDM = "Right Defensive Midfield"

        class Central(Enum):
            CM = "Central Midfield"
            LCM = "Left Center Midfield"
            CM = "Center Midfield"
            RCM = "Right Center Midfield"

        class Attacking(Enum):
            AM = "Attacking Midfield"
            LAM = "Left Attacking Midfield"
            CAM = "Center Attacking Midfield"
            RAM = "Right Attacking Midfield"

        class Wide(Enum):
            WM = "Wide Midfield"
            LW = "Left Wing"
            RW = "Right Wing"
            LM = "Left Midfield"
            RM = "Right Midfield"

    class Attacker:
        ATT = "Attacker"
        LF = "Left Forward"
        ST = "Striker"
        RF = "Right Forward"

@dataclass(frozen=True)
class Position:
    position_type: Union[Enum, PositionType]
    coordinates: Optional['Point'] = None

    def __str__(self):
        return self.position_type.value

    @classmethod
    def unknown(cls) -> "Position":
        return cls(position_type=PositionType.Goalkeeper.GK)

# Example usage of the updated Position class
specific_position = Position(position_type=PositionType.Midfielder.Defensive.LDM)
print(specific_position)  # Output: Left Defensive Midfield

general_position = Position(position_type=PositionType.Midfielder.Defensive.DM)
print(general_position)  # Output: Defensive Midfield
probberechts commented 4 months ago

This is how I would implement it:

from enum import Enum

class PositionType(Enum):
    Goalkeeper = ("Goalkeper", "GK", None)

    Defender = ("Defender", "DEF", None)
    FullBack = ("Full Back", "FB", "Defender")
    LeftBack = ("Left Back", "LB", "FullBack")
    RightBack = ("Right Back", "RB", "FullBack")
    CenterBack = ("Center Back", "CB", "Defender")
    LeftCenterBack = ("Left Center Back", "LCB", "CenterBack")
    RightCenterBack = ("Right Center Back", "RCB", "CenterBack")

    Midfielder = ("Midfielder", "MID", None)
    # Add subtypes of Midfielder here...

    Attacker = ("Attacker", "ATT", None)
    # Add subtypes of Attacker here...

    def __init__(self, long_name, code, parent):
        self.long_name = long_name
        self.code = code
        self._parent = parent

    @property
    def parent(self):
        if self._parent:
            return PositionType[self._parent]
        return None

    def is_subtype_of(self, other):
        current = self
        while current is not None:
            if current == other:
                return True
            current = current.parent
        return False

    def __str__(self):
        return self.long_name

# Example usage
print(PositionType.LeftBack.parent)  # Output: PositionType.FullBack

print(PositionType.Defender.is_subtype_of(PositionType.Defender))  # Output: True
print(PositionType.LeftCenterBack.is_subtype_of(PositionType.Defender))  # Output: True
print(PositionType.LeftBack.is_subtype_of(PositionType.Midfielder))  # Output: False

print(PositionType.LeftCenterBack)  # Output: Left Center Back
print(PositionType.LeftCenterBack.code)  # Output: LCB
UnravelSports commented 4 months ago

@UnravelSports Ultimately it is not a big difference but for me, there is intuitively a natural ordering of these axes as "line" > "style" > "area" which leads to e.g., "Midfielder" > "DM" > "LDM". And "wide" does not seem to add anything. A wide "LDM" would just be a "LM", right? I am not sure whether other people also have the same mental map of positions or whether it is just me 😃

Yes I was trying to get at the same thing, added the "wide" last minute because I was trying to fit in all the acronyms, but that didn't work ha