PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

Wyscout v3 events sometimes have 10 player formations for (opponent) team #315

Open DriesDeprest opened 1 month ago

DriesDeprest commented 1 month ago

I noticed that the team / opponent team object in Wyscout v3 events doesn't always have a formation in which the total number of players adds up to 11. For example, I've seen events with a "4-4-1" (opponent) team formation. Currently, our serializer crashes in this case because it does not recognize the formation.

When analysing the event data of a match with these 'troublesome' formations, I saw that they were the result of a team receiving a red card: in the events that followed, that team's formation was described as a 10-player formation. The team originally had a "4-4-1-1" formation, but after the red card this shifted to "4-4-1".

How do we want to handle this?

Option A:

Option B:

I think my preference would go to option B, to have standard behaviour across different providers.

dvilches commented 1 month ago

In my personal opinion, it is always better for the data to reflect reality as accurately as possible.

In our case, it is important to know our own and our opponent's formation, as we analyze "behavior" under different schemes, and that clearly changes when a team has one or more players fewer on the field.

That said, it depends on what most users use kloppy for, since needs that are specific to us can be solved, as we have done until now, by doing our own processing on the event data.

DriesDeprest commented 1 month ago

@koenvo @JanVanHaaren @probberechts thoughts? I'd like to start implementing this

DriesDeprest commented 1 month ago

@dvilches thanks for sharing your take on option A vs B. I understand your need for an accurate description of a team's behaviour to perform qualitative performance analysis.

Since I'm using kloppy to read in data from different providers, having a standardized output across input vendors is more important for my use case than the extra level of detail we would get. Hence my preference for option B.

In the future, however, I think we should extend the possible Enum values of FormationType to also include formations for when there are 10/9/8 players on the pitch, and use these for all providers whenever a given team has < 11 players on the pitch.
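
As a rough sketch of that idea: the enum below is a stand-in written for illustration, not kloppy's actual `FormationType`. The member names for the reduced formations and the `outfield_players` helper are hypothetical.

```python
from enum import Enum


class FormationTypeSketch(Enum):
    """Stand-in for illustration only; not kloppy's real FormationType."""

    # Regular formations: 10 outfield players plus the goalkeeper.
    FOUR_FOUR_ONE_ONE = "4-4-1-1"
    FOUR_FIVE_ONE = "4-5-1"
    # Hypothetical additions for a team that is one player down.
    FOUR_FOUR_ONE = "4-4-1"
    FOUR_THREE_TWO = "4-3-2"
    UNKNOWN = "unknown"


def outfield_players(formation: FormationTypeSketch) -> int:
    """Sum the lines of a formation string (the goalkeeper is not included)."""
    if formation is FormationTypeSketch.UNKNOWN:
        raise ValueError("an unknown formation has no player count")
    return sum(int(line) for line in formation.value.split("-"))


# Looking up by value now also works for the reduced formation.
reduced = FormationTypeSketch("4-4-1")
print(reduced, outfield_players(reduced))
```

With members like these, a provider string such as "4-4-1" would resolve to an enum value instead of failing the lookup.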

For Wyscout, we can get the X player formation directly from the team or opponentTeam properties. For other providers, where the formation data is not included in each event, we would need to do it in an alternative way. We would need to recognize when a team starts playing with < 11 players (due to a red card or sub off without a sub on) and based on the position (defender / midfielder / attacker) of the player that gets sent off, adapt the formation accordingly. For example, if team A was playing in a 4-5-1 and their CM gets sent off, we would assume they now play in a 4-4-1 until they change formation again.
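
The adaptation step for other providers could be sketched roughly as follows. This is not existing kloppy code: the function name `formation_after_dismissal` and the line labels are hypothetical, and two assumptions are baked in: a dismissed midfielder is taken from the most populated middle line, and a line that becomes empty is dropped from the formation string.

```python
def formation_after_dismissal(formation: str, line: str) -> str:
    """Return the formation string after one player of `line` leaves the pitch.

    `formation` is a string such as "4-5-1"; `line` is one of
    "defender", "midfielder" or "attacker" (hypothetical labels).
    """
    counts = [int(n) for n in formation.split("-")]
    if len(counts) < 2:
        raise ValueError(f"cannot interpret formation {formation!r}")

    if line == "defender":
        index = 0
    elif line == "attacker":
        index = len(counts) - 1
    elif line == "midfielder":
        middle = range(1, len(counts) - 1)
        if not middle:
            raise ValueError(f"{formation!r} has no midfield line")
        # Assumption: remove the player from the most populated middle line.
        index = max(middle, key=lambda i: counts[i])
    else:
        raise ValueError(f"unknown line {line!r}")

    counts[index] -= 1
    # Assumption: a line that is now empty disappears from the string.
    return "-".join(str(n) for n in counts if n > 0)


# The example from the paragraph above: a CM is sent off in a 4-5-1.
print(formation_after_dismissal("4-5-1", "midfielder"))
```

Under the same assumptions, an attacker leaving a "4-4-1-1" gives "4-4-1", which matches what Wyscout reported in the match described earlier. In practice the result would still have to be mapped onto a known formation value.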

dvilches commented 1 month ago

Hi @DriesDeprest, I agree with your perspective. That's why we're clarifying that we can resolve this issue "outside" of Kloppy, and that a quick solution for most users is more important than the "best solution" for us. Thank you for your continued contributions to the project.

JanVanHaaren commented 1 month ago

I don't have a strong opinion but I'm leaning towards option B.

In an ideal world, kloppy would be able to represent the actual formations for both teams at each point in a match, but the information that the data providers are offering might be too limited in some cases.