PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License
340 stars 56 forks source link

Transform orientation "BALL_OWNING_TEAM" should work on event data #33

Open koenvo opened 4 years ago

koenvo commented 4 years ago

Event data does not have ball_owning_team set and therefore transform throw an exception when trying to transform to BALL_OWNING_TEAM orientation. Suspicion is this should work and ball_owning_team should be set to same value as action_executing_team

rjtavares commented 4 years ago

Should depend on the action, no? Is a clearance enough to establish "ball owning"?

koenvo commented 4 years ago

Good point. Let me check the different types events and which team is assigned to them.

rjtavares commented 4 years ago

How about letting users define possession (I.e. specify which events change possession)? Could work well with the windows framework, and have a window of possession ID to treat sequences of events.

koenvo commented 4 years ago

Interesting. I would add a default list of possession changing events and let users override it. And needs to some changes to be able to use possession ID for the windowing. This is a different ticket imo.

rjtavares commented 4 years ago

Yeah, I went over the windowing code and agree, that's not the solution.

So, each serializer would need to handle possession. SB has possession_team already, so that's taken care of. I could try to do it for Opta's serialzier, but I don't have a file to test it with. Is there any sample file I could use?

My idea is something like:

if type_id in (list_of_event_types):
    possession_team = team

Then change ball_owning_team=None to ball_owning_team=possession_team

rjtavares commented 4 years ago

@koenvo can you check this change to see if it makes sense? https://github.com/rjtavares/kloppy/commit/5afec063bdfb7138e1e00dd0340ebef6ccf4898e

I chose to update the possession_team variable after checking for types to use kloppy's event_type instead of Opta's. That should make maintenance easier. I did have to keep setting ball_owning_team in the generic_event_kwargs otherwise there would an an error when creating the event.

If it makes sense I'll make the pull request.

koenvo commented 4 years ago

I like it. @bdagnino can you also have a look? So idea is to set the ball_owning_team on some known events (pass,shot,Carry,takeon) and use that value for the other events after that

rjtavares commented 4 years ago

BTW, the list of event_types relevant for possession should probably be moved to event.py in domain/models and imported so that it can be reused in other serializers.

bdagnino commented 4 years ago

@rjtavares @koenvo I like the idea yes. I did something like this for #44 with _parse_ball_owning_team but indeed would be better if there is a general implementation for all serializers.

rjtavares commented 4 years ago

@bdagnino your implementation doesn't make possession persist (you return None if the event type is not one of possession event types). I believe handling possession sequences and including non possession events in those sequences is important, but I do wonder if events are really sequential (if they aren't, assuming last event's possession team will introduce errors).

Regarding the list of possession event types, @eujern suggested a ball_owning_event property in EventType. Another option would be a PossessionEventType enum in event.py.

bdagnino commented 4 years ago

@koenvo not sure about what you mean by not making possession persist?

rjtavares commented 4 years ago

Say there's three events:

  1. Pass by Home Team
  2. Challenge (Tackle) by Away Team
  3. Pass by Home Team

Event 1 and 3 should have Home as BALL_OWNING_TEAM, that's obvious. But what about Event 2?

In your code BALL_OWNING_TEAM would call _parse_ball_owning_team, which would return None. In my code for Opta's serializer BALL_OWNING_TEAM would be set to Home (because that's the BALL_OWNING_TEAM of the previous event).

bdagnino commented 4 years ago

@rjtavares I see. That was on purpose. Since in Metrica event data challenges and cards don't change possessions and could be for either team I thought best to not indicate team possession. Then if you want to look at team possession you can look at all the events that DO have a possession and work with that.

In your particular example I think it would be the same thought, to assume possession for the intermediate event. So I'd be ok with that as well. But for example if you have a CARD event, no one is in possession, so in that case better return None IMO.

When possible we should try to add it to all events to a provider, not only to the ones that are recognized by Kloppy as non generic. For example, we have an event in Metica called FOUL RECEIVED that ends a possession, but wouldn't be picked up if we only limit it to the Kloppy event types.

@koenvo to your point about overlapping. I think we could do our best job to set a default for team ownership, and then if there is overlap between possessions for example leave that on the users hands how to deal with that.

rjtavares commented 4 years ago

I was influenced by Statsbomb's data, which has an explicit possession_team for events that seems to behave like what I used.

In your particular example I think it would be the same thought, to assume possession for the intermediate event. So I'd be ok with that as well. But for example if you have a CARD event, no one is in possession, so in that case better return None IMO.

When possible we should try to add it to all events to a provider, not only to the ones that are recognized by Kloppy as non generic. For example, we have an event in Metica called FOUL RECEIVED that ends a possession, but wouldn't be picked up if we only limit it to the Kloppy event types.

You're right. Each serializer should take advantage of whatever info each provider has (while trying to be overall as much consistent as possible).

koenvo commented 4 years ago

I think it would be good to:

  1. Make it work independent of the serializer. IMO its an enrichment step like adding the current score, and current lineup
  2. When possession changes is part of the definition of soccer. I don’t know if people (coaches etc) would normally say the possession changes in case of a card. Could be an argument to the enricher. When we let leak to much information from a provider into kloppy (like how they decide to define possession) we lose standardisation. We can keep the original data from the provider but I would really argue for having a way to determine possession without info about possession from providers.
  3. Filtering events should be moved outside of the serializer and I think part of the helper

I would purpose to change the api to make it chainable like: load(“file”).enrich().filter(“pass”, ...).to_pandas() This also allows people to add arguments to for example enrich step to specify other possession changing events.

What do you think?

bdagnino commented 4 years ago

This one is really tricky, and probably one of the most difficult thing that Kloppy should try to solve as good as possible, meaning standardization of events between providers. In my personal case the biggest roadblock is that I'm not super familiar with other events definitions other than Metrica's, so it's hard to judge. So can't really provide a concrete answer, but before commenting further, what is the final goal of this ticket? I'm not 100% sure of what is that is the expected result of computing Transform orientation "BALL_OWNING_TEAM", does it work at the whole dataset level or on a specific event? And what would be the desired result?

koenvo commented 4 years ago

Hmm yes we might have lost sight on the origin of this ticket. I stumbled upon this issue by accident (copy & pasted transform code and forgot to remove the ball owning part).

With new insights I would keep this ticket for reference and use values that provider specifies, like statsbomb. When a provider does not specify this we will not fill "ball_owning_team" on events. Then open a new ticket for adding more advanced state to events. An example can be found here: https://github.com/PySport/kloppy/pull/37/files#diff-a1fbb7e35b0c8e235dc9a82143170facR15-R28 . Adding this state makes it possible to filter/window events on that state.

rjtavares commented 4 years ago

Can I have a go at this one? Something like a PossessionGainedEvent that changes state.possession_team

koenvo commented 4 years ago

I rather have a POSSESSION_GAINING_EVENTS = (.... , ... ) and check if event.event_type in POSSESSION_GAINING_EVENTS. And please yes :-)

rjtavares commented 4 years ago

Do you have a suggestion for what the State class property name should be?

koenvo commented 4 years ago

Let's discuss this in more detail here: https://github.com/PySport/kloppy/issues/48