Alek050 / databallpy

Package to read, preprocess, visualise, and synchronise soccer event and tracking data.
MIT License

Add expected pass model #242

Open Alek050 opened 1 month ago

Alek050 commented 1 month ago

A physical model that predicts the likelihood of a successful pass given the locations and velocities of all players, the initial ball velocity, and the ball's direction of travel.

jonas-bischofberger commented 1 week ago

I have started working on the implementation of the model and am currently encountering two major pain points:

  1. Normalized coordinates (wrt attacking direction) are not by default included in the processed data even though they must be used somewhere such as in xG and xT - is there a standardized way to obtain them that I'm missing? At the moment, I would do databallpy.features.add_team_possession to get the possession info and use that to calculate the normalized coordinates myself.
  2. The tabular tracking data format I obtain via match.tracking_data does not make sense to me - currently a row corresponds to an entire frame of tracking data rather than an object-position pair. This means that I don't have, and can't add, any metadata about the players (e.g. to identify which team a player belongs to), and I also can't join player identities with the event data (e.g. to exclude the passer from potential receivers). Is there a built-in way to get a different table format and to get the missing mapping information between tracking and event data?

Alek050 commented 1 week ago

Hi @jonas-bischofberger, thanks for your message and great to see that you started!

  1. Right now there is no built-in way to normalize coordinates wrt attacking direction. The playing direction for the home team is always from left to right, and for the away team from right to left. I will open an issue to create a built-in way to get normalized coordinates wrt attacking direction.

For now, there are two scenarios. If you only need the tracking and event data at the moment of the pass, use the team_id column in the event data to check whether it equals match.home_team_id or match.away_team_id. If it is the away team id, you have to multiply all _x, _vx (and _ax) columns by -1 in the tracking data, and the start_x, start_y (and end_x, end_y) columns in the event data. If you need the data normalized for all frames, not only the ones where events happen, the approach you use right now is the only solution.
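A minimal pandas sketch of the first scenario, following the column lists in the comment above. The helper names `flip_away_events` and `flip_tracking_frame` and the toy data are hypothetical and not part of databallpy; the column names (team_id, start_x, the _x/_vx/_ax suffixes) are assumed to match the ones mentioned in the comment.

```python
import pandas as pd


def flip_away_events(events: pd.DataFrame, away_team_id) -> pd.DataFrame:
    """Return a copy in which away-team event coordinates are multiplied by -1."""
    out = events.copy()
    # Only touch coordinate columns that are actually present.
    coord_cols = [c for c in ("start_x", "start_y", "end_x", "end_y") if c in out.columns]
    is_away = out["team_id"] == away_team_id
    out.loc[is_away, coord_cols] = out.loc[is_away, coord_cols] * -1
    return out


def flip_tracking_frame(frame: pd.Series, suffixes=("_x", "_vx", "_ax")) -> pd.Series:
    """Return a copy of one tracking frame with the given column suffixes multiplied by -1."""
    out = frame.copy()
    cols = [c for c in out.index if str(c).endswith(suffixes)]
    out[cols] = out[cols] * -1
    return out


# Toy event data: team 1 is assumed to be the home team, team 2 the away team.
events = pd.DataFrame(
    {
        "team_id": [1, 2],
        "start_x": [10.0, 10.0],
        "start_y": [5.0, 5.0],
        "end_x": [20.0, 20.0],
        "end_y": [1.0, -3.0],
    }
)
normalized_events = flip_away_events(events, away_team_id=2)

# Toy tracking frame at the moment of an away-team pass.
frame = pd.Series({"home_1_x": 10.0, "home_1_y": 2.0, "home_1_vx": 1.5, "ball_x": -4.0})
flipped_frame = flip_tracking_frame(frame)
```

With real data, `away_team_id` would be match.away_team_id and the tracking frame would be the row of match.tracking_data at the pass. Only the frame belonging to an away-team event should be passed to `flip_tracking_frame`.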

  2. This was a design choice made at the beginning of the package. All the metadata about the players can be found in match.home_players and match.away_players. You can use match.player_id_to_column_id() to map player ids to the column ids in the tracking data (which have the form f"{team_side}_{jersey_number}"). Also check out match.home_players_column_ids() or match.away_players_column_ids() to get a list of column ids for an entire team.
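If a long table with one row per frame and object is more convenient, the wide format can be reshaped with plain pandas. The sketch below is an illustration only: `column_id` and `tracking_to_long` are hypothetical helpers, not databallpy functions, and it assumes a frame column plus value columns named like home_10_x, as implied by the column id format above.

```python
import pandas as pd


def column_id(team_side: str, jersey_number: int) -> str:
    """Build a tracking column id in the f"{team_side}_{jersey_number}" format."""
    return f"{team_side}_{jersey_number}"


def tracking_to_long(tracking: pd.DataFrame, id_col: str = "frame") -> pd.DataFrame:
    """Reshape wide tracking data into one row per (frame, object)."""
    value_cols = [c for c in tracking.columns if c != id_col]
    long = tracking.melt(
        id_vars=id_col, value_vars=value_cols, var_name="column", value_name="value"
    )
    # Split "home_10_x" into the object "home_10" and the quantity "x".
    parts = long["column"].str.rsplit("_", n=1, expand=True)
    long["object"] = parts[0]
    long["quantity"] = parts[1]
    # The first token is the team side ("home"/"away"); for the ball it is "ball".
    long["team_side"] = long["object"].str.split("_").str[0]
    out = long.pivot_table(
        index=[id_col, "object", "team_side"], columns="quantity", values="value"
    ).reset_index()
    out.columns.name = None
    return out


# Toy wide-format tracking data with two frames.
tracking = pd.DataFrame(
    {
        "frame": [0, 1],
        "home_10_x": [1.0, 1.5],
        "home_10_y": [2.0, 2.5],
        "away_7_x": [-3.0, -3.5],
        "away_7_y": [4.0, 4.5],
        "ball_x": [0.0, 0.2],
        "ball_y": [0.0, 0.1],
    }
)
long_df = tracking_to_long(tracking)

# Example: drop the passer (assumed here to be home player 10) from potential receivers.
passer = column_id("home", 10)
receivers = long_df[(long_df["team_side"] == "home") & (long_df["object"] != passer)]
```

The long table can then be joined with player metadata from match.home_players and match.away_players on whatever key those tables share with the column id; that join is left out here because it depends on the actual column layout.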

Lastly, check out match.passes_df or match.pass_events for more info. For instance, match.pass_events is a dict of PassEvents with attributes like team_side and start_x. The PassEvents should generally work, but they are still in beta, so there might be some bugs. On top of that, I have limited access to Metrica data, so there might be some weird edge cases.

If you have any ideas on how to make the package more intuitive, please let me know so I can make changes that make it easier for anyone to use.