PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

Add StatsBomb shot result coordinates #232

Closed by probberechts 7 months ago

probberechts commented 8 months ago

The "result_coordinates" field of ShotEvent now contains the shot's end coordinates.

probberechts commented 8 months ago

I've added a test and it also deserializes the z-coordinate now.

One thing I'm not sure about is how to deal with the cell-based coordinates here. It probably makes sense to put the y-coordinates of shots at 120 instead of 119.95, and I don't know whether the same cell-based coordinates are also used for the z-coordinate.

JanVanHaaren commented 8 months ago

According to StatsBomb's shot fidelity specification, the y-coordinates and z-coordinates of shot end locations are expressed in tenths of a yard, regardless of the shot fidelity version. Moreover, as of shot fidelity version 2, all coordinates associated with shots and events paired with shots are expressed in tenths of a yard.
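As a rough illustration of the rule described above, the grid resolution for a single coordinate could be picked as follows. This is only a sketch: the function name and its parameters are hypothetical (they are not part of kloppy or of any StatsBomb API), and the 1-yard default for everything else is an assumption based on the "standard" granularity.

```python
def cell_size(
    shot_fidelity_version: int,
    shot_related: bool,
    end_location_yz: bool = False,
) -> float:
    """Return the assumed grid resolution (in yards) of one coordinate.

    Hypothetical helper, not kloppy API.
    - end_location_yz: the coordinate is the y or z of a shot end location.
    - shot_related: the event is a shot or an event paired with a shot.
    """
    # y/z of a shot end location: tenths of a yard in every fidelity version.
    if end_location_yz:
        return 0.1
    # From fidelity version 2 on, all shot-related coordinates are in tenths.
    if shot_fidelity_version >= 2 and shot_related:
        return 0.1
    # Assumption: everything else uses the standard 1-yard granularity.
    return 1.0
```

For example, `cell_size(1, shot_related=True)` gives the 1-yard grid, while `cell_size(1, shot_related=True, end_location_yz=True)` gives the 0.1-yard grid.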

probberechts commented 8 months ago

🤔 This shot fidelity thing is getting complicated.

Is this correct:

Note that this does not correspond to what is stated in the docs of the open data:

Shot Fidelity Version 2

  • Shots, freeze frames and events paired to shots use high fidelity x,y coordinates

Shot Fidelity Version 1

  • All events and freeze frames use standard location granularity.

And clarifying my doubt regarding the y-coordinate of the end location of a shot:

JanVanHaaren commented 8 months ago

Could you please remind me what the rationale behind the cell-based approach is? If my understanding is correct, StatsBomb rounds coordinates either to the nearest yard or to the nearest tenth of a yard. Intuitively, it would make sense to me to keep the original coordinates for events such as goals, penalty kicks and corner kicks.

According to the Shot Fidelity V2 specification, which seems to be only available to StatsBomb customers, the situation is as follows.

probberechts commented 7 months ago

> Could you please remind me what the rationale behind the cell-based approach is?

https://twitter.com/lemonwatcher/status/1267784042776154112

According to the docs, StatsBomb coordinates are rounded to the nearest (tenth of a) yard; according to Thom, they are rounded up.

koenvo commented 7 months ago

My idea was that when (0,1] is rounded up to 1, the cell ((0,1], (0,1]) is indicated by the point (1,1). My (maybe incorrect) assumption was that it would make sense to estimate the actual position as the average position in the cell, which is (0.5, 0.5).
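A minimal sketch of that idea, assuming coordinates are rounded up to the upper edge of a cell of known size. The function name `estimate_position` and the `cell` parameter are made up for illustration and are not kloppy's actual implementation.

```python
from typing import Tuple


def estimate_position(
    reported: Tuple[float, float], cell: float = 1.0
) -> Tuple[float, float]:
    """Estimate the true position as the centre of the reported cell.

    Assumes each reported value is the upper bound of the half-open
    interval (value - cell, value], i.e. coordinates were rounded up.
    """
    x, y = reported
    # Shift each coordinate back by half a cell to land on the cell centre.
    return (x - cell / 2, y - cell / 2)
```

With a 1-yard grid, the reported point (1, 1) maps to (0.5, 0.5). With a 0.1-yard grid, a reported 120 maps to roughly 119.95, which is where the value mentioned earlier in this thread comes from.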

I guess the “estimate position in cell” step needs to take the type of event into account and switch to something other than the average in certain cases.

probberechts commented 7 months ago

> According to the Shot Fidelity V2 specification, which seems to be only available to StatsBomb customers, the situation is as follows.

I've looked at some data and this seems to be accurate indeed. I've implemented it like this now.

> My idea was that when (0,1] is rounded up to 1, the cell ((0,1], (0,1]) is indicated by the point (1,1). My (maybe incorrect) assumption was that it would make sense to estimate the actual position as the average position in the cell, which is (0.5, 0.5).

We've implemented it like that in socceraction too. I think it is the best approach.

> I guess the “estimate position in cell” step needs to take the type of event into account and switch to something other than the average in certain cases.

I agree. It makes sense to do this for kick-offs, corners, penalties, goal kicks, and shot end coordinates. However, I do not plan to do it in this PR since it would require more extensive changes to the code.
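For reference, a follow-up could look roughly like the sketch below. It is only an assumption of how the exception might be handled: the event labels, the set `FIXED_POSITION_EVENTS`, and the function `estimate_coordinate` are all hypothetical, and keeping the reported value unchanged is just one possible alternative to the cell average.

```python
# Hypothetical labels for events whose true location is known to lie on a
# pitch marking or line, so shifting to the cell centre would add error.
FIXED_POSITION_EVENTS = {
    "kick_off",
    "corner",
    "penalty",
    "goal_kick",
    "shot_end",
}


def estimate_coordinate(reported: float, cell: float, event_kind: str) -> float:
    """Estimate one coordinate, taking the kind of event into account."""
    if event_kind in FIXED_POSITION_EVENTS:
        # Assumption: keep the reported value as-is for these events.
        return reported
    # Default: centre of the cell (reported - cell, reported].
    return reported - cell / 2
```

Here `estimate_coordinate(120.0, 0.1, "shot_end")` stays at 120.0, while `estimate_coordinate(61.0, 1.0, "pass")` becomes 60.5.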