PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

Add support for JSON Tracab data #295

Closed DriesDeprest closed 7 months ago

DriesDeprest commented 8 months ago

Goal: Add support for deserializing Tracab tracking data where both the metadata and the raw data are in JSON format.

Description: Refactored the code so that Tracab tracking data can be deserialized when either (a) the metadata is XML and the raw data is .dat, or (b) both the metadata and the raw data are JSON.

I've refactored the code to support two versions of input data similar to how we handle it for the Wyscout v2 and v3 event data models.

Questions: I'm uncertain about the part of the implementation that determines the orientation in my Tracab JSON deserializer. I'm still confused about HOME_TEAM / AWAY_TEAM / FIXED_HOME_AWAY.

The metadata defines whether the home or the away team plays from left to right in each period. In the example file in our tests, the metadata contains "Phase1HomeGKLeft": false and "Phase2HomeGKLeft": true. This means that, in the raw data, the home team GK is at approximately "X": 5000 at the start of the first half and approximately "X": -5000 at the start of the second half.

What should the orientation then be? Should it be AWAY_TEAM for this example?

JanVanHaaren commented 8 months ago

Based on @probberechts' table in #190, I presume the orientation in this particular example should be FIXED_AWAY_HOME or AWAY_HOME once #190 is merged.

probberechts commented 8 months ago

> Based on @probberechts' table in #190, I presume the orientation in this particular example should be FIXED_AWAY_HOME or AWAY_HOME once #190 is merged.

That's right.
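
For readers following along, the mapping agreed on above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Orientation` enum below is a local stand-in using the names mentioned in the thread, `orientation_from_meta` is a hypothetical helper, and it assumes that a home goalkeeper on the left in the first half means the home team attacks from left to right.

```python
from enum import Enum


class Orientation(Enum):
    # Local stand-in for illustration; names follow the thread (#190).
    HOME_AWAY = "home-away"  # home team plays left to right in period 1
    AWAY_HOME = "away-home"  # away team plays left to right in period 1


def orientation_from_meta(meta: dict) -> Orientation:
    """Hypothetical helper: infer the orientation from the first-half GK flag."""
    # Assumption: if the home GK defends the left goal in the first half,
    # the home team attacks from left to right in that half.
    if meta["Phase1HomeGKLeft"]:
        return Orientation.HOME_AWAY
    return Orientation.AWAY_HOME


# Flags from the example test file discussed above.
example_meta = {"Phase1HomeGKLeft": False, "Phase2HomeGKLeft": True}
print(orientation_from_meta(example_meta).name)
```

With the flags from the example file, the home goalkeeper starts on the right, so the sketch picks the away-first orientation, matching the answer in the thread.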

DriesDeprest commented 8 months ago

Any updates, @JanVanHaaren @koenvo?

koenvo commented 7 months ago

GitHub tells me there is a merge conflict. Could you please merge master into your branch?

I really like the auto identification of the format to make it easier for users! Great work.

@JanVanHaaren I prefer serialization_format over data_version. What do you think? The version makes me think there can be different versions with different data in it, but it's only the format that changes, right?
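
To make the auto-identification idea concrete, here is a rough sketch of picking a serialization format from the file suffixes. It only illustrates the idea and may differ from what the PR does: `identify_format`, the suffix table, and the format labels `"dat"` and `"json"` are all hypothetical names chosen for this example.

```python
from pathlib import Path

# Hypothetical table: (metadata suffix, raw data suffix) -> format label.
_SUFFIXES_TO_FORMAT = {
    (".xml", ".dat"): "dat",
    (".json", ".json"): "json",
}


def identify_format(meta_data: str, raw_data: str) -> str:
    """Pick a format label from the two file names (illustrative only)."""
    key = (Path(meta_data).suffix.lower(), Path(raw_data).suffix.lower())
    try:
        return _SUFFIXES_TO_FORMAT[key]
    except KeyError:
        # Unknown combination: fail loudly rather than guess.
        raise ValueError(f"Unsupported Tracab file combination: {key}") from None


print(identify_format("tracab_meta.xml", "tracab_raw.dat"))
print(identify_format("tracab_meta.json", "tracab_raw.json"))
```

Keying the table on both suffixes keeps mismatched pairs, such as XML metadata with JSON raw data, from being silently accepted.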

DriesDeprest commented 7 months ago

Thanks for the feedback @koenvo, should be good now!

probberechts commented 6 months ago

I've noticed a small bug in this PR. Kloppy allows specifying the data source as a FileLike object but now it only works with a Path. For example with a string:

from kloppy import tracab

dataset = tracab.load(
    meta_data="kloppy/tests/files/tracab_meta.xml",
    raw_data="kloppy/tests/files/tracab_raw.dat",
)

It gives the following error: AttributeError: 'str' object has no attribute 'name'

koenvo commented 6 months ago

> I've noticed a small bug in this PR. Kloppy allows specifying the data source as a FileLike object but now it only works with a Path. For example with a string:
>
> from kloppy import tracab
>
> dataset = tracab.load(
>     meta_data="kloppy/tests/files/tracab_meta.xml",
>     raw_data="kloppy/tests/files/tracab_raw.dat",
> )
>
> It gives the following error: AttributeError: 'str' object has no attribute 'name'

Hmm, let me look into this. Ah yes, I like the approach @JanVanHaaren used in https://github.com/PySport/kloppy/blob/master/kloppy/infra/serializers/tracking/statsperform.py#L262C22-L263C52, but I forgot to mention it here.
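
As a rough illustration of the failure mode reported above, the sketch below resolves a file suffix without assuming the input has a `.name` attribute. It does not reproduce the linked statsperform code: `source_suffix` is a hypothetical helper, and the set of accepted input types (string, path object, open file or in-memory stream) is an assumption about what a FileLike input may be.

```python
import io
import os
from pathlib import Path
from typing import Optional


def source_suffix(source) -> Optional[str]:
    """Hypothetical helper: return the lower-cased suffix, or None if unknown."""
    # Strings and path objects carry the file name themselves.
    if isinstance(source, (str, os.PathLike)):
        return Path(os.fspath(source)).suffix.lower()
    # Open files usually expose .name; in-memory streams may not.
    name = getattr(source, "name", None)
    if isinstance(name, (str, os.PathLike)):
        return Path(os.fspath(name)).suffix.lower()
    return None


print(source_suffix("kloppy/tests/files/tracab_meta.xml"))
print(source_suffix(Path("kloppy/tests/files/tracab_raw.dat")))
print(source_suffix(io.BytesIO(b"{}")))
```

Returning `None` for nameless streams leaves it to the caller to fall back to an explicit format argument or to inspecting the content.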