PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License

[Tracab] Parse xml meta information #300

Closed: DriesDeprest closed this 5 months ago

DriesDeprest commented 6 months ago

Team and player metadata (name, ID, ...) is now read from the .xml file when an .xml metadata file is combined with a .dat tracking data file. I also adjusted the .xml metadata and .dat tracking files to contain the same information as the .json metadata and .json tracking files, and adapted the tests accordingly.
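To make the change concrete, here is a minimal, standalone sketch of reading team and player details out of an XML metadata file with Python's standard `xml.etree.ElementTree`. It is not kloppy's implementation. The element names (`HomeTeam`, `AwayTeam`, `TeamId`, `ShortName`, `Players/Player`, `PlayerId`, `FirstName`, `LastName`, `JerseyNo`), the `PlayerMeta`/`TeamMeta` classes and the sample values are all hypothetical and chosen for illustration; the real Tracab XML may use different names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical XML layout and made-up values, for illustration only;
# the real Tracab XML element names may differ.
SAMPLE_META_XML = """
<TracabMetaData>
  <match iId="1" iFrameRateFps="25">
    <HomeTeam>
      <TeamId>100</TeamId>
      <ShortName>Home FC</ShortName>
      <Players>
        <Player>
          <PlayerId>1001</PlayerId>
          <FirstName>Jane</FirstName>
          <LastName>Doe</LastName>
          <JerseyNo>9</JerseyNo>
        </Player>
      </Players>
    </HomeTeam>
    <AwayTeam>
      <TeamId>200</TeamId>
      <ShortName>Away FC</ShortName>
      <Players>
        <Player>
          <PlayerId>2001</PlayerId>
          <FirstName>John</FirstName>
          <LastName>Roe</LastName>
          <JerseyNo>1</JerseyNo>
        </Player>
      </Players>
    </AwayTeam>
  </match>
</TracabMetaData>
"""


@dataclass
class PlayerMeta:
    player_id: str
    name: str
    jersey_no: int


@dataclass
class TeamMeta:
    team_id: str
    name: str
    players: List[PlayerMeta] = field(default_factory=list)


def parse_team(team_el: ET.Element) -> TeamMeta:
    """Build a TeamMeta from a (hypothetical) <HomeTeam>/<AwayTeam> element."""
    team = TeamMeta(team_id=team_el.findtext("TeamId"), name=team_el.findtext("ShortName"))
    for player_el in team_el.iterfind("Players/Player"):
        team.players.append(
            PlayerMeta(
                player_id=player_el.findtext("PlayerId"),
                name=f"{player_el.findtext('FirstName')} {player_el.findtext('LastName')}",
                jersey_no=int(player_el.findtext("JerseyNo")),
            )
        )
    return team


def parse_teams(xml_text: str) -> Tuple[TeamMeta, TeamMeta]:
    """Return (home, away) team metadata from the XML metadata document."""
    match_el = ET.fromstring(xml_text.strip()).find("match")
    return parse_team(match_el.find("HomeTeam")), parse_team(match_el.find("AwayTeam"))


home, away = parse_teams(SAMPLE_META_XML)
for team in (home, away):
    print(team.team_id, team.name, [(p.jersey_no, p.name) for p in team.players])
```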

DriesDeprest commented 6 months ago

The only Tracab documentation I am aware of is the following: https://api.tracab.com/docs.

So in your use case you have access to Tracab tracking data in DAT format, but you can't get the Tracab match metadata, in either XML or JSON format, which contains details about the teams and players? How do you receive the raw Tracab tracking data, if not via the API?

JanVanHaaren commented 6 months ago


I got access to TRACAB tracking data and Opta event data for a single match as a trial. The tracking data itself is in DAT format and the "metadata" is in XML format. While the structure of the XML resembles the structure described on the API documentation website, it differs in a few ways. I suspect it is a custom format rather than the standard TRACAB format.
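For readers unfamiliar with the DAT format, the sketch below splits one frame line into its parts. The layout it assumes (frame ID, then `;`-separated player records of `team,target_id,jersey_no,x,y,speed`, then a ball record of `x,y,z,speed,owning_team,state`, with `:` between the sections) is our reading of typical Tracab DAT files, not an official specification. The sample line and the `DatFrame`/`parse_dat_line` names are made up, and custom feeds like the one described above may deviate from this layout.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Made-up sample line in the assumed layout:
#   frame_id:player;player;...;:ball_x,ball_y,ball_z,ball_speed,owning_team,state;:
SAMPLE_LINE = "100:1,1,9,-1234,560,2.10;0,2,1,4980,-30,0.00;:0,0,11,1.50,H,Alive;:"


@dataclass
class DatFrame:
    frame_id: int
    # (team_id, jersey_no) -> (x, y); team_id codes are kept as they appear in the file
    players: Dict[Tuple[int, int], Tuple[int, int]]
    ball: Tuple[int, int, int]
    ball_owning_team: str
    ball_state: str


def parse_dat_line(line: str) -> DatFrame:
    """Split one DAT frame line (assumed layout) into frame ID, players and ball."""
    frame_id, players_chunk, ball_chunk = line.strip().split(":")[:3]
    players = {}
    for record in players_chunk.split(";"):
        if not record:
            continue  # the trailing ';' leaves an empty record
        team_id, _target_id, jersey_no, x, y, _speed = record.split(",")
        players[(int(team_id), int(jersey_no))] = (int(x), int(y))
    ball_x, ball_y, ball_z, _ball_speed, owner, state = ball_chunk.rstrip(";").split(",")[:6]
    return DatFrame(int(frame_id), players, (int(ball_x), int(ball_y), int(ball_z)), owner, state)


frame = parse_dat_line(SAMPLE_LINE)
print(frame.frame_id, frame.players, frame.ball, frame.ball_owning_team, frame.ball_state)
```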

The underlying question was whether we can assume that all information is always available in the metadata file, or whether some information, such as team and player information, can be missing in some cases. For the sake of this pull request, though, I would be fine with assuming that all metadata is always available.

koenvo commented 6 months ago

Looks like there are some merge conflicts. Could you please resolve those? Then I'm happy to merge!

dvilches commented 5 months ago

Perhaps our experience helps: during the Qatar World Cup we worked with the Tracab data provided to us through the FIFA Football Data Platform. We received the tracking data in DAT format and the metadata in .json, which was always complete (teams, players, referees, etc.). From the .json we built the .xml so that we could use kloppy.
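A JSON-to-XML conversion like the one described here could look roughly like the following standard-library sketch. It is not the code used in that project. The JSON keys (`GameID`, `FrameRate`, `HomeTeam`, `AwayTeam`, `TeamID`, `ShortName`, `Players`, `PlayerID`, `FirstName`, `LastName`, `JerseyNo`), the XML element and attribute names, and the sample values are assumptions for illustration and would need to be adapted to the JSON actually delivered and to the XML layout the parser expects.

```python
import json
import xml.etree.ElementTree as ET

# Made-up JSON metadata; the key names are assumptions, not a confirmed schema.
SAMPLE_META_JSON = json.dumps({
    "GameID": 1,
    "FrameRate": 25,
    "HomeTeam": {"TeamID": 100, "ShortName": "Home FC", "Players": [
        {"PlayerID": 1001, "FirstName": "Jane", "LastName": "Doe", "JerseyNo": 9}]},
    "AwayTeam": {"TeamID": 200, "ShortName": "Away FC", "Players": [
        {"PlayerID": 2001, "FirstName": "John", "LastName": "Roe", "JerseyNo": 1}]},
})


def add_team(parent: ET.Element, tag: str, team: dict) -> None:
    """Append one <HomeTeam>/<AwayTeam> element (hypothetical layout) to parent."""
    team_el = ET.SubElement(parent, tag)
    ET.SubElement(team_el, "TeamId").text = str(team["TeamID"])
    ET.SubElement(team_el, "ShortName").text = team["ShortName"]
    players_el = ET.SubElement(team_el, "Players")
    for player in team["Players"]:
        player_el = ET.SubElement(players_el, "Player")
        ET.SubElement(player_el, "PlayerId").text = str(player["PlayerID"])
        ET.SubElement(player_el, "FirstName").text = player["FirstName"]
        ET.SubElement(player_el, "LastName").text = player["LastName"]
        ET.SubElement(player_el, "JerseyNo").text = str(player["JerseyNo"])


def json_meta_to_xml(json_text: str) -> str:
    """Convert JSON match metadata into an XML string."""
    meta = json.loads(json_text)
    root = ET.Element("TracabMetaData")
    # Keyword arguments to SubElement become XML attributes.
    match_el = ET.SubElement(root, "match", iId=str(meta["GameID"]), iFrameRateFps=str(meta["FrameRate"]))
    add_team(match_el, "HomeTeam", meta["HomeTeam"])
    add_team(match_el, "AwayTeam", meta["AwayTeam"])
    return ET.tostring(root, encoding="unicode")


print(json_meta_to_xml(SAMPLE_META_JSON))
```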

DriesDeprest commented 5 months ago

Thanks for sharing, @dvilches! My proposal would be to keep the scope of this PR as is: adding support for XML metadata parsing, with team and player information required to be present in the metadata.
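One way to enforce the requirement that team and player information be present is to check the metadata up front and fail with a clear message, rather than deep inside the parser. The sketch below is hypothetical: the element names (`HomeTeam`, `AwayTeam`, `TeamId`, `Players/Player`) and the `MissingMetadataError` class are made up for illustration. It collects every problem first, so a single error can report all of them.

```python
import xml.etree.ElementTree as ET
from typing import List


class MissingMetadataError(ValueError):
    """Raised when required team or player metadata is absent (hypothetical)."""


def find_missing_metadata(xml_text: str) -> List[str]:
    """List every required team/player element that is missing or empty."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    for tag in ("HomeTeam", "AwayTeam"):
        team_el = root.find(f".//{tag}")  # search anywhere below the root
        if team_el is None:
            problems.append(f"missing <{tag}>")
            continue
        if not (team_el.findtext("TeamId") or "").strip():
            problems.append(f"<{tag}> has no TeamId")
        if team_el.find("Players/Player") is None:
            problems.append(f"<{tag}> lists no players")
    return problems


def require_team_and_player_info(xml_text: str) -> None:
    """Fail fast, reporting all problems at once."""
    problems = find_missing_metadata(xml_text)
    if problems:
        raise MissingMetadataError("; ".join(problems))


complete = (
    "<TracabMetaData><match>"
    "<HomeTeam><TeamId>100</TeamId><Players><Player/></Players></HomeTeam>"
    "<AwayTeam><TeamId>200</TeamId><Players><Player/></Players></AwayTeam>"
    "</match></TracabMetaData>"
)
require_team_and_player_info(complete)  # passes silently
```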

Future contributions to make the Tracab serializer more generally usable could be:

  1. Allow users to combine either metadata file type (JSON / XML) with either tracking data file type (DAT / JSON).
  2. Add support for missing player / team metadata: if player / team metadata is missing, dummy players and teams with dummy names and IDs should be created.
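Suggestion 2 could work roughly as follows: for every (team, jersey number) pair seen in the tracking data that has no matching metadata entry, create a placeholder team and/or player. This is a hypothetical sketch with made-up `Team`/`Player` classes, a made-up `fill_missing_metadata` function and a made-up ID scheme (`dummy_home`, `dummy_home_9`, ...), not kloppy's data model. The caller's metadata is deep-copied so it is never mutated.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Player:
    player_id: str
    name: str
    jersey_no: int


@dataclass
class Team:
    team_id: str
    name: str
    players: List[Player] = field(default_factory=list)


def fill_missing_metadata(
    observed: Iterable[Tuple[str, int]],
    known: Optional[Dict[str, Team]] = None,
) -> Dict[str, Team]:
    """Add dummy teams/players for (side, jersey_no) pairs seen in tracking data.

    `side` is "home" or "away"; entries already in `known` are kept unchanged.
    """
    teams = copy.deepcopy(known) if known else {}  # never mutate the caller's metadata
    for side, jersey_no in sorted(set(observed)):
        team = teams.get(side)
        if team is None:
            team = Team(team_id=f"dummy_{side}", name=f"Dummy {side} team")
            teams[side] = team
        if all(p.jersey_no != jersey_no for p in team.players):
            team.players.append(
                Player(player_id=f"{team.team_id}_{jersey_no}", name=f"Player {jersey_no}", jersey_no=jersey_no)
            )
    return teams


known = {"home": Team("100", "Home FC", [Player("1001", "Jane Doe", 9)])}
teams = fill_missing_metadata([("home", 9), ("home", 4), ("away", 1)], known)
for side, team in teams.items():
    print(side, team.team_id, [(p.player_id, p.name) for p in team.players])
```

Real players keep their metadata IDs; only the gaps are filled, so the same code path covers both fully missing and partially missing metadata.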

If 1) were implemented, then in your use case, where you have JSON metadata and DAT tracking data, you would no longer need to convert your JSON metadata file into an XML file before passing it to kloppy.
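Suggestion 1 essentially means decoupling the metadata reader from the tracking reader. A minimal sketch of that idea, assuming hypothetical reader functions that all normalise their output to a common dict, picks each reader independently from the file extension (falling back to sniffing the content), so XML or JSON metadata can be paired with DAT or JSON tracking data. The field names `iFrameRateFps`, `FrameRate` and `FrameData`, and all function names, are assumptions for illustration; the readers only extract a frame rate and a frame count.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any, Dict


def detect_format(filename: str, content: str) -> str:
    """Guess 'xml', 'json' or 'dat' from the extension, falling back to the content."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    if suffix in ("xml", "json", "dat"):
        return suffix
    head = content.lstrip()[:1]
    if head == "<":
        return "xml"
    if head in ("{", "["):
        return "json"
    return "dat"


# Hypothetical metadata readers: both return the same normalised dict, so the
# tracking readers never need to know which metadata format was used.
def read_xml_meta(content: str) -> Dict[str, Any]:
    match_el = ET.fromstring(content.strip()).find("match")
    return {"frame_rate": int(match_el.get("iFrameRateFps"))}


def read_json_meta(content: str) -> Dict[str, Any]:
    return {"frame_rate": int(json.loads(content)["FrameRate"])}


# Hypothetical tracking readers that only count frames.
def read_dat_tracking(content: str, meta: Dict[str, Any]) -> Dict[str, Any]:
    frame_ids = [int(line.split(":", 1)[0]) for line in content.splitlines() if line.strip()]
    return {"frames": len(frame_ids), "frame_rate": meta["frame_rate"]}


def read_json_tracking(content: str, meta: Dict[str, Any]) -> Dict[str, Any]:
    frames = json.loads(content)["FrameData"]
    return {"frames": len(frames), "frame_rate": meta["frame_rate"]}


META_READERS = {"xml": read_xml_meta, "json": read_json_meta}
TRACKING_READERS = {"dat": read_dat_tracking, "json": read_json_tracking}


def load(meta_name: str, meta_content: str, raw_name: str, raw_content: str) -> Dict[str, Any]:
    """Any metadata format can be combined with any tracking format."""
    meta = META_READERS[detect_format(meta_name, meta_content)](meta_content)
    return TRACKING_READERS[detect_format(raw_name, raw_content)](raw_content, meta)
```

Adding another format then only means registering one more reader in the relevant dictionary, without touching the other side.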