PySport / kloppy

kloppy: standardizing soccer tracking- and event data
https://kloppy.pysport.org
BSD 3-Clause "New" or "Revised" License
362 stars 59 forks source link

Opta Incomplete / Live F7 missing Result tag fix #322

Closed UnravelSports closed 4 months ago

UnravelSports commented 4 months ago

Let's try this again....

A really simple fix here. When loading opta event data (as shown below) for an incomplete f7 (read: during live matches) we get an AttributeError.

dataset = opta.load(
    f7_data="f7.xml",
    f24_data="f24.xml"
)

Error looks as follows:

Traceback (most recent call last):
  File "/workspaces/kloppy/_dev.py", line 3, in <module>
    dataset = opta.load(
  File "/workspaces/kloppy/kloppy/_providers/opta.py", line 35, in load
    return deserializer.deserialize(
  File "/workspaces/kloppy/kloppy/infra/serializers/event/opta/deserializer.py", line 729, in deserialize
    match_result_type = list(match_result_path.find(f7_root))[
  File "src/lxml/objectpath.pxi", line 55, in lxml.objectify.ObjectPath.__call__
  File "src/lxml/objectpath.pxi", line 219, in lxml.objectify._find_object_path
AttributeError: no such child: Result

Normally, in a completed game we'd have a Result tag that looks something like this:

<Result Type="Aggregate" Winner="t12345" />

But, because the game is not finished yet there simply is no "Result" tag.

Doing a simple try / except block like below resolves this issue and creates a correct Kloppy EventDataset.

try:
       match_result_type = list(match_result_path.find(f7_root))[
             0
      ].attrib["Type"]
except AttributeError:
      match_result_type = None

Not 100% sure if using a try-except block here is the best solution, but it works.

UnravelSports commented 4 months ago

Yeah I thought as much (in regards to the try except). I've changed the implementation slightly because list(match_result_path.find(f7_root)) would throw the AttributeError.

I've also looked through the code and I don't think there will be any issues with having initialized 2 Periods without the second period actually having started yet. The below part (found In Line 770 - 783 makes me believe that if a Period hasn't started yet setting a start time will be skipped and period.start_timestamp will simply be left as None (how it is initialized).

From simply running this script on a game that was at minute ~15 (so, incomplete / live) file, I also didn't encounter any problems with initializing a second "empty" period.

If you want I can run some more tests, but I'm not quite sure what else to check :)

if type_id == EVENT_TYPE_START_PERIOD:
      logger.debug(
          f"Set start of period {period.id} to {timestamp}"
      )
      period.start_timestamp = timestamp
  elif type_id == EVENT_TYPE_END_PERIOD:
      logger.debug(
          f"Set end of period {period.id} to {timestamp}"
      )
      period.end_timestamp = timestamp
  else:
      if not period.start_timestamp:
          # not started yet
          continue