ML-KULeuven / socceraction

Convert soccer event stream data to SPADL and value player actions using VAEP or xT
MIT License

Lack of compatibility with Wyscout Data: Event Data for one specific match #712

Closed: derwidii closed this issue 6 months ago

derwidii commented 6 months ago

I am using the socceraction library to convert Wyscout event data into SPADL actions, specifically the convert_to_actions function from the socceraction.spadl.wyscout module. However, the data format the library expects does not match the format of my data.

My dataset consists of several JSON files (e.g., match1.json, match2.json, etc.), each containing only the events of one specific match. Unlike the provided Wyscout dataset structure, there are no separate JSON files for team, player, or match metadata. Each events file lists the events directly, without additional match metadata. The data follows this format:

https://footballdata.wyscout.com/wp-content/uploads/2021/01/V3Events.txt
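For reference, a minimal sketch of how such per-match files could be loaded for inspection. The helper name load_match_events is hypothetical, and the sketch assumes each file holds either a bare list of events or an object with a top-level "events" key; neither assumption is confirmed by socceraction or by the files themselves.

```python
import json
from pathlib import Path


def load_match_events(folder):
    """Map each file stem (e.g. 'match1') to its list of event dicts.

    Hypothetical helper, not part of socceraction.
    """
    events_by_match = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            payload = json.load(f)
        # Assumption: events sit under a top-level "events" key,
        # otherwise the file itself is a bare list of events.
        events = payload["events"] if isinstance(payload, dict) else payload
        events_by_match[path.stem] = events
    return events_by_match
```

Keying the result by file stem keeps the events of each match separate, which matters because the files carry no shared match metadata.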

The tricky part is that I cannot convert this data structure into the SPADL format.

Issues Encountered:

  1. Missing 'matches' Object: The convert_to_spadl function fails with a KeyError stating "no object named matches in the file" because it expects a broader dataset that includes match metadata.

  2. Missing Columns in convert_actions: The convert_actions function does not work due to missing columns such as 'positions' and 'tags', which my dataset does not contain.
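The second issue can be checked without calling the library at all. Below is a minimal sketch, assuming that 'positions' and 'tags' (the two columns named above) are what the converter needs; the constant and function names are hypothetical and not part of socceraction.

```python
# Fields reported as missing in this issue; assumed to be required by the converter.
REQUIRED_FIELDS = {"positions", "tags"}


def missing_fields(events):
    """Return the required fields that are absent from at least one event."""
    missing = set()
    for event in events:
        # set.difference accepts any iterable; iterating a dict yields its keys.
        missing |= REQUIRED_FIELDS.difference(event)
    return sorted(missing)


# Illustrative event shaped like my data (field names are examples only).
sample = [{"id": 1, "type": {"primary": "pass"}, "location": {"x": 50, "y": 50}}]
print(missing_fields(sample))
```

An empty result would mean every event carries both fields; any other result names the fields the conversion would fail on.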

Is there a way to adapt the convert_to_spadl or convert_actions functions to handle a dataset structured like mine, without the typical 'matches', 'teams', and 'players' data?

I'd appreciate any help you can offer, as I'd still love to use your package. I am especially interested in estimating VAEP values. Thank you very much in advance for your support.

probberechts commented 6 months ago

This is Wyscout v3 data. It is currently not supported (see #156) and I do not have plans to implement it myself.