ML-KULeuven / socceraction

Convert soccer event stream data to SPADL and value player actions using VAEP or xT
MIT License

OptaLoader Whoscored parser #42

Closed luistelmocosta closed 3 years ago

luistelmocosta commented 3 years ago

Hello, can you please explain a little better how the OptaLoader with Whoscored works? What is the feeds dict() format?

dict_opta = {
    'whoscored': "PremierLeague-2020_2021\\1485314.json"
}
datafolder = "..\data\Premier_League-2020_2021"
SBL = opta.OptaLoader(root=datafolder, feeds=dict_opta, parser='whoscored')

I tried this quick test, but I don't think I am doing it correctly, since I am not getting any competitions back from competitions = SBL.competitions()

Can you give me a quick example on how to load the whoscored json?

Kind regards

probberechts commented 3 years ago
  1. You have to store the WhoScored JSON files in a file structure that includes the competition ID, season ID and game ID. For example: ./data/2-2021/game_1234.json, ./data/2-2021/game_1235.json, ... where 2 is the competition ID, 2021 is the season ID, and 1234 and 1235 are the game IDs.

  2. Create an OptaLoader instance:

    
    from socceraction.spadl.opta import OptaLoader

    loader = OptaLoader(
        root="./data",
        parser="whoscored",
        feeds={"whoscored": "{competition_id}-{season_id}/game_{game_id}.json"},
    )

The feeds dict maps each feed name to a path pattern that lets the loader select the JSON files belonging to a competition, season or game. The pattern should mirror the file structure above, with the IDs replaced by the "{competition_id}", "{season_id}" and "{game_id}" keys.
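To make the relation between the pattern and the files on disk more concrete, here is a rough sketch using only the standard library. It is not socceraction's internal code; `feed_path`, `pattern_to_regex` and `extract_ids` are hypothetical helpers that only illustrate how the placeholders could be filled in or read back.

```python
import re
from pathlib import Path
from string import Formatter

# Same pattern as the one passed to `feeds` above.
FEED_PATTERN = "{competition_id}-{season_id}/game_{game_id}.json"


def feed_path(root, competition_id, season_id, game_id):
    """Fill in every placeholder to get the path of one game file."""
    relative = FEED_PATTERN.format(
        competition_id=competition_id, season_id=season_id, game_id=game_id
    )
    return Path(root) / relative


def pattern_to_regex(pattern):
    """Turn each "{key}" placeholder into a named regex group (illustration only)."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append(f"(?P<{field}>[^/]+)")
    return re.compile("".join(parts))


def extract_ids(relative_path):
    """Recover the IDs from a path relative to the root, or None if it does not match."""
    match = pattern_to_regex(FEED_PATTERN).fullmatch(relative_path)
    return match.groupdict() if match else None


print(feed_path("./data", 2, 2021, 1234).as_posix())
print(extract_ids("2-2021/game_1234.json"))
```

If a file name does not follow the pattern (for example a bare 1485314.json without the competition and season folder), `extract_ids` returns None in this sketch, which mirrors why a mismatched file layout yields no competitions or games.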

3. Load the data:
```python
games = loader.games(competition_id=2, season_id=2021)
players = loader.players(game_id=1234)
teams = loader.teams(game_id=1234)
events = loader.events(game_id=1234)
```

Some notes: