UnravelSports / unravelsports

The unravelsports package aims to aid researchers, analysts and enthusiasts by providing intermediary steps in the complex process of turning raw sports data into meaningful information and actionable insights.
Mozilla Public License 2.0

Dynamic Feature Selection and export settings #2

Open MihirT906 opened 2 months ago

MihirT906 commented 2 months ago

Overview

This update makes two changes, summarized below: dynamic selection of node features and an export settings feature.

Summary of Changes

Dynamic selection of node features

The goal of this enhancement is to let users configure which features are included in the nodes of the GraphConverter.

Originally, the GraphConverter’s node features were static and pre-defined like this:

ball_node_features = [
    normalize_coords(ball.x1, pitch_dimensions.x_dim.max),  # normalize x coordinate
    normalize_coords(ball.y1, pitch_dimensions.y_dim.max),  # normalize y coordinate
    # ...
]
player_node_features = [
    (
        0.0
        if np.isnan(p.x1)
        else normalize_coords(p.x1, pitch_dimensions.x_dim.max)  # normalize x coordinate
    ),
    (
        0.0
        if np.isnan(p.x1)
        else normalize_coords(p.y1, pitch_dimensions.y_dim.max)  # normalize y coordinate
    ),
    # ...
]

With this update, users can dynamically select features when instantiating the GraphConverter(). In the future, we would also like to implement an add_my_custom_feature() method.

converter = (
    GraphConverter()
    .node_features
    .add_x(normed=True)  # normalize x coordinate
    .add_y(normed=True)  # normalize y coordinate
    .add_my_custom_feature()  # planned, not yet implemented
)

To facilitate this functionality, we introduced a NodeFeatureSet class. This class lets users dynamically add node features to the feature set and customize how each one is calculated. Each entry in the feature set is a feature function: a tuple holding the feature name, the function that computes it, and the names of the parameters that function requires. The code is in unravel/utils/features/node_feature_set.py

    def add_x(self, normed: bool = True):
        if normed:
            self.node_feature_functions.append(
                ("normalize_x", normalize_coords, ["x", "max_x"])
            )
        else:
            self.node_feature_functions.append(("coord_x", lambda x: x, ["x"]))

        return self

The add_x() method defined above in NodeFeatureSet lets the user add the x-coordinate of the ball and players to the node features. If normed=True, the x-coordinate is normalized. This dynamic configuration lets users choose the specific functions used to calculate node features when instantiating the GraphConverter. The other node feature calculations are captured in the same way:

Function name | Function logic
-- | --
add_x | If normed, stores the normalize_coords() function (from utils.py) and passes x and max_x. Otherwise, stores a lambda identity function lambda x: x to relay x
add_y | Similar to add_x, with y and max_y
add_velocity | Using unit_vector(), calculates the velocity in the x and y directions. If both directions are included, calculates the angle of velocity, which can be normalized using normalize_angles()
add_speed | If normed, stores the normalize_speed() function and passes speed and max_speed. Otherwise, stores the identity function
add_goal_distance | .
add_goal_angle | .
add_ball_distance | .
add_ball_angle | .
add_team | .
add_potential_reciever | .
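
To illustrate how a feature set of (name, function, parameter names) tuples might be evaluated, here is a minimal, self-contained sketch. It is not the code from node_feature_set.py: the normalize_coords() stand-in, the add_y() body, the get_features() body and the compute_node_features() helper are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Tuple

# (feature name, function, names of the parameters the function needs)
FeatureFunction = Tuple[str, Callable[..., float], List[str]]


def normalize_coords(value: float, max_value: float) -> float:
    """Stand-in normalizer (assumption): divide by the maximum value."""
    return value / max_value


class NodeFeatureSet:
    """Simplified, hypothetical version of the class described above."""

    def __init__(self) -> None:
        self.node_feature_functions: List[FeatureFunction] = []

    def add_x(self, normed: bool = True) -> "NodeFeatureSet":
        if normed:
            self.node_feature_functions.append(
                ("normalize_x", normalize_coords, ["x", "max_x"])
            )
        else:
            self.node_feature_functions.append(("coord_x", lambda x: x, ["x"]))
        return self  # returning self is what allows method chaining

    def add_y(self, normed: bool = True) -> "NodeFeatureSet":
        if normed:
            self.node_feature_functions.append(
                ("normalize_y", normalize_coords, ["y", "max_y"])
            )
        else:
            self.node_feature_functions.append(("coord_y", lambda y: y, ["y"]))
        return self

    def get_features(self) -> List[FeatureFunction]:
        return list(self.node_feature_functions)


def compute_node_features(
    feature_set: NodeFeatureSet, values: Dict[str, float]
) -> List[float]:
    """Hypothetical helper: look up each parameter by name, then call the function."""
    return [
        func(*(values[name] for name in param_names))
        for _, func, param_names in feature_set.get_features()
    ]


feature_set = NodeFeatureSet().add_x(normed=True).add_y(normed=False)
values = {"x": 26.25, "y": 10.0, "max_x": 52.5, "max_y": 34.0}
features = compute_node_features(feature_set, values)
feature_names = [name for name, _, _ in feature_set.get_features()]
```

Storing parameter names rather than values keeps the feature set independent of any particular frame: the same tuples can be applied to every player and ball position once the values are known.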

Export settings feature

Introduced an export_settings() method in the GraphConverter class, which writes the unravel version, the node and edge features used, and the graph settings to a JSON file, settings.json, in the root directory.

    def export_settings(self) -> None:
        file_path = 'settings.json'
        data = {
            "__version__": "0.1.2",
            "node_features": [func_name for func_name,_,_ in self.node_features.get_features()],
            "edge_features": [func_name for func_name,_,_ in self.edge_features.get_features()],
            "graph_settings": self.settings.to_dict()
            }

        with open(file_path, 'w') as json_file:
            json.dump(data, json_file, indent=4)

        return

Also created a to_dict() method in graph_settings.py to convert the settings object into a JSON-serializable dictionary. The following is a sample JSON file that was exported:

{
    "__version__": "0.1.2",
    "node_features": [
        "normalize_x",
        "normalize_y",
        "unit_velocity_x",
        "unit_velocity_y",
        "normalized_velocity_angle",
        "normalized_speed",
        "normalized_goal_distance",
        "normed_goal_angle",
        "normalized_ball_distance",
        "normed_ball_angle",
        "team",
        "potential_reciever"
    ],
    "edge_features": [
        "normalize_dist",
        "normalize_speed_diff",
        "normalise_cos_pos",
        "normalise_sin_pos",
        "normalise_cos_vel",
        "normalise_sin_vel"
    ],
    "graph_settings": {
        "infer_ball_ownership": true,
        "infer_goalkeepers": true,
        "ball_carrier_treshold": 25.0,
        "max_player_speed": 12.0,
        "max_ball_speed": 28.0,
        "boundary_correction": null,
        "self_loop_ball": false,
        "adjacency_matrix_connect_type": "ball",
        "adjacency_matrix_type": "split_by_team",
        "label_type": "binary",
        "defending_team_node_value": 0.1,
        "non_potential_receiver_node_value": 0.1,
        "random_seed": false,
        "pad": false,
        "verbose": false,
        "pitch_dimensions": {
            "pitch_length": 105,
            "pitch_width": 68,
            "max_x": 52.5,
            "min_x": -52.5,
            "max_y": 34.0,
            "min_y": -34.0
        },
        "pad_settings": null
    }
}
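
One way a to_dict() method like this could produce nested, JSON-serializable output is sketched below. It assumes the settings are dataclasses, which may not match graph_settings.py. The class layouts here are hypothetical and include only a few of the fields from the sample above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PitchDimensions:
    """Hypothetical subset of the pitch settings shown in the sample."""

    pitch_length: float = 105
    pitch_width: float = 68


@dataclass
class GraphSettings:
    """Hypothetical subset of the graph settings shown in the sample."""

    infer_ball_ownership: bool = True
    max_player_speed: float = 12.0
    boundary_correction: Optional[float] = None
    pitch_dimensions: PitchDimensions = field(default_factory=PitchDimensions)

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses, so pitch_dimensions
        # becomes a plain dict that json.dumps() can handle.
        return asdict(self)


settings_dict = GraphSettings().to_dict()
exported = json.dumps({"graph_settings": settings_dict}, indent=4)

# Round trip: loading the JSON text gives back an equal dictionary.
restored = json.loads(exported)
```

Python's None is written as null and read back as None, which matches the boundary_correction and pad_settings entries in the exported file.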

Verifying correctness

As a sanity check, I used the dynamic feature selection introduced in this update to reproduce the output of the previous static configuration.

Reproducing that output indicates that the update preserves the correctness of the output.
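
A check of this kind could look like the following sketch, which compares a static feature list against one built from (name, function, parameter names) tuples. It is not the actual check that was run: normalize_coords() is a stand-in, and static_features() and dynamic_features() are hypothetical helpers.

```python
import math
from typing import Callable, Dict, List, Tuple


def normalize_coords(value: float, max_value: float) -> float:
    """Stand-in normalizer (assumption): divide by the maximum value."""
    return value / max_value


def static_features(x: float, y: float, max_x: float, max_y: float) -> List[float]:
    """Features hard-coded in a fixed order, as in the static configuration."""
    return [normalize_coords(x, max_x), normalize_coords(y, max_y)]


def dynamic_features(
    feature_functions: List[Tuple[str, Callable[..., float], List[str]]],
    values: Dict[str, float],
) -> List[float]:
    """Features computed from (name, function, parameter names) tuples."""
    return [
        func(*(values[name] for name in param_names))
        for _, func, param_names in feature_functions
    ]


feature_functions = [
    ("normalize_x", normalize_coords, ["x", "max_x"]),
    ("normalize_y", normalize_coords, ["y", "max_y"]),
]
values = {"x": 12.0, "y": -8.5, "max_x": 52.5, "max_y": 34.0}

expected = static_features(values["x"], values["y"], values["max_x"], values["max_y"])
actual = dynamic_features(feature_functions, values)

# Compare element-wise with a floating-point tolerance.
outputs_match = len(expected) == len(actual) and all(
    math.isclose(e, a) for e, a in zip(expected, actual)
)
```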

MihirT906 commented 1 month ago

Update: Added test_flex_spektral and test_flex_kloppy to validate the functionality.