cj2001 / neo4j-gds-book

Sports graph #3

Open tomasonjo opened 3 years ago

tomasonjo commented 3 years ago

This is actually a proposal from LynxKite, who are our competitors, so we can't use the same problem directly, but it's something to think about:

● Download and understand the soccer event data (https://figshare.com/collections/Soccer_match_event_dataset/4415000)
● Create a player-based pass network (simply connecting player A to player B whenever A passed to B) for a selected match with a clear winner
● Visualize the pass graphs! Play around with various visualization options to make things look good/insightful (ideas include: drop edges representing rare passing pairs, add player names as labels, use color to differentiate teams, position players based on their average position on the field, ...)
● Observe and note down some qualitative differences between the structure of the graphs of the two teams
● Compute a few quantitative graph metrics for both teams (e.g. diameter, position assortativity, variance of some centrality metrics). Do you think any of these would indicate a better team?
● Create an area-based pass network for the same match for both teams. (Divide the field into a grid, connect two squares of the grid when there is a pass from one square moving the ball to the other.) Visualize and compute the same metrics as above!
● Which of the above two graphs seems to set the two teams further apart?
● Generate Python code for the computation of the above metrics for a given match. Using that, calculate them for all matches into a table. Run a logistic regression trying to predict the winning team based on these.
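
To make the two network constructions and a couple of the metrics concrete, here is a rough standard-library sketch. It assumes the events have already been reduced to simple pass records with hypothetical keys `team`, `from`, `to`, `start` and `end` (positions as percentages of the pitch, 0 to 100); that is not the dataset's actual schema, and working out who received each pass is left out. The 4x3 grid, the choice to ignore passes that stay inside one cell, and the toy `passes` list are all made up for illustration.

```python
from collections import Counter, deque
from statistics import pvariance

# Toy pass records (hypothetical schema, invented values).
passes = [
    {"team": "A", "from": "a1", "to": "a2", "start": (10, 50), "end": (30, 40)},
    {"team": "A", "from": "a2", "to": "a3", "start": (30, 40), "end": (60, 20)},
    {"team": "A", "from": "a1", "to": "a2", "start": (12, 55), "end": (28, 45)},
    {"team": "A", "from": "a3", "to": "a1", "start": (60, 20), "end": (15, 50)},
    {"team": "B", "from": "b1", "to": "b2", "start": (80, 50), "end": (70, 60)},
]

def player_pass_network(passes, team, min_count=1):
    """Directed weighted edges {(passer, receiver): count} for one team.

    min_count drops rare passing pairs, as suggested for the visualization.
    """
    counts = Counter((p["from"], p["to"]) for p in passes if p["team"] == team)
    return {edge: n for edge, n in counts.items() if n >= min_count}

def grid_cell(x, y, cols=4, rows=3):
    """Map a position in percent of the pitch to a (col, row) grid cell."""
    col = min(int(x * cols / 100), cols - 1)  # clamp x == 100 into the last column
    row = min(int(y * rows / 100), rows - 1)  # clamp y == 100 into the last row
    return col, row

def area_pass_network(passes, team, cols=4, rows=3):
    """Directed weighted edges between grid cells for one team."""
    counts = Counter()
    for p in passes:
        if p["team"] != team:
            continue
        src = grid_cell(*p["start"], cols, rows)
        dst = grid_cell(*p["end"], cols, rows)
        if src != dst:  # only passes that move the ball to another cell
            counts[(src, dst)] += 1
    return dict(counts)

def diameter(edges):
    """Longest shortest path in hops, ignoring direction and weights.

    Returns None when the network is empty or not connected.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if not adj:
        return None
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        if len(dist) < len(adj):
            return None
        best = max(best, max(dist.values()))
    return best

def strength_variance(edges):
    """Population variance of weighted degree (passes made plus received)."""
    strength = Counter()
    for (a, b), n in edges.items():
        strength[a] += n
        strength[b] += n
    return pvariance(strength.values()) if strength else 0.0

for team in ("A", "B"):
    players = player_pass_network(passes, team)
    areas = area_pass_network(passes, team)
    print(team, "player network:", diameter(players), strength_variance(players))
    print(team, "area network:", diameter(areas), strength_variance(areas))
```

Because both networks share the same edge-dictionary shape, the same metric functions apply to either one, so the per-match values could be collected into one table row per team before trying the logistic regression. In practice a graph library would replace the hand-rolled metrics.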

cj2001 commented 3 years ago

Mark had a huge data set of soccer stats a while back. So maybe we want to use his data set but update it? Or I think I might have seen a huge Kaggle data set on this.