American-Soccer-Analysis / asa-shiny-app

American Soccer Analysis interactive application, built with Shiny.
https://app.americansocceranalysis.com

Player/Game Data #130

Closed travisendicott closed 1 year ago

travisendicott commented 1 year ago

Hey, I'm new to soccer data analysis, but is there any way that you could post/include player data from individual games? I'm trying to see which USL Championship players are taking higher xG shots and which are (potentially) taking a larger number of lower xG shots. Additionally, I would like to see which teams are giving up better (higher xG) shots and against which players. In my mind at least, I think I can do this with player/game type data. I understand that I can take xGF/xGA divided by the total number of shots taken/conceded, but I'd like to do a bit more with the game stats. Any help you could provide would be greatly appreciated!
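
As a rough illustration of the per-shot calculation mentioned above, this R sketch divides total xG by total shots for a few made-up players. The data frame, its column names (player, shots, xgoals), and the values are all hypothetical and are not ASA output.

```r
library(dplyr)

# Hypothetical season totals; names and numbers are made up for illustration.
player_totals <- tibble(
  player = c("Player A", "Player B", "Player C"),
  shots  = c(40, 25, 10),
  xgoals = c(6.0, 2.5, 2.0)
)

# Average shot quality: total xG divided by total shots, best first.
xg_per_shot <- player_totals %>%
  mutate(xg_per_shot = xgoals / shots) %>%
  arrange(desc(xg_per_shot))

print(xg_per_shot)
```

Season totals like these hide game-to-game variation, which is where per-game pulls become useful.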

mattyanselmo commented 1 year ago

You can use the date filter on the xG app page to go week by week and get player data (or do the same on the team xG page). Admittedly, that is a somewhat tedious task. If you use our "itscalledsoccer" packages in R or Python, you can loop through dates programmatically and make those pulls. The get_player_xgoals functions have start_date and end_date inputs that could be part of a loop.

See: date filter image
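
A minimal sketch of the date loop described above, using base R only. It builds one-week windows and shows, in a comment, where a pull could go. The asa_client object is an assumption (a client created with AmericanSoccerAnalysis$new()), and the call is left commented out because it needs network access.

```r
# Build one-week windows covering a date range.
week_starts <- seq.Date(as.Date("2022-03-07"), as.Date("2022-04-03"), by = "week")
windows <- data.frame(
  start_date = week_starts,
  end_date   = week_starts + 6
)

for (i in seq_len(nrow(windows))) {
  # With a client object, each window could be pulled like this (not run here):
  # asa_client$get_player_xgoals(leagues = "uslc",
  #                              start_date = windows$start_date[i],
  #                              end_date   = windows$end_date[i])
  message("Window ", i, ": ",
          format(windows$start_date[i]), " to ", format(windows$end_date[i]))
}
```

Shrinking each window to a single match date gives game-by-game data instead of weekly data.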

travisendicott commented 1 year ago

Thank you, that worked! Is there a reason why the goals added variable is a list nested within another variable? I'm having a hard time getting that information out so I can play with the data. Do you all have some code to unnest it, or is there a problem in the way that I'm exporting the data?

mattyanselmo commented 1 year ago

Assuming you're working in R? Here's some code I used recently...

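library(dplyr)
library(tidyr)
library(itscalledsoccer)

asa_client <- AmericanSoccerAnalysis$new()
schema <- "mls"
# `seasons` must already be defined (it is not shown in this snippet)

# Get field player g+ ####
gplus <- asa_client$get_player_goals_added(
  leagues = schema,
  season_name = seasons,
  split_by_seasons = TRUE,
  stage_name = c("Regular Season", "NWSL Challenge Cup Group Stage", "MLS is Back Group Stage")
)

# Each element of gplus$data is a small table with one row per action type;
# spread it to one column per action type, then stack the results
gplus_data <- lapply(gplus$data,
                     function(temp){
                       temp %>%
                         select(-c(count_actions, goals_added_raw)) %>%
                         pivot_wider(names_from = action_type, values_from = c(goals_added_above_avg))
                     }) %>%
  bind_rows()

# Swap the nested column for the widened columns
gplus <- gplus %>%
  select(-data) %>%
  bind_cols(gplus_data)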

travisendicott commented 1 year ago

I am working in R, sorry for not saying that upfront. Thank you for the code, I think that works but I'm still getting an error because there is a null value in one of the lists. I have tried some code to turn it into NA but I'm having trouble since it's a list. Have you encountered this before and if so how did you fix it?

Sorry for all the questions, I'm new to R and soccer data analysis. Most of the data analytics that I've done has been in Stata before and I'm transitioning to R since it's a free resource. But there have been a lot of syntax issues that I've run into that I can normally figure out by searching on Stack Overflow or something similar. Unfortunately, I haven't been as lucky with this variable list issue with the G+ data.
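
One possible cause, sketched here on made-up data: if an entry of the nested data column is NULL, calling select() on it inside lapply() fails. A way around this, assuming each row has a unique player_id, is to let tidyr::unnest() drop the NULL entries and then join the widened result back so those players keep a row of NAs. The gplus table below is a hypothetical stand-in for the API result, not real ASA data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the API result: one row per player with a nested
# `data` list-column. The second player's entry is NULL.
gplus <- tibble(
  player_id = c("p1", "p2", "p3"),
  data = list(
    tibble(action_type = c("Passing", "Shooting"),
           goals_added_raw = c(0.10, 0.30),
           goals_added_above_avg = c(0.05, 0.20),
           count_actions = c(50, 4)),
    NULL,
    tibble(action_type = c("Passing", "Shooting"),
           goals_added_raw = c(0.20, 0.00),
           goals_added_above_avg = c(-0.02, -0.10),
           count_actions = c(30, 1))
  )
)

# Which rows have a NULL entry?
null_rows <- vapply(gplus$data, is.null, logical(1))

# unnest() drops NULL (size-0) entries by default, so only p1 and p3 remain.
gplus_wide <- gplus %>%
  unnest(data) %>%
  select(-c(count_actions, goals_added_raw)) %>%
  pivot_wider(names_from = action_type, values_from = goals_added_above_avg)

# Join back so every player keeps a row; p2 gets NA in the new columns.
result <- gplus %>%
  select(-data) %>%
  left_join(gplus_wide, by = "player_id")

print(result)
```

Joining by player_id rather than binding columns side by side avoids a row-count mismatch when some entries are missing.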

mattyanselmo commented 1 year ago

No problem! I think it would help if you shared your code so that I can replicate the error. I can't remember having this issue off the top of my head.

travisendicott commented 1 year ago

Absolutely! Hopefully this copies in correctly. I'm starting out by looking at Monterey Bay FC in the USL Championship, and then I'll move on to other teams. But I wanted to get the data for the first week first.

### Load Libraries
library(ggplot2)
library(dplyr)
library(tidyverse)
library(itscalledsoccer)
library(data.table)
library(lubridate)
### Get USL-Championship Game Data
knitr::opts_chunk$set(echo = FALSE)

asa_data <- AmericanSoccerAnalysis$new()
uslc_names <- asa_data$get_players(
  leagues = "uslc")
uslc_teams <- asa_data$get_teams(leagues = "uslc")
uslc_managers <- asa_data$get_managers(leagues = "uslc")
uslc_goalies <- asa_data$get_goalkeeper_xgoals(leagues = "uslc")
uslc_gamesh <- asa_data$get_games(leagues = "uslc")
uslc_gamesa <- asa_data$get_games(leagues = "uslc")
uslc_games1 <- merge(uslc_teams, uslc_gamesh, by.x=c("team_id"), by.y=c("home_team_id"))
uslc_games1 <- merge(uslc_games1, uslc_managers, by.x=c("home_manager_id"), by.y=c("manager_id"))
setnames(uslc_games1, new = c('home_manager_name', 'home_manager_nationality'),
         old = c('manager_name', 'nationality'))
uslc_games1 <- uslc_games1[, c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 1, 24, 25)]
uslc_games2 <- merge(uslc_teams, uslc_gamesa, by.x=c("team_id"), by.y=c("away_team_id"))
uslc_games2 <- merge(uslc_games2, uslc_managers, by.x=c("away_manager_id"), by.y=c("manager_id"))
setnames(uslc_games2, new = c('away_manager_name', 'away_manager_nationality'),
         old = c('manager_name', 'nationality'))
uslc_games2 <- uslc_games2[, c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 1, 24, 25)]
setnames(uslc_games1, new = c('home_team_name', 'home_team_short_name', 'home_team_abb'),
         old = c('team_name', 'team_short_name', 'team_abbreviation'))
setnames(uslc_games2, new = c('away_team_name', 'away_team_short_name', 'away_team_abb'),
         old = c('team_name', 'team_short_name', 'team_abbreviation'))
home_teams <- subset(uslc_games1, select = c("game_id", "home_team_name", "home_team_short_name", "home_team_abb", "home_manager_id", "home_manager_name", "home_manager_nationality"))
uslc_games = left_join(uslc_games2, home_teams, by = 'game_id')
names(uslc_games)[1] <- 'away_team_id'
uslc_games$date <- date(uslc_games$date_time_utc)
uslc_games <- uslc_games[, c(5, 32, 14, 15, 16, 17, 7, 8, 13, 19, 20, 21, 22, 9, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 23, 24, 25, 10, 11, 6)]
names(uslc_games)[18] <- 'home_manager_id'
rm(uslc_games1, uslc_games2, uslc_gamesa, uslc_gamesh)
### Game 1 - Monterey Bay FC
uslc_g1c <- asa_data$get_player_xgoals(
    leagues = "uslc",
    start_date = "2022-03-11",
    end_date = "2022-03-13"
)

uslc_ga1 <- asa_data$get_player_goals_added(
    leagues = "uslc",
    start_date = "2022-03-11",
    end_date = "2022-03-13"
)

uslc_xp1 <- asa_data$get_player_xpass(
    leagues = "uslc",
    start_date = "2022-03-11",
    end_date = "2022-03-13"
)

uslc_g1t = left_join(uslc_names, uslc_mbfc_g1c, by = 'player_id')
uslc_g1 = left_join(uslc_teams, uslc_mbfc_g1t, by = "team_id")
rm(uslc_g1c, uslc_g1t)
mbfc_gm1 <- subset(uslc_g1, team_name=="Monterey Bay FC")
mbfc_gm1$date <- c("2022-03-13")
mbfc_gm1$date <- date(mbfc_gm1$date)
mbfc_gm1 = right_join(uslc_games, mbfc_gm1, by = 'date')
mbfc_gm1 <- subset(mbfc_gm1, away_team_short_name=="Monterey Bay")

uslc_ga1 = left_join(uslc_names, uslc_ga1, by = 'player_id')
uslc_ga1 = left_join(uslc_teams, uslc_ga1, by = 'team_id')
mbfc_ga1 <- subset(uslc_ga1, team_name=="Monterey Bay FC")
mbfc_ga1$date <- c("2022-03-13")
mbfc_ga1$date <- date(mbfc_ga1$date)
mbfc_ga1 = right_join(uslc_games, mbfc_ga1, by = 'date')
mbfc_ga1 <- subset(mbfc_ga1, away_team_short_name=="Monterey Bay")
mbfc_gm1 <- left_join(mbfc_gm1, mbfc_ga1)

gplus_data <- lapply(mbfc_gm1$data,
                     function(temp){
                       temp %>% 
                         select(-c(count_actions, goals_added_raw)) %>% 
                         pivot_wider(names_from = action_type, values_from = c(goals_added_above_avg))
                     }) %>%
  bind_rows()

mattyanselmo commented 1 year ago

I'm hitting an error on what ends up being line 58: uslc_g1t = left_join(uslc_names, uslc_mbfc_g1c, by = 'player_id'). I don't see the object uslc_mbfc_g1c defined anywhere.

Generally, if you're trying to get game-by-game data for players and teams, here's an example snippet. It's a bit ugly because I didn't have the package documentation handy, but I hacked it together. Hopefully it's helpful!

library(ggplot2)
library(dplyr)
library(tidyverse)
library(itscalledsoccer)
library(data.table)
library(lubridate)
### Get USL-Championship Game Data
knitr::opts_chunk$set(echo = FALSE)

## SELECT TEAM ####
team_input <- "MB"

asa_data <- AmericanSoccerAnalysis$new()
uslc_names <- asa_data$get_players(
  leagues = "uslc")
uslc_teams <- asa_data$get_teams(leagues = "uslc")
team_id_input <- uslc_teams$team_id[uslc_teams$team_abbreviation == team_input]
uslc_managers <- asa_data$get_managers(leagues = "uslc")
uslc_goalies <- asa_data$get_goalkeeper_xgoals(leagues = "uslc")
uslc_games <- asa_data$get_games(leagues = "uslc") %>%
  mutate(date = as.Date(date_time_utc))

# Get Monterey Bay games
selected_team_games <- uslc_games %>%
  left_join(uslc_teams %>% select(home_team_id = team_id, home_team = team_abbreviation),
            by = "home_team_id") %>%
  left_join(uslc_teams %>% select(away_team_id = team_id, away_team = team_abbreviation),
            by = "away_team_id") %>%
  filter(home_team == team_input | away_team == team_input)

# Get team xg results by game
team_xg_by_game <- asa_data$get_game_xgoals(leagues = "uslc")
selected_team_xg_by_game <- team_xg_by_game %>%
  filter(game_id %in% selected_team_games$game_id)

mb_player_xgoals <- data.frame()
for(i in seq_len(nrow(selected_team_games))){
  mb_player_xgoals <- bind_rows(mb_player_xgoals,
                                asa_data$get_player_xgoals(
                                  leagues = "uslc",
                                  start_date = selected_team_games$date[i],
                                  end_date = selected_team_games$date[i]
                                ) %>%
                                  filter(team_id == team_id_input) %>%
                                  mutate(game_id = selected_team_games$game_id[i],
                                         date = selected_team_games$date[i]))
}

mb_player_goals_added <- data.frame()
for(i in seq_len(nrow(selected_team_games))){
  mb_player_goals_added <- bind_rows(mb_player_goals_added,
                                     asa_data$get_player_goals_added(
                                       leagues = "uslc",
                                       start_date = selected_team_games$date[i],
                                       end_date = selected_team_games$date[i]
                                     ) %>%
                                       filter(team_id == team_id_input)  %>%
                                       mutate(game_id = selected_team_games$game_id[i],
                                              date = selected_team_games$date[i]))
}

mb_gplus_data <- lapply(mb_player_goals_added$data,
                     function(temp){
                       temp %>% 
                         select(-c(count_actions, goals_added_raw)) %>% 
                         pivot_wider(names_from = action_type, values_from = c(goals_added_above_avg))
                     }) %>%
  bind_rows()

mb_player_goals_added <- mb_player_goals_added %>%
  select(-data) %>%
  bind_cols(mb_gplus_data)

travisendicott commented 1 year ago

Thank you so much. This will do everything that I need! I can't express how much easier you made my hobby of analyzing these data.

travisendicott commented 1 year ago

I'm running into an issue trying to replicate this with the xpass data. My goal is to get the xpass data per game for MB. I have been unable to do this by game with code, although I can do it on the website using the date range filter. When I try to use code similar to the xG code, I get errors saying that "selected_team_games$date" isn't available. Any suggestions on how to force the search by date?

mattyanselmo commented 1 year ago

I think something like this is what you're looking for! This would get each game for all teams for the month of April 2022.

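# Assumes library(dplyr) is loaded and asa_client <- AmericanSoccerAnalysis$new()
df <- data.frame()
for(date in seq.Date(as.Date("2022-04-01"), as.Date("2022-04-30"), "days")){
    # for() strips the Date class, so convert the number back to a Date
    date <- as.Date(date, origin = "1970-01-01")
    df <- bind_rows(df,
                    asa_client$get_team_xpass("mls", start_date = date, end_date = date) %>%
                      mutate(game_date = date))
}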
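
The as.Date(date, origin = "1970-01-01") step is needed because an R for loop over a Date vector hands back plain numbers. This small base R sketch shows the difference between looping over the vector directly and looping over positions. The variable names are illustrative only.

```r
dates <- seq(as.Date("2022-04-01"), as.Date("2022-04-03"), by = "day")

# Iterating over the vector directly drops the Date class:
# each element arrives as a plain number of days since 1970-01-01.
direct <- character(0)
for (d in dates) {
  direct <- c(direct, class(d))
}

# Iterating over positions and subsetting keeps the class intact.
indexed <- character(0)
for (i in seq_along(dates)) {
  indexed <- c(indexed, class(dates[i]))
}

print(direct)
print(indexed)
```

Looping with seq_along() and subsetting dates[i] is an alternative that avoids the conversion step.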