JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, Understat
https://jaseziv.github.io/worldfootballR/

MLS match_round is null with fotmob_get_match_details #103

Closed mvantschip closed 2 years ago

mvantschip commented 2 years ago

I am trying to scrape all the shots from the current MLS season with the following code:

library(worldfootballR)
library(dplyr)  # provides %>%

league_matches <- fotmob_get_league_matches(league_id = 130, cached = TRUE)
league_matches <- league_matches %>%
  dplyr::select(match_id = id, home, away) %>%
  tidyr::unnest_wider(c(home, away), names_sep = "_")

match_details <- fotmob_get_match_details(league_matches$match_id)

This does return a lot of information, but the match_round column is empty and the league_round_name column results in "Major League Soccer null" for every row.

I am running worldfootballR 0.5.0.

This is my sessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252   
[3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] worldfootballR_0.5.0 forcats_0.5.1        stringr_1.4.0        dplyr_1.0.8         
 [5] purrr_0.3.4          tidyr_1.2.0          tibble_3.1.6         ggplot2_3.3.5       
 [9] tidyverse_1.3.1      readr_2.1.2         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8        lubridate_1.8.0   prettyunits_1.1.1 ps_1.6.0          rprojroot_2.0.2  
 [6] assertthat_0.2.1  utf8_1.2.2        R6_2.5.1          cellranger_1.1.0  backports_1.4.1  
[11] reprex_2.0.1      httr_1.4.2        pillar_1.7.0      rlang_1.0.1       curl_4.3.2       
[16] readxl_1.3.1      rstudioapi_0.13   callr_3.7.0       desc_1.4.0        devtools_2.4.3   
[21] munsell_0.5.0     broom_0.7.12      compiler_4.1.2    modelr_0.1.8      janitor_2.1.0    
[26] pkgconfig_2.0.3   pkgbuild_1.3.1    tidyselect_1.1.2  fansi_1.0.2       crayon_1.5.0     
[31] tzdb_0.2.0        dbplyr_2.1.1      withr_2.5.0       brio_1.1.3        grid_4.1.2       
[36] jsonlite_1.7.3    gtable_0.3.0      lifecycle_1.0.1   DBI_1.1.2         magrittr_2.0.2   
[41] scales_1.1.1      cli_3.2.0         stringi_1.7.6     cachem_1.0.6      renv_0.15.2      
[46] remotes_2.4.2     fs_1.5.2          testthat_3.1.2    snakecase_0.11.0  xml2_1.3.3       
[51] ellipsis_0.3.2    generics_0.1.2    vctrs_0.3.8       tools_4.1.2       glue_1.6.1       
[56] hms_1.1.1         pkgload_1.2.4     processx_3.5.2    fastmap_1.1.0     colorspace_2.0-2 
[61] sessioninfo_1.2.2 rvest_1.0.2       memoise_2.0.1     haven_2.4.3       usethis_2.1.5 

Thank you for the amazing work! 👍

JaseZiv commented 2 years ago

Hi there,

Sorry for the late response!

Will look to address this issue once #104 is resolved.

Thanks

tonyelhabr commented 2 years ago

This seems to be an issue with Fotmob data itself that I'm not really sure how to address. match_round generally reflects the match week, e.g. matches on the last day of the 2020/21 EPL season have match_round=38.

results <- fotmob_get_matches_by_date(date = "20210523")
match_ids <- results %>%
  dplyr::select(primary_id, ccode, league_name = name, matches) %>%
  dplyr::filter(league_name == "Premier League", ccode == "ENG") %>%
  tidyr::unnest_longer(matches) %>%
  dplyr::pull(matches) %>%
  dplyr::pull(id)
fotmob_get_match_details(match_ids) %>% dplyr::select(match_id, match_round)
# A tibble: 10 x 2
   match_id match_round
   <chr>    <chr>      
 1 3411719  38         
 2 3411720  38         
 3 3411721  38         
 4 3411722  38         
 5 3411723  38         
 6 3411724  38         
 7 3411725  38         
 8 3411726  38         
 9 3411727  38         
10 3411728  38

If you really want a column to indicate match week (i.e. match_round), you can do something like this to estimate it.

library(dplyr)

# One row per team per match: each match appears once from each side
side1 <- match_details %>% select(match_id, team_id = home_team_id, opponent_id = away_team_id)
side2 <- match_details %>% select(match_id, team_id = away_team_id, opponent_id = home_team_id)
match_rounds <- bind_rows(side1, side2) %>%
  inner_join(match_details %>% select(match_id, match_time_utc), by = "match_id") %>%
  # match_time_utc is a character column; parse it so matches sort chronologically
  mutate(match_time_utc = lubridate::mdy_hm(match_time_utc)) %>%
  arrange(match_time_utc, match_id, team_id) %>%
  # Running match count for the team
  group_by(team_id) %>%
  mutate(match_round1 = row_number(match_time_utc)) %>%
  ungroup() %>%
  # Running match count for the opponent
  group_by(opponent_id) %>%
  mutate(match_round2 = row_number(match_time_utc)) %>%
  ungroup() %>%
  # Use the smaller of the two counts as the estimated round
  transmute(
    match_id,
    match_round = pmin(match_round1, match_round2),
    team_id
  )

match_details %>% 
  select(-match_round) %>% 
  inner_join(match_rounds, by = c("match_id", "home_team_id" = "team_id")) %>% 
  relocate(match_round, .after = "match_id")
# A tibble: 73 x 15
   match_id match_round league_id league_name   league_round_name  parent_league_id
   <chr>          <int>     <int> <chr>         <chr>                         <int>
 1 3787303            1       130 Major League~ Major League Socc~              130
 2 3787426            1       130 Major League~ Major League Socc~              130
 3 3787290            1  10000002 Major League~ Major League Socc~              130
 4 3787425            1       130 Major League~ Major League Socc~              130
 5 3787289            1       130 Major League~ Major League Socc~              130
 6 3787288            1  10000001 Major League~ Major League Socc~              130
 7 3787287            1  10000001 Major League~ Major League Socc~              130
 8 3787286            1       130 Major League~ Major League Socc~              130
 9 3787285            1       130 Major League~ Major League Socc~              130
10 3787424            1  10000001 Major League~ Major League Socc~              130
# ... with 63 more rows, and 9 more variables: parent_league_season <chr>,
#   match_time_utc <chr>, home_team_id <int>, home_team <chr>,
#   home_team_color <chr>, away_team_id <int>, away_team <chr>,
#   away_team_color <chr>, shots <list>
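
To make the idea behind the estimate easier to check, here is a minimal sketch on a toy fixture list. The team IDs, match IDs and kickoff times are made up for illustration (they are not Fotmob data), and the column names `match_time`, `team_round` and `estimated_rounds` are my own. It is a simplified take on the same approach: count each team's matches in chronological order, then take the smaller of the two teams' counts for each match.

```r
library(dplyr)

# Toy fixture list (hypothetical IDs and kickoff times, not real Fotmob data)
toy_matches <- tibble::tibble(
  match_id     = c("m1", "m2", "m3", "m4"),
  home_team_id = c(1L, 3L, 1L, 2L),
  away_team_id = c(2L, 4L, 3L, 4L),
  match_time   = as.POSIXct(
    c("2022-02-26 18:00", "2022-02-26 20:00", "2022-03-05 18:00", "2022-03-05 20:00"),
    tz = "UTC"
  )
)

# One row per team per match, so home and away games count toward the same team
team_matches <- bind_rows(
  toy_matches %>% select(match_id, match_time, team_id = home_team_id),
  toy_matches %>% select(match_id, match_time, team_id = away_team_id)
) %>%
  group_by(team_id) %>%
  mutate(team_round = row_number(match_time)) %>%
  ungroup()

# A match's estimated round is the smaller of its two teams' running counts
estimated_rounds <- team_matches %>%
  group_by(match_id) %>%
  summarise(match_round = min(team_round), .groups = "drop")
```

On this toy data, m1 and m2 come out as round 1 and m3 and m4 as round 2. Keep in mind this is only an estimate: a postponed or rescheduled match shifts a team's running count, so the result can drift from the official match week.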

mvantschip commented 2 years ago

Thank you for the reply. I guess if it is not possible in general, this is indeed a very solid workaround.

JaseZiv commented 2 years ago

Thanks for your help @tonyelhabr.

I will mark this closed for now.

Reach out if you need a hand with anything else @mvantschip