JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, and Understat
https://jaseziv.github.io/worldfootballR/

`fb_season_team_stats()` returning error when `stat_type = "league_table"` #389

Closed Paulj1989 closed 3 months ago

Paulj1989 commented 3 months ago

I have just started receiving the following error when trying to get league table data from FBref using fb_season_team_stats():


league_table_raw <- 
  worldfootballR::fb_season_team_stats(
    country = c("ENG", "ESP", "GER", "ITA", "FRA"),
    gender = "M",
    season_end_year = c(2013:2024),
    tier = "1st",
    stat_type = "league_table",
    time_pause = 5
  )
#> Warning in min(league_tables_idx): no non-missing arguments to min; returning
#> Inf
#> Error in `map()`:
#> ℹ In index: 1.
#> Caused by error in `.f()`:
#> ! object 'stat_df' not found

This is code that I have previously run multiple times, without issues, though I probably haven't run this code for a few weeks (it's for a blog post I haven't had a chance to work on recently, until today).

I tested a few different values for season_end_year, including single values and ranges of values, to ensure it isn't specific to certain seasons, but all efforts produce the same error.

packageVersion("worldfootballR")
#> [1] '0.6.5.6'

tonyelhabr commented 3 months ago

It looks like FBref has started showing the league name at the top of the table. It used to say "League Table" for completed seasons, which is what our logic looks for.

image

This image is from WebArchive.

image
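The warning in the original report is consistent with a caption search that finds nothing. Below is a minimal sketch of that failure mode in base R. The captions and the helper `find_league_table_idx()` are hypothetical stand-ins, not the package's actual code.

```r
# Hypothetical captions standing in for the table headings on a competition page.
old_captions <- c("League Table", "Squad Standard Stats")
new_captions <- c("Premier League", "Squad Standard Stats")

# Hypothetical helper: positions of the captions that contain the search term.
find_league_table_idx <- function(captions, term = "League Table") {
  grep(term, captions, fixed = TRUE)
}

old_idx <- find_league_table_idx(old_captions) # one match, at position 1
new_idx <- find_league_table_idx(new_captions) # no match: integer(0)

# min() over an empty vector warns "no non-missing arguments to min" and
# returns Inf, so any later lookup by that index cannot find a table.
first_new <- suppressWarnings(min(new_idx))
is.infinite(first_new)
```

If the index is `Inf`, a downstream object such as `stat_df` would plausibly never be created, which would match the second half of the error message.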

I'll submit a fix to look for the competition name. I suppose we could also rely on the league table being the first table loaded on a competition page (so we wouldn't have to search for a specific term), but I'm not sure if that's always true (although I think it probably is).
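The two options mentioned here can be sketched side by side. This is only an illustration under assumptions: `page_tables`, `pick_by_name()` and `pick_first()` are made-up names, and the tables are toy data rather than anything scraped from FBref.

```r
# Hypothetical page content: a list of tables keyed by their captions.
page_tables <- list(
  "Premier League" = data.frame(Rk = 1:2, Squad = c("Team A", "Team B")),
  "Squad Standard Stats" = data.frame(Squad = c("Team A", "Team B"), Gls = c(10, 8))
)

# Option 1: pick the table whose caption contains the competition name.
pick_by_name <- function(tables, competition_name) {
  idx <- grep(competition_name, names(tables), fixed = TRUE)
  if (length(idx) == 0) {
    return(NULL) # explicit "not found" instead of min() on an empty vector
  }
  tables[[idx[1]]]
}

# Option 2: assume the standings table is always the first table on the page.
pick_first <- function(tables) {
  if (length(tables) == 0) {
    return(NULL)
  }
  tables[[1]]
}

by_name <- pick_by_name(page_tables, "Premier League")
by_position <- pick_first(page_tables)
identical(by_name, by_position)
```

Option 1 depends on the stored competition name matching the caption exactly; option 2 depends only on table order.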

Paulj1989 commented 3 months ago

Thanks for getting to this so quickly @tonyelhabr!

Paulj1989 commented 3 months ago

Oh, I've just tested the code after updating the package to 0.6.5.7, and I'm still hitting the same error.

tonyelhabr commented 3 months ago

> Oh, I've just tested the code after updating the package to 0.6.5.7, and I'm still hitting the same error.

You are correct (unfortunately 😄). Debugging the issue like this:

library(worldfootballR)
library(purrr)
library(tidyr)

countries <- c("ENG", "ESP", "GER", "ITA", "FRA")
seasons <- 2013:2024
params <- tidyr::expand_grid(
  country = countries,
  season = seasons
) |> 
  as.list()

possibly_fb_season_team_stats <- purrr::possibly(
  fb_season_team_stats,
  otherwise = NULL, 
  quiet = FALSE
)

league_table_raw <- purrr::map2(
  params$country,
  params$season,
  \(.country, .season) {
    message(sprintf("Scraping %s - %s.", .country, .season))
    possibly_fb_season_team_stats(
      country = .country,
      gender = "M",
      season_end_year = .season,
      tier = "1st",
      stat_type = "league_table",
      time_pause = 5
    )
  }
)

I found that the Bundesliga was the source of the problem:

Scraping ENG - 2013.
...
Scraping ESP - 2024.
Scraping GER - 2013.
Warning in min(league_tables_idx) :
  no non-missing arguments to min; returning Inf
Error: ℹ In index: 1.
Caused by error in `nrow()`:
! object 'stat_df' not found
Scraping GER - 2014.
Warning in min(league_tables_idx) :
  no non-missing arguments to min; returning Inf
Error: ℹ In index: 1.
Caused by error in `nrow()`:
! object 'stat_df' not found
...
Scraping ITA - 2013.
...
Scraping FRA - 2024.
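The log makes the failing country visible by eye. The same thing could be done programmatically by checking which results came back as `NULL`. Here is a rough base-R sketch, where `scrape_stub()` is a made-up stand-in that fails only for "GER", so nothing is actually scraped.

```r
# Stub standing in for the real scraper; it fails for "GER" purely for illustration.
scrape_stub <- function(country, season) {
  if (country == "GER") stop("object 'stat_df' not found")
  data.frame(country = country, season = season)
}

# Every country/season combination to try (a small grid to keep the example short).
params <- expand.grid(
  country = c("ENG", "GER"),
  season = 2013:2014,
  stringsAsFactors = FALSE
)

# Run each combination, turning errors into NULL, as purrr::possibly() does
# with otherwise = NULL.
results <- Map(
  function(country, season) {
    tryCatch(scrape_stub(country, season), error = function(e) NULL)
  },
  params$country,
  params$season
)

# Flag the combinations that returned NULL and list the countries involved.
failed <- unname(vapply(results, is.null, logical(1)))
failed_params <- params[failed, ]
unique(failed_params$country)
```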

The issue is that our all_competitions.csv has Fußball-Bundesliga as the competition_name for the Bundesliga, while the table on the page that the code looks for shows "Bundesliga", so our string matching logic does not work. I think I will change the logic to use just the first table on the page, since I'm fairly sure all league pages have the standings table as the first table. This also means we no longer need to check for "Regular Season" for leagues like MLS, which have multiple conferences.
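A small sketch of the mismatch and of the position-based alternative follows. The second caption is invented for illustration, and the variable names are not taken from the package.

```r
# competition_name as described above ("\u00df" is the escape for the sharp s).
stored_name <- "Fu\u00dfball-Bundesliga"

# Captions as they might appear on the page; the second one is a made-up example.
page_captions <- c("Bundesliga", "Squad Standard Stats")

# Name-based lookup: the stored name is longer than the caption, so nothing matches.
name_matches <- grep(stored_name, page_captions, fixed = TRUE)
length(name_matches) == 0

# Position-based lookup: take the first table, whatever its caption says.
league_table_idx <- 1L
page_captions[league_table_idx]
```

The position-based lookup does not need to know the caption at all, which is also why a separate "Regular Season" check would no longer be required, assuming the standings table really is first on every league page.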