JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, and Understat
https://jaseziv.github.io/worldfootballR/

I found a bug in this function <fb_league_stats> #318

Closed Gyracat closed 4 months ago

Gyracat commented 10 months ago

I found a bug in this function.

Example code:

test <- fb_league_stats(
  country = "ENG",
  gender = "M",
  season_end_year = 2023,
  tier = "1st",
  stat_type = "passing_types",
  team_or_player = "player",
  time_pause = 3,
  rate = purrr::rate_backoff(max_times = 3)
)

[Screenshot: 2566-08-22 at 22 58 28]

tonyelhabr commented 10 months ago

I don't think this is really a bug. This function is "experimental" and can be highly dependent on:

  1. your internet connection
  2. FBref's server latency

On the backend, we use chromote and have to wait for the table to be loaded via JavaScript on the page. This is the only function like this in the package, AFAIK.

tonyelhabr commented 10 months ago

For transparency, I basically copied over a lot of the rvest / promises code for fb_league_stats() (see worldfootballr_chromote_session()) from this branch in the {rvest} package. That branch has been in a draft state for months, seemingly because it has never worked completely properly.

SeriyBg commented 7 months ago

I can also reproduce it today, although yesterday there was no issue. Strangely, the function works for team but doesn't work for player. This happens even with a stat_type such as keepers, where the tables are of comparable size.

For team:

fb_league_stats(country = "ENG", gender = "M", season_end_year = 2024, tier = "1st", non_dom_league_url = NA,
  stat_type = "keepers",
  team_or_player = "team"
)
# A tibble: 20 × 23
   Team_or_Opponent Squad  Num_Players `MP_Playing Time` `Starts_Playing Time` `Min_Playing Time` Mins_Per_90_Playing …¹
   <chr>            <chr>        <int>             <int>                 <int>              <dbl>                  <dbl>
 1 team             Arsen…           2                15                    15               1350                     15
 2 team             Aston…           2                15                    15               1350                     15
........................

For player:

fb_league_stats(country = "ENG", gender = "M", season_end_year = 2024, tier = "1st", non_dom_league_url = NA,
  stat_type = "keepers",
  team_or_player = "player"
)
Error: Request failed after 3 attempts.
# A tibble: 0 × 1
# ℹ 1 variable: url <chr>

tonyelhabr commented 7 months ago

I've intentionally left this issue open to give notice that we're aware that this function is unreliable. I'm not sure there's a great solution without using Selenium, which we've implicitly decided to not use, so as to reduce the scope of this package.

I think the reason why the function sometimes works but sometimes doesn't is due to server latency on FBref's end, but I'm not entirely sure. The function relies on {promises} for async behavior, which can be very dependent on Internet connection and server response time.
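Since the failure looks intermittent, retrying with backoff is one way to cope with it on the caller's side. The "Request failed after 3 attempts" message above matches what purrr::insistently() raises, so the `rate` argument presumably feeds something like the following. This is only a minimal sketch: `flaky_scrape()` is a hypothetical stand-in for the real scrape, not the package's internals.

```r
library(purrr)

# Hypothetical stand-in for a scrape that fails until the table has loaded.
attempts <- 0
flaky_scrape <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("table not loaded yet")
  data.frame(Squad = "Arsenal")
}

# Exponential backoff with short pauses so the example runs quickly;
# jitter is disabled to keep the timing deterministic.
rate <- rate_backoff(
  pause_base = 0.1,
  pause_min = 0.05,
  max_times = 3,
  jitter = FALSE
)

# insistently() re-runs the function on error until it succeeds
# or the rate's max_times is exhausted.
scrape_with_retry <- insistently(flaky_scrape, rate = rate, quiet = TRUE)
result <- scrape_with_retry()
```

Raising `max_times` gives a slow page more chances to finish loading, at the cost of a longer wait when FBref really is unavailable.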

SeriyBg commented 7 months ago

Thanks for the answer @tonyelhabr! One more note that may help in solving this issue: I found a workaround by using fb_big5_advanced_season_stats(season_end_year = c(2024), stat_type = "shooting", team_or_player = "player") instead and filtering by the league name afterwards. The fact that fb_big5_advanced_season_stats is working while fb_league_stats isn't might indicate that this is not a server latency issue. However, this is only my guess 😃
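A rough sketch of that workaround, in case it helps others. `filter_league()` is a hypothetical helper, and the `Comp` column name is an assumption about the returned data; check names() on the real result before relying on it. The mock data frame only stands in for the real output so the helper can be tried offline.

```r
# Hypothetical helper: keep only the rows for one league.
# `comp_col` is an assumed column name; verify it with names(big5).
filter_league <- function(big5, league, comp_col = "Comp") {
  stopifnot(comp_col %in% names(big5))
  # which() drops NA comparisons instead of returning NA rows.
  big5[which(big5[[comp_col]] == league), , drop = FALSE]
}

# Stand-in data so the helper can be tried without hitting FBref.
mock_big5 <- data.frame(
  Comp = c("Premier League", "La Liga", "Premier League"),
  Player = c("Player A", "Player B", "Player C"),
  stringsAsFactors = FALSE
)
epl <- filter_league(mock_big5, "Premier League")

# Real usage (needs network access and the worldfootballR package):
# big5 <- worldfootballR::fb_big5_advanced_season_stats(
#   season_end_year = 2024,
#   stat_type = "shooting",
#   team_or_player = "player"
# )
# epl <- filter_league(big5, "Premier League")
```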

tonyelhabr commented 7 months ago

I'm glad that works for you! That outcome is not unexpected to me. fb_big5_advanced_season_stats() doesn't require any special backend logic (i.e. promises) because FBref loads that data on the server side. The individual league player stats are loaded on the client side (in the browser), so they can't be scraped in the typical manner (i.e. with rvest::read_html()).
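To illustrate the difference, here is a small offline sketch. Both HTML snippets are made up for the example and are not FBref's actual markup. rvest::read_html() only parses the HTML it receives and never runs JavaScript, so a table that a script would add in the browser is simply not there.

```r
library(rvest)

# Server-rendered: the table is already present in the HTML that is sent.
static_page <- read_html(
  "<html><body><table id='stats'><tr><td>Arsenal</td></tr></table></body></html>"
)

# Client-rendered: the HTML ships an empty container plus a script that
# would create the table in a browser. The script is never executed here.
dynamic_page <- read_html(
  "<html><body><div id='stats'></div>
   <script>
     document.getElementById('stats').appendChild(document.createElement('table'));
   </script></body></html>"
)

n_static <- length(html_elements(static_page, "table"))
n_dynamic <- length(html_elements(dynamic_page, "table"))
```

Here `n_static` is 1 and `n_dynamic` is 0, which is why the player tables need a real browser session (chromote) rather than a plain HTTP fetch.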

tonyelhabr commented 4 months ago

Issue is being archived.