JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt and Understat
https://jaseziv.github.io/worldfootballR/
444 stars, 60 forks

Download (Scraping) Data connection problem #136

Closed: mcefis closed this issue 2 years ago

mcefis commented 2 years ago

Good morning,

I have a connection problem when I try to scrape data from FBref and Fotmob (scraping from Understat works fine). The error message is:

Error in file(file, "rt") : it's not possible to open the connection
Warning: In file(file, "rt") : it's not possible to open URL 'https://raw.githubusercontent.com/JaseZiv/worldfootballR_data/master/raw-data/fotmob-leagues/season_ids.csv': HTTP '404 Not Found'

Can you help me, please?

Many thanks,

Mattia

JaseZiv commented 2 years ago

Hi,

I'll need the code you ran to get the error, as well as the output of sessionInfo().

Thanks

mcefis commented 2 years ago

Here you are:

My code:

library(worldfootballR)
library(dplyr)
library(tidyr)

# It does not work
scout <- fb_player_scouting_report(player_url = "https://fbref.com/en/players/d70ce98e/Lionel-Messi", pos_versus = "primary") %>%
  dplyr::filter(scouting_period == "Last 365 Days")

epl_team_xg_2021 <- fotmob_get_season_stats(
  country = "ENG",
  league_name = "Premier League",
  season_name = "2020/2021",
  stat_type = "xg",
  team_or_player = "team"
)

league_matches <- fotmob_get_league_matches(
  country = c("ENG", "ESP"),
  league_name = c("Premier League", "LaLiga")
)

sessionInfo() output:

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_1.2.0 dplyr_1.0.9 worldfootballR_0.5.1.1000

loaded via a namespace (and not attached):
[1] rstudioapi_0.13 xml2_1.3.3 janitor_2.1.0 magrittr_2.0.3 hms_1.1.1
[6] tidyselect_1.1.2 rvest_1.0.2 R6_2.5.1 rlang_1.0.2 fansi_1.0.3
[11] httr_1.4.3 stringr_1.4.0 tools_4.1.3 utf8_1.2.2 cli_3.2.0
[16] selectr_0.4-2 ellipsis_0.3.2 tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1
[21] purrr_0.3.4 readr_2.1.2 tzdb_0.3.0 vctrs_0.4.1 curl_4.3.2
[26] glue_1.6.2 snakecase_0.11.0 stringi_1.7.6 compiler_4.1.3 pillar_1.7.0
[31] generics_0.1.2 jsonlite_1.8.0 lubridate_1.8.0 pkgconfig_2.0.3


Let me know. Many thanks!

JaseZiv commented 2 years ago

You'll need to update to the most recent version, v0.5.6.
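For anyone hitting the same 404, one quick way to tell whether an installed copy predates the recommended release is to compare version strings in base R. This is a minimal sketch: `needs_update()` is a hypothetical helper (not part of worldfootballR), and the reinstall line assumes the development version is installed from the JaseZiv/worldfootballR GitHub repository via remotes::install_github().

```r
# Hypothetical helper (not part of worldfootballR): TRUE when the installed
# version string is older than the release the maintainer recommended.
needs_update <- function(installed, required = "0.5.6") {
  # package_version() compares component by component, so "0.5.1.1000" < "0.5.6"
  package_version(installed) < package_version(required)
}

# Usage (needs network access, so it is left commented out here):
# if (needs_update(as.character(packageVersion("worldfootballR")))) {
#   remotes::install_github("JaseZiv/worldfootballR")
# }
```

Comparing with package_version() rather than plain strings matters here: as character strings, "0.5.1.1000" and "0.5.6" would not sort reliably by release order.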

mcefis commented 2 years ago

OK, thanks: the main problem is now solved, but one issue remains:

epl_team_xg_2021 <- fotmob_get_season_stats( country = "ENG", league_name = "Premier League", season_name = "2020/2021", stat_name = "Expected goals", team_or_player = "team" )

The output:

Error in open.connection(x, "rb") : Could not resolve host: www.fotmob.comNA

sessionInfo() output:

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_1.2.0 dplyr_1.0.9 worldfootballR_0.5.6

loaded via a namespace (and not attached):
[1] pillar_1.7.0 compiler_4.1.3 prettyunits_1.1.1 remotes_2.4.2 tools_4.1.3
[6] testthat_3.1.3 pkgbuild_1.3.1 pkgload_1.2.4 jsonlite_1.8.0 lubridate_1.8.0
[11] memoise_2.0.1 lifecycle_1.0.1 tibble_3.1.7 pkgconfig_2.0.3 rlang_1.0.2
[16] cli_3.2.0 rstudioapi_0.13 curl_4.3.2 fastmap_1.1.0 xml2_1.3.3
[21] stringr_1.4.0 janitor_2.1.0 withr_2.5.0 httr_1.4.3 hms_1.1.1
[26] generics_0.1.2 desc_1.4.1 fs_1.5.2 vctrs_0.4.1 devtools_2.4.3
[31] rprojroot_2.0.3 tidyselect_1.1.2 snakecase_0.11.0 glue_1.6.2 R6_2.5.1
[36] processx_3.5.3 fansi_1.0.3 sessioninfo_1.2.2 selectr_0.4-2 tzdb_0.3.0
[41] readr_2.1.2 callr_3.7.0 purrr_0.3.4 magrittr_2.0.3 ps_1.6.0
[46] ellipsis_0.3.2 usethis_2.1.5 rvest_1.0.2 utf8_1.2.2 stringi_1.7.6
[51] cachem_1.0.6 crayon_1.5.1 brio_1.1.3

JaseZiv commented 2 years ago

This does appear to be an issue. I'm guessing a Fotmob URL endpoint has changed again, @tonyelhabr?

tonyelhabr commented 2 years ago

Hmm, I see. This happens because the default stats page for leagues is blank: we're between seasons, and Fotmob has defaulted to the next season.

[Screenshot: the stats page now ("after")]

What it used to look like: [Screenshot: "before"]

The code depends on the "See All" button being on the page so that it can navigate to a second page and scrape all stat options. There are multiple ways to fix this; I'll have to think about which is best, and then I'll submit a fix.
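The trailing `NA` in the earlier error (`www.fotmob.comNA`) is consistent with a link lookup returning `NA` that is then pasted onto the base URL. Below is a minimal base-R sketch of a guard that fails with a clear message instead. `build_stats_url()` is hypothetical and is not worldfootballR's actual implementation; it assumes the scraper has already extracted the "See All" link's relative href (or `NA` when the button is missing).

```r
# Hypothetical guard (not worldfootballR's code): build the "See All" stats URL
# only when a usable href was found; otherwise stop with an informative error.
build_stats_url <- function(href, base = "https://www.fotmob.com") {
  # Missing button -> NA/NULL/empty href; refuse to build a bogus URL.
  if (length(href) != 1 || is.na(href) || !nzchar(href)) {
    stop("No 'See All' link found on the league stats page; ",
         "the selected season may not have any stats yet.", call. = FALSE)
  }
  paste0(base, href)
}

# Without such a guard, paste0() silently converts NA to the string "NA":
paste0("https://www.fotmob.com", NA)
#> [1] "https://www.fotmob.comNA"
```

Failing early like this replaces the opaque "Could not resolve host" message with one that points at the missing button, whichever of the possible fixes is eventually chosen.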

JaseZiv commented 2 years ago

Awesome, thanks Tony!

mcefis commented 2 years ago

Many thanks for your kind answers!