JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, Understat
https://jaseziv.github.io/worldfootballR/

fb_match_results returning NA for goals #326

Closed dorronsoro1 closed 8 months ago

dorronsoro1 commented 10 months ago

I just updated worldfootballR to the latest version, and when I use the fb_match_results function I get values for all columns, including expected goals, but home goals and away goals are returned as NA.

library(worldfootballR)
packageVersion("worldfootballR") #‘0.6.4.4’

test.results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
head(test.results)

# Competition_Name Gender Country Season_End_Year Round Wk Day       Date  Time           Home HomeGoals
# 1   Premier League      M     ENG            2023    NA  1 Fri 2022-08-05 20:00 Crystal Palace        NA
# 2   Premier League      M     ENG            2023    NA  1 Sat 2022-08-06 12:30         Fulham        NA
# 3   Premier League      M     ENG            2023    NA  1 Sat 2022-08-06 15:00      Tottenham        NA
# 4   Premier League      M     ENG            2023    NA  1 Sat 2022-08-06 15:00  Newcastle Utd        NA
# 5   Premier League      M     ENG            2023    NA  1 Sat 2022-08-06 15:00   Leeds United        NA
# 6   Premier League      M     ENG            2023    NA  1 Sat 2022-08-06 15:00    Bournemouth        NA
# Home_xG            Away AwayGoals Away_xG Attendance                     Venue        Referee Notes
# 1     1.2         Arsenal        NA     1.0      25286             Selhurst Park Anthony Taylor    NA
# 2     1.2       Liverpool        NA     1.2      22207            Craven Cottage    Andy Madley    NA
# 3     1.5     Southampton        NA     0.5      61732 Tottenham Hotspur Stadium Andre Marriner    NA
# 4     1.7 Nott'ham Forest        NA     0.3      52245            St James' Park   Simon Hooper    NA
# 5     0.8          Wolves        NA     1.3      36347               Elland Road   Robert Jones    NA
# 6     0.6     Aston Villa        NA     0.7      11013          Vitality Stadium   Peter Bankes    NA
# MatchURL
# 1               https://fbref.com/en/matches/e62f6e78/Crystal-Palace-Arsenal-August-5-2022-Premier-League
# 2                     https://fbref.com/en/matches/6713c1dc/Fulham-Liverpool-August-6-2022-Premier-League
# 3        https://fbref.com/en/matches/09d8a999/Tottenham-Hotspur-Southampton-August-6-2022-Premier-League
# 4   https://fbref.com/en/matches/1ac96eb4/Newcastle-United-Nottingham-Forest-August-6-2022-Premier-League
# 5 https://fbref.com/en/matches/82702941/Leeds-United-Wolverhampton-Wanderers-August-6-2022-Premier-League
# 6              https://fbref.com/en/matches/877e3193/Bournemouth-Aston-Villa-August-6-2022-Premier-League
JaseZiv commented 10 months ago

Hmm this is weird - I'm not seeing this when I run the function.

I will leave this issue open for now in case anyone else experiences this. In the meantime, you can always use load_match_results():

loaded_results <- worldfootballR::load_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
fine-lemur commented 9 months ago

I'm seeing the same error with the code fragment above, on the same version of the package:

packageVersion("worldfootballR") #‘0.6.4.4’

For me this started in the last couple of days, same code working fine before then

fine-lemur commented 9 months ago

As I understand it from the docs, worldfootballR::load_match_results pulls from a cached version of the data on your GitHub? How often is that updated? I can see the English National League results from Tuesday 26th September 2023 on the FBref site, but when I pull that league with the call below, they're still missing as of today (a couple of days later).

worldfootballR::load_match_results(country = "ENG", gender = "M", season_end_year = 2024, tier = "5th")

JaseZiv commented 9 months ago

Interesting... I still can't recreate this issue.. @tonyelhabr, are you able to return correct results as expected also?

In regards to your question @fine-lemur, the match results are updated based on the following CRON schedule (UTC):

on:
  schedule:
    - cron: "15 17 * 1-5,8-12 0,2,4"

So Sundays, Tuesdays and Thursdays.
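
To make the schedule concrete, here is a rough sketch (not part of worldfootballR) that checks whether a UTC timestamp matches the cron expression above. The helper name `matches_results_cron` is hypothetical and exists only for illustration.

```r
# Hypothetical helper: does a UTC timestamp match "15 17 * 1-5,8-12 0,2,4"?
matches_results_cron <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  lt$min == 15 &&
    lt$hour == 17 &&
    (lt$mon + 1) %in% c(1:5, 8:12) &&  # $mon is 0-based (0 = January)
    lt$wday %in% c(0, 2, 4)            # $wday: 0 = Sunday, 2 = Tuesday, 4 = Thursday
}

# Tuesday 26 September 2023 at 17:15 UTC is a scheduled run
tuesday_run <- matches_results_cron(as.POSIXct("2023-09-26 17:15:00", tz = "UTC"))
# Wednesday 27 September 2023 is not
wednesday_run <- matches_results_cron(as.POSIXct("2023-09-27 17:15:00", tz = "UTC"))
# Sundays in June are skipped because June and July are outside the month ranges
june_run <- matches_results_cron(as.POSIXct("2023-06-04 17:15:00", tz = "UTC"))
```

Because the job starts at 17:15 UTC, a match that kicks off later on a Tuesday evening would presumably not be in the cached data until the Thursday run at the earliest.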

dorronsoro1 commented 9 months ago

I am using a Mac, I wonder if that is causing the issue? Sometimes I get file encoding issues that I have to manually address in the code.

dorronsoro1 commented 9 months ago

Closed by mistake

tonyelhabr commented 9 months ago

@dorronsoro1 i know you have a pretty recent version of the package, but can you re-install with the latest version (i.e. using remotes::install_github("JaseZiv/worldfootballR")) and try again? like @JaseZiv, i don't have any issue with NAs for the goals fields

library(worldfootballR)
packageVersion("worldfootballR")
#> [1] '0.6.4.8'

results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
dplyr::glimpse(results)
#> Rows: 380
#> Columns: 20
#> $ Competition_Name <chr> "Premier League", "Premier League", "Premier League",…
#> $ Gender           <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…
#> $ Country          <chr> "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG…
#> $ Season_End_Year  <int> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023,…
#> $ Round            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Wk               <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "10…
#> $ Day              <chr> "Fri", "Sat", "Sat", "Sat", "Sat", "Sat", "Sat", "Sun…
#> $ Date             <date> 2022-08-05, 2022-08-06, 2022-08-06, 2022-08-06, 2022…
#> $ Time             <chr> "20:00", "12:30", "15:00", "15:00", "15:00", "15:00",…
#> $ Home             <chr> "Crystal Palace", "Fulham", "Tottenham", "Newcastle U…
#> $ HomeGoals        <dbl> 0, 2, 4, 2, 2, 2, 0, 2, 1, 0, 2, 5, 4, 3, 0, 3, 2, 3,…
#> $ Home_xG          <dbl> 1.2, 1.2, 1.5, 1.7, 0.8, 0.6, 0.7, 0.6, 1.4, 0.5, 1.1…
#> $ Away             <chr> "Arsenal", "Liverpool", "Southampton", "Nott'ham Fore…
#> $ AwayGoals        <dbl> 2, 2, 1, 0, 1, 0, 1, 2, 2, 2, 1, 1, 0, 0, 1, 1, 1, 2,…
#> $ Away_xG          <dbl> 1.0, 1.2, 0.5, 0.3, 1.3, 0.7, 1.5, 0.8, 1.5, 2.2, 1.0…
#> $ Attendance       <dbl> 25286, 22207, 61732, 52245, 36347, 11013, 39254, 3179…
#> $ Venue            <chr> "Selhurst Park", "Craven Cottage", "Tottenham Hotspur…
#> $ Referee          <chr> "Anthony Taylor", "Andy Madley", "Andre Marriner", "S…
#> $ Notes            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ MatchURL         <chr> "https://fbref.com/en/matches/e62f6e78/Crystal-Palace…
JaseZiv commented 9 months ago

I'm also using a Mac

fine-lemur commented 8 months ago

I'm on a Mac also -- Sonoma on M1 Pro

dorronsoro1 commented 8 months ago
results <- fb_match_results(country = "ENG", gender = "M", season_end_year = 2023, tier = "1st")
dplyr::glimpse(results)
dorronsoro1 commented 8 months ago

I'm still getting NAs returned for goals after re-installing; see the screenshot below.

image
tonyelhabr commented 8 months ago

i'm guessing this has something to do with character encodings being different on different systems. in this case, the "em-dash" in the score (strictly an en dash, U+2013) can be problematic.

can you run this code and let me know what the output looks like? i've printed mine.

example_score <- '2–1'
## current approach
iconv(example_score, 'utf-8', 'ascii', sub=' ')
#> [1] "2   1"
## potential alternative approach
gsub('–', ' ', example_score)
#> [1] "2 1"
JaseZiv commented 8 months ago

When initially writing that function, I did have reservations about handling it with iconv(example_score, 'utf-8', 'ascii', sub=' '), but I found that explicitly including the "em-dash" in the .R file kept causing my RStudio session to bork...
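
One way to avoid a literal dash in the source file is to write it as a Unicode escape, so the .R file stays pure ASCII. The snippet below is a minimal sketch of that idea, assuming the character in the FBref score is the en dash (U+2013); it is not taken from the package source.

```r
# "\u2013" is the en dash, written as an ASCII-only escape sequence
en_dash <- "\u2013"

# Build the example score without typing the dash itself
example_score <- paste0("2", en_dash, "1")

# Replace the dash with a single space, matching it literally
cleaned_score <- gsub(en_dash, " ", example_score, fixed = TRUE)
print(cleaned_score)
```

Since the escape is resolved when R parses the string, the result should not depend on the encoding the editor uses to save the file.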

dorronsoro1 commented 8 months ago

Below is the output when I run it:

> example_score <- '2–1'
> iconv(example_score, 'utf-8', 'ascii', sub=' ')
[1] "2-1"
> gsub('–', ' ', example_score)
[1] "2 1"
tonyelhabr commented 8 months ago

I have experienced this issue once. I realized my default encoding in RStudio was not UTF-8, so I fixed that.

tonyelhabr commented 8 months ago

Ah, so it seems the gsub() solution might be worth using going forward. As one last check, @dorronsoro1 I'm curious to know what you see when you run this.

Sys.getenv("LC_COLLATE")
#> [1] ""
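
As an illustration of the gsub() idea, the sketch below splits a score string into home and away goals while tolerating an en dash, an em dash, or a plain hyphen as the separator. The function name `split_score` is hypothetical, and this is not the code from the actual fix; it also assumes simple scores without penalty-shootout annotations.

```r
# Hypothetical helper: split "2-1"-style scores into numeric goal columns
split_score <- function(score) {
  # Normalise en dash (U+2013), em dash (U+2014) and ASCII hyphen to a space
  cleaned <- gsub("\u2013|\u2014|-", " ", score)
  parts <- strsplit(trimws(cleaned), "\\s+")
  # Take the first and second piece of each score; missing pieces become NA
  to_goal <- function(p, i) suppressWarnings(as.numeric(p[i]))
  data.frame(
    HomeGoals = vapply(parts, to_goal, numeric(1), i = 1),
    AwayGoals = vapply(parts, to_goal, numeric(1), i = 2)
  )
}

scores <- c(paste0("2", "\u2013", "1"), "2-1", NA)
goals <- split_score(scores)
print(goals)
```

Handling the hyphen as well covers the case reported above, where iconv() returned "2-1" rather than a space-separated string.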
dorronsoro1 commented 8 months ago

This is what I get when I run that command:

> Sys.getenv("LC_COLLATE")
[1] ""

I also tried changing the default encoding as mentioned above, saved a new script, and ran the lines previously mentioned, and still got NAs.
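
Sys.getenv("LC_COLLATE") only reports an environment variable, which is often unset, so an empty string says little about the encoding R is actually using. The following sketch uses base R functions to inspect the active locale and the bytes of the score string; the variable names are illustrative only.

```r
# Is the current session running in a UTF-8 locale?
utf8_locale <- l10n_info()[["UTF-8"]]
# The character-type locale actually in effect
ctype <- Sys.getlocale("LC_CTYPE")

# Build the score with an escaped en dash (U+2013) and look at its raw bytes
score <- paste0("2", "\u2013", "1")
score_bytes <- charToRaw(score)

print(utf8_locale)
print(ctype)
print(score_bytes)
#> [1] 32 e2 80 93 31
```

The en dash occupies three bytes in UTF-8 (e2 80 93), which matches the three spaces in the "2   1" output shown earlier, where iconv() substituted one space per unconvertible byte.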

fine-lemur commented 8 months ago

> Sys.getenv("LC_COLLATE")
[1] ""

On my Mac also (running R in IntelliJ)

tonyelhabr commented 8 months ago

Ok so at this point I don't know exactly what the underlying issue is. My suspicion is that it has something to do with character encodings.

Anyways, the fix in #340 should resolve things. You can try it out for yourself right now (before the PR is merged) by installing the package with remotes::install_github("JaseZiv/worldfootballR@fix-na-goals").