BillPetti / baseballr

A package written for R focused on baseball analysis. Currently in development.
billpetti.github.io/baseballr

Error: batter_game_logs_fg() failing on unexpected data #129

Closed rdelrossi closed 4 years ago

rdelrossi commented 4 years ago

It appears that batter_game_logs_fg() is reading from the wrong table on a player's page on fangraphs.com, probably as a result of a change to FanGraphs' page structure.

Tracing through the code, it looks to me as though the code assumes the requested data is in the last table on the page, which seems logical enough:

payload <- xml2::read_html(url) %>%
  rvest::html_nodes("table") %>%
  .[length(.)] %>%
  rvest::html_table() %>%
  as.data.frame()

But the data returned to the function is actually the brief biographical information at the top of the page, which, of course, doesn't pass the parsing code that follows in batter_game_logs_fg().
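One way to confirm which table a positional selector actually grabs is to summarise every table on the page before picking one. The following is only a minimal sketch, assuming the xml2 and rvest packages are installed. summarise_tables() is a hypothetical helper, not part of baseballr, and the inline HTML is a made-up stand-in for a player page rather than real FanGraphs markup.

```r
# Hypothetical helper (not part of baseballr): describe each <table> on a
# parsed page by its position, dimensions, and first few column names.
summarise_tables <- function(page) {
  tables <- rvest::html_table(rvest::html_nodes(page, "table"))
  data.frame(
    index = seq_along(tables),
    rows  = vapply(tables, nrow, integer(1)),
    cols  = vapply(tables, ncol, integer(1)),
    names = vapply(
      tables,
      function(tbl) paste(head(names(tbl), 4), collapse = ", "),
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

# Made-up page: a game log table followed by a short biographical table.
page <- xml2::read_html(
  "<html><body>
     <table>
       <tr><th>Date</th><th>Opp</th><th>PA</th></tr>
       <tr><td>2019-04-01</td><td>ARI</td><td>4</td></tr>
       <tr><td>2019-04-02</td><td>ARI</td><td>5</td></tr>
     </table>
     <table>
       <tr><th>Bats</th><th>Throws</th></tr>
       <tr><td>R</td><td>R</td></tr>
     </table>
   </body></html>"
)

overview <- summarise_tables(page)
print(overview)
```

Reading the summary makes it obvious when the last table is not the game log, which is the situation described above.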

(Apologies if this is the wrong place to post this. I'm new to GitHub issues.)

BillPetti commented 4 years ago

Yeah, they just changed a bunch of stuff--I'll need to get into it and refactor some of the code

rdelrossi commented 4 years ago

Thanks for acknowledging the report—and thank you for baseballr, generally!

I'll be interested to learn how you deal with this. From what I can see, fangraphs generates the game log dynamically, so the data isn't in the static HTML to be parsed out.

Baseball Savant, on the other hand, does include the same data in the statically generated HTML. So, following your pattern, I was able to tease out the game log for a player like this:

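library(magrittr)  # provides the %>% pipe used below

url <- "https://baseballsavant.mlb.com/savant-player/mookie-betts-605141?stats=gamelogs-r-hitting-mlb&season=2019"
payload <- xml2::read_html(url) %>%
  rvest::html_nodes("table") %>%
  .[40] %>%  # position of the game log table on this page, found by inspection
  rvest::html_table() %>%
  as.data.frame()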

Of course, I'm "cheating" by sleuthing out that the table I want is the 40th table in the payload for this particular page.
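A less brittle alternative to a hard-coded position might be to pick the table by its column names. This is a sketch only, assuming xml2 and rvest are installed. find_table_by_columns() is a hypothetical helper, not part of baseballr, and the column names and inline HTML are invented for illustration. The real Baseball Savant headers would need to be checked on the live page.

```r
# Hypothetical helper (not part of baseballr): return the first table whose
# header contains every name in `required`, or NULL if no table matches.
find_table_by_columns <- function(page, required) {
  tables <- rvest::html_table(rvest::html_nodes(page, "table"))
  for (tbl in tables) {
    if (all(required %in% names(tbl))) {
      return(as.data.frame(tbl))
    }
  }
  NULL
}

# Made-up page: a biographical table first, then a game log table.
page <- xml2::read_html(
  "<html><body>
     <table>
       <tr><th>Bats</th><th>Throws</th></tr>
       <tr><td>R</td><td>R</td></tr>
     </table>
     <table>
       <tr><th>Date</th><th>Opp</th><th>PA</th></tr>
       <tr><td>2019-04-01</td><td>ARI</td><td>4</td></tr>
       <tr><td>2019-04-02</td><td>ARI</td><td>5</td></tr>
     </table>
   </body></html>"
)

# "Date" and "PA" are assumed column names for this illustration.
game_log <- find_table_by_columns(page, c("Date", "PA"))
missing  <- find_table_by_columns(page, c("Date", "NoSuchColumn"))
```

Returning NULL when nothing matches lets the caller fail with a clear message instead of passing the wrong table on to the parsing code.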

Anyhow, thanks again for all your work on baseballr.

colincharles commented 4 years ago

Ran into this issue a couple weeks ago.

I think you need to point the function to the legacy pages. Here's a link to the article that they posted after the update: https://blogs.fangraphs.com/instagraphs/our-player-pages-are-going-to-change/

Hopefully that helps

BillPetti commented 4 years ago

Thanks. I spoke to them before they updated, just haven’t had the time to update. Hopefully soon.

BillPetti commented 4 years ago

Fixed by this commit

rdelrossi commented 4 years ago

Very cool, thanks for the fix, @BillPetti. Appreciate the legacy page info, too, @colincharles.

-- Robert