almartin82 / projprep

An R package that helps read, clean up, and convert baseball projection data into auction prices.

Unable to pull Steamer #49

Closed jestarr closed 6 years ago

jestarr commented 6 years ago

I'm getting the following error message: "Error in names(df)[2] <- "fg_note" : 'names' attribute [2] must be the same length as the vector [1]"

almartin82 commented 6 years ago

The Steamer data has probably changed its return format since I wrote these functions!
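
A minimal reproduction of that error message, assuming the scraped table came back with only one column (the thread does not confirm the actual shape). The `df` contents and the `rename_second_column` helper are hypothetical and are not part of projprep:

```r
# Hypothetical stand-in for a scraped table that came back with one column
df <- data.frame(player = c("A", "B"), stringsAsFactors = FALSE)

# Assigning a second name to a one-column data frame raises the reported error
msg <- tryCatch({
  names(df)[2] <- "fg_note"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical guard: fail with a clearer message when the shape is unexpected
rename_second_column <- function(df, new_name = "fg_note") {
  if (ncol(df) < 2) {
    stop("expected at least 2 columns, got ", ncol(df),
         "; the source page layout may have changed")
  }
  names(df)[2] <- new_name
  df
}
```

A check like this does not fix the scrape, but it points at the changed page layout instead of at the rename.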

jestarr commented 6 years ago

Any plans to update this package for 2018?

almartin82 commented 6 years ago

I probably should!

jestarr commented 6 years ago

Anything 👍 ?

almartin82 commented 6 years ago

Seems reasonable. Can you look for the 2018 URL and post it here?

almartin82 commented 6 years ago

Okay, this was actually a fairly easy fix! It looks like FanGraphs (rightly) pushes all traffic to https, but the HTML parser I am using is pretty old and only handles http. The solution was an intermediate step: read the content in with RCurl::getURL, then pass the result to XML::readHTMLTable.
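
The two-step approach described above can be sketched roughly as follows. This is not the actual projprep code: `fetch_html`, `parse_tables`, and the sample HTML are hypothetical names and data chosen for illustration, and the real FanGraphs URL is left to the caller.

```r
# Step 1: download the raw HTML over https (RCurl handles the TLS connection)
fetch_html <- function(url) {
  RCurl::getURL(url, followlocation = TRUE)
}

# Step 2: parse HTML that is already in memory and return every <table>
parse_tables <- function(html) {
  doc <- XML::htmlParse(html, asText = TRUE)
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Offline example with a made-up table instead of a live page
sample_html <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>HR</th></tr>",
  "<tr><td>Player A</td><td>30</td></tr>",
  "</table></body></html>"
)
tables <- parse_tables(sample_html)
```

Against a live page the two steps would be chained as `parse_tables(fetch_html(url))`.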

Should probably move all of this to rvest / httr, which is the more current way of handling web content, but this seems to work for now.
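
For comparison, a rough sketch of what the same fetch-and-parse step might look like with httr and rvest. The helper names and the sample HTML are hypothetical, and nothing here reflects how projprep is actually written:

```r
# Download a page with httr and return the body as text
fetch_html_httr <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Parse an HTML string and return one data frame per <table>
parse_tables_rvest <- function(html) {
  doc <- rvest::read_html(html)
  rvest::html_table(doc)
}

# Offline example with a made-up table instead of a live page
sample_html <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>HR</th></tr>",
  "<tr><td>Player A</td><td>30</td></tr>",
  "</table></body></html>"
)
tables_rvest <- parse_tables_rvest(sample_html)
```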

almartin82 commented 6 years ago

@jestarr let me know if this solves steamer / fangraphs for you.

almartin82 commented 6 years ago

hmm now I get

Error in select_impl(.data, vars) : 
  found duplicated column name: , , , , , [the same empty name repeats many more times]

almartin82 commented 6 years ago

issue is in clean_raw_fangraphs...
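
The error lists only empty names, which suggests the parsed table has several columns with blank headers. One possible cleanup, sketched under that assumption, is shown here. `drop_blank_columns` and the sample data are hypothetical, and this is not necessarily the fix that went into clean_raw_fangraphs:

```r
# Drop columns whose header is blank or NA, then make the remaining names unique
drop_blank_columns <- function(df) {
  nm <- names(df)
  keep <- !is.na(nm) & nzchar(trimws(nm))
  df <- df[keep]
  names(df) <- make.unique(names(df))
  df
}

# Made-up table with two blank headers, similar in shape to the reported error
raw <- data.frame(1:2, 3:4, 5:6, 7:8)
names(raw) <- c("Name", "", "HR", "")
clean <- drop_blank_columns(raw)
print(names(clean))
```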

jestarr commented 6 years ago

It's taking forever to pull down any data using get_steamer. Would rvest make the scrape quicker? I wish Fangraphs had an open source API.

almartin82 commented 6 years ago

It's pretty slow (5-10 min?) but it should resolve.

The scraping strategy was designed to be comprehensive but not fast. If memory serves it traverses every team x every position - so there are a lot of calls happening behind the scenes.
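
To give a sense of scale, a back-of-the-envelope sketch of a team x position traversal. The position list and the per-request time are assumptions made for illustration, not values taken from the package:

```r
# Assumption: one id per MLB team
teams <- 1:30

# Hypothetical position list; the package may use a different set
positions <- c("c", "1b", "2b", "ss", "3b", "of", "dh", "sp", "rp")

# One row per request the scraper would need to make
calls <- expand.grid(team = teams, position = positions,
                     stringsAsFactors = FALSE)
n_calls <- nrow(calls)

# Assumed average time per request, in seconds
seconds_per_call <- 1.5
est_minutes <- n_calls * seconds_per_call / 60
print(c(calls = n_calls, minutes = est_minutes))
```

Under these assumptions the loop makes 270 requests, so a runtime of several minutes is plausible even when each request is fast.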
