Closed almartin82 closed 8 years ago
looking more closely at this, should try to generalize the fangraphs scrape to cover all 5 projection systems they display
I'll look tomorrow and get back to you.
hey, @drewgriffith15 - didn't mean to spam you with notifications! just stumbled into your scripts and wanted to leave a breadcrumb for myself / cite my sources.
I love how compact that code is for reading in all of the steamer data by team. my current plan of attack was to turn that into a function, and then see what parameters needed to be changed to get all the other available projections - ZIPS, fangraph fans, etc.
if you'd like to join in, would love to have you on this project. in your 2015 script, were you implementing standings gain points? haven't ever tried that, and would definitely be interested in seeing how valuations differ for SGP vs z-score approaches.
OK, I wrote the core fangraphs scrape function - it's on a new branch, steamer.
usage is pretty basic right now, and per #29, there are some variable-type issues that I need to figure out right away. but getting there!
Oh, a few days ago, I came across this: https://github.com/BillPetti/baseballr
If you haven't seen this before, it's worth a follow.
-Drew
Hey, feel free to use whatever code that you see on my side to build functions for Fangraphs (zips, steamer, etc.). If you want to work together on some code, I'd be up for it. Just let me know.
-Drew
@drewgriffith15 got all the fangraphs projections in! here's a minimal snippet:
library(devtools)
devtools::install_github('almartin82/projprep')
library(projprep)
ex <- projprep::get_steamer(2016, TRUE)
pp <- projprep::proj_prep(ex)
pp$h_final %>% dplyr::arrange(desc(value)) %>% peek()
will produce
   mlbid         fullname firstname    lastname position priority_pos projection_name
1 545361       Mike Trout      Mike       Trout       OF           OF         steamer
2 514888      Jose Altuve      Jose      Altuve       2B           2B         steamer
3 502671 Paul Goldschmidt      Paul Goldschmidt       1B           1B         steamer
4 519203    Anthony Rizzo   Anthony       Rizzo       1B           1B         steamer
5 457763     Buster Posey    Buster       Posey        C            C         steamer
6 621043    Carlos Correa    Carlos      Correa       SS           SS         steamer
   ab   r rbi sb  tb  obp r_zscore rbi_zscore sb_zscore tb_zscore obp_zscore
1 541 103 104 15 316 0.41     2.85     2.6077    0.5467      2.75       2.27
2 629  92  64 37 271 0.35     2.01    -0.0095    2.9794      1.53       2.20
3 538  92  92 14 286 0.40     2.01     1.8226    0.4362      1.94       2.02
4 552  92  99 10 287 0.37     2.01     2.2806   -0.0061      1.96       1.64
5 497  68  73  2 234 0.37     0.16     0.5794   -0.8908      0.53       0.77
6 572  80  83 20 262 0.34     1.08     1.2337    1.0996      1.29       1.12
  unadjusted_zsum replacement_pos adjustment_zscore final_zsum value hit_pitch
1            11.0              OF              -1.0       10.0    60         h
2             8.7              2B               1.2        9.9    59         h
3             8.2              1B               1.3        9.5    57         h
4             7.9              1B               1.3        9.2    55         h
5             1.2               C               7.0        8.1    49         h
6             5.8              SS               2.2        8.0    48         h
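For readers unfamiliar with the z-score columns in that output, here is a rough sketch of the general idea, not projprep's actual implementation. It standardizes each category and sums the results. The helper name `add_zscores` is made up for illustration, the input is just the six hitters printed above, and the numbers will not match the package output because the real calculation runs over the full player pool (and may treat rate stats such as obp differently).

```r
# Toy input: the six projected hitters shown above.
hitters <- data.frame(
  fullname = c("Mike Trout", "Jose Altuve", "Paul Goldschmidt",
               "Anthony Rizzo", "Buster Posey", "Carlos Correa"),
  r   = c(103, 92, 92, 92, 68, 80),
  rbi = c(104, 64, 92, 99, 73, 83),
  sb  = c(15, 37, 14, 10, 2, 20),
  tb  = c(316, 271, 286, 287, 234, 262),
  obp = c(0.41, 0.35, 0.40, 0.37, 0.37, 0.34),
  stringsAsFactors = FALSE
)

# Hypothetical helper: z-score each category, then sum across categories.
add_zscores <- function(df, stats) {
  for (stat in stats) {
    z <- (df[[stat]] - mean(df[[stat]])) / sd(df[[stat]])
    df[[paste0(stat, "_zscore")]] <- z
  }
  df$unadjusted_zsum <- rowSums(df[paste0(stats, "_zscore")])
  df
}

valued <- add_zscores(hitters, c("r", "rbi", "sb", "tb", "obp"))

# Rank players by their summed z-scores, best first.
valued[order(-valued$unadjusted_zsum), c("fullname", "unadjusted_zsum")]
```

The positional adjustment seen in the output (adjustment_zscore, final_zsum) would be a further step on top of this and is not sketched here.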
if you take a look at the main fangraphs scrape, you'll notice that instead of scraping pos=all x 30 teams, I'm doing 6 hitting positions x 30 teams. that was the only way I could figure out how to get the position eligibility field - it isn't in the standard projections, so I had to extract it from the url.
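A minimal sketch of that position x team grid, for anyone following along. The URL template, the query parameter names, the team ids 1 to 30, and the helper `pos_from_url` are all assumptions for illustration; the real request format lives in R/fangraphs.R.

```r
# Assumed inputs: six hitting positions and team ids 1..30.
positions <- c("c", "1b", "2b", "ss", "3b", "of")
teams <- 1:30

# One row per (position, team) pair: 6 x 30 = 180 requests.
grid <- expand.grid(pos = positions, team = teams, stringsAsFactors = FALSE)

# Hypothetical URL template; parameter names are illustrative only.
grid$url <- sprintf(
  "http://www.fangraphs.com/projections.aspx?pos=%s&stats=bat&type=%s&team=%d",
  grid$pos, "steamer", grid$team
)

# Hypothetical helper: recover the position from the URL itself,
# since the scraped table does not carry a position column.
pos_from_url <- function(url) {
  toupper(sub(".*[?&]pos=([^&]+).*", "\\1", url))
}
grid$position <- pos_from_url(grid$url)

head(grid[c("team", "position", "url")])
```

The trade-off is six times as many requests as a pos=all scrape, in exchange for a position label on every row.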
I'm pretty happy with this - the only thing I wish the scrape was preserving is the hyperlink for each player, which contains their fangraphs id. Having all those names and fangraphs ids would be nice to write back to the player metadata, and would help solve some of the problems around players with duplicate names (issue #31). @drewgriffith15 if you have any thoughts about ways to extract those links as part of the scrape - the current readHTMLTable call loses those hyperlinks. maybe we could combine it with something else from rvest that makes getting those links easier?
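One possible approach, sketched with rvest under stated assumptions: select the anchor tags inside the table and read their href attribute alongside the link text. The HTML below is a toy stand-in, and the href format (a playerid query parameter) is an assumption about how the player links look, so the regex may need adjusting against the real page.

```r
library(rvest)

# Toy stand-in for a projections table; markup and href format are assumptions.
html <- '<table><tr><th>Name</th><th>SB</th></tr>
<tr><td><a href="statss.aspx?playerid=10155&amp;position=OF">Mike Trout</a></td><td>15</td></tr>
<tr><td><a href="statss.aspx?playerid=5417&amp;position=2B">Jose Altuve</a></td><td>37</td></tr>
</table>'

page <- read_html(html)

# Grab every link inside the table, keeping both the text and the target.
links <- html_nodes(page, "table a")
players <- data.frame(
  fullname = html_text(links),
  href = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

# Pull the id out of the assumed playerid query parameter.
players$fg_id <- sub(".*playerid=([^&]+).*", "\\1", players$href)

players[c("fullname", "fg_id")]
```

The resulting name/id pairs could then be joined back onto the table that readHTMLTable returns, as long as the names on a single page are unique.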
I got the package to load from github. I was running a couple versions back and when I installed the new version, it worked!
There is a dependency on the ensurer package when you run that snippet of code you sent me.
-Drew
ah, ok -- I have ensurer in suggests but maybe I will move all of those to imports so that they install cleanly.
http://r-pkgs.had.co.nz/description.html
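For any package that does stay in Suggests, a common pattern is to check for it at the point of use and fail with a clear message. A minimal sketch follows; `require_suggested` is a hypothetical helper name, not something projprep defines.

```r
# Hypothetical helper: fail early, with a clear message,
# if a suggested package is not installed.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function; ",
         "install it with install.packages('", pkg, "').",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: call at the top of any function that relies on the package.
require_suggested("stats")
```

Moving the package to Imports avoids the check entirely, since Imports are installed along with the package by default while Suggests are not.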
@drewgriffith15 just pushed that fix to master so that you can re-install. if you want to contribute (please do) I would also suggest cloning the repo locally on your machine, which makes it easier to pull down these changes and use them locally.
I try to use the branch workflow to manage these kinds of distributed projects.
Cool. I've got notifications set up, so I saw it. I will do my best to contribute and to use the branches. I've never worked on a collaborative project on GitHub, so hopefully I can get it right the first time. I already noticed something else. I found an error with this:
ex <- projprep::get_fangraphs(2016, TRUE)
Error in names(df)[2] <- "fg_note" : 'names' attribute [2] must be the same length as the vector [1]
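That message is what R reports when a second column name is assigned to a data frame that has only one column, so one plausible cause (an assumption, not a confirmed diagnosis) is that a scraped table came back with fewer columns than expected. A guard along these lines would make the failure easier to read; `rename_second_column` is a hypothetical helper, not part of projprep.

```r
# Hypothetical guard around the rename that failed above.
rename_second_column <- function(df, new_name) {
  if (ncol(df) < 2) {
    stop("expected at least 2 columns, got ", ncol(df),
         "; the scraped table may be empty or malformed",
         call. = FALSE)
  }
  names(df)[2] <- new_name
  df
}

# Usage on a well-formed table.
ok <- rename_second_column(data.frame(a = 1, b = 2), "fg_note")
names(ok)
```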
Awesome -- happy to help w/ git collaboration. This set of Atlassian writeups is awesome - super intuitive intro to different models of git collaboration. 'Feature branch workflow' is good bang for the buck - not nearly as complicated as the gitflow workflow, but you get like 80% of the benefits. The good thing about this is if you fork projprep and clone it locally, you'll be committing to your own copy of the project - there's literally no way that it could break anything in this repo. Once you have something you want to contribute back, you just create a pull request, and I can bring those changes back into the code base.
This is how I work even when I am the only contributor to a project -- if you look at the network of commits, you'll see a feature branch leave master, a bunch of commits happen, and then a pull request merges the changes back into master.
That has the advantage of keeping master stable - the bleeding edge changes live on a branch until they are ready for production. On projects where I have another lead collaborator (like mapvizieR with @chrishaid), we have established the convention of always submitting our branches to each other for code review and approval. I really like this workflow.
@drewgriffith15 has some code that does this for 2015: https://github.com/drewgriffith15/MLB/blob/master/xDraft.R