almartin82 / projprep

An R package that helps read, clean up, and convert baseball projection data into auction prices.

include regular steamer #19

Closed almartin82 closed 8 years ago

almartin82 commented 8 years ago

@drewgriffith15 has some code that does this for 2015: https://github.com/drewgriffith15/MLB/blob/master/xDraft.R

almartin82 commented 8 years ago

looking more closely at this, should try to generalize the fangraphs scrape to cover all 5 projection systems they display

drewgriffith15 commented 8 years ago

I'll look tomorrow and get back to you.

almartin82 commented 8 years ago

hey, @drewgriffith15 - didn't mean to spam you with notifications! just stumbled into your scripts and wanted to leave a breadcrumb for myself / cite my sources.

I love how compact that code is for reading in all of the steamer data by team. my current plan of attack was to turn that into a function, and then see what parameters needed to be changed to get all the other available projections - ZIPS, fangraph fans, etc.
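A minimal sketch of that plan, assuming the projection system can be selected through a single query parameter. The URL pattern, the `type` codes, and the function name `fg_projection_url` are all assumptions for illustration; none of them are taken from projprep or checked against fangraphs:

```r
# Hypothetical URL builder: the base URL, query parameter names, and `type`
# codes below are assumptions, not verified against the fangraphs site.
fg_projection_url <- function(type, team, stats = "bat") {
  sprintf(
    "http://www.fangraphs.com/projections.aspx?pos=all&stats=%s&type=%s&team=%d",
    stats, type, as.integer(team)
  )
}

# One URL per projection system x team; only `type` changes between systems.
systems <- c("steamer", "zips", "fan")  # assumed codes
grid <- expand.grid(type = systems, team = 1:30, stringsAsFactors = FALSE)
urls <- fg_projection_url(grid$type, grid$team)  # sprintf() is vectorized

length(urls)  # 3 systems x 30 teams = 90
head(urls, 3)
```

If the systems really do differ only by one parameter, the per-system functions could become thin wrappers around a single scraper.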

if you'd like to join in, would love to have you on this project. in your 2015 script, were you implementing standings gain points? haven't ever tried that, and would definitely be interested in seeing how valuations differ for SGP vs z-score approaches.
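For anyone comparing the two approaches, here is a rough sketch on toy numbers. The player pool is tiny and the SGP denominators are invented purely for illustration, so none of the results correspond to real valuations:

```r
# Toy player pool (illustrative numbers only).
hitters <- data.frame(
  name = c("A", "B", "C"),
  r    = c(103, 92, 68),
  rbi  = c(104, 64, 73),
  sb   = c(15, 37, 2)
)
cats <- c("r", "rbi", "sb")

# z-score approach: standardize each category against the player pool,
# then sum across categories.
z <- scale(hitters[, cats])  # (x - mean) / sd, column by column
hitters$zsum <- rowSums(z)

# SGP approach: divide each stat by the amount of that stat needed to gain
# one point in the league standings. These denominators are made up.
sgp_denom <- c(r = 20, rbi = 20, sb = 8)
hitters$sgp <- as.vector(as.matrix(hitters[, cats]) %*% (1 / sgp_denom[cats]))

hitters[order(-hitters$zsum), c("name", "zsum", "sgp")]
```

With these made-up denominators the two methods disagree on the top player (A by z-score, B by SGP), which is the kind of difference worth looking at on real projections.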

almartin82 commented 8 years ago

OK, I wrote the core fangraphs scrape function - it's on a new branch, steamer.

usage is pretty basic right now, and per #29, there are some variable-type issues that I need to figure out right away. but getting there!

drewgriffith15 commented 8 years ago

Oh, a few days ago, I came across this: https://github.com/BillPetti/baseballr

If you haven't seen this before, it's worth a follow.

-Drew


drewgriffith15 commented 8 years ago

Hey, feel free to use whatever code that you see on my side to build functions for Fangraphs (zips, steamer, etc.). If you want to work together on some code, I'd be up for it. Just let me know.

-Drew


almartin82 commented 8 years ago

@drewgriffith15 got all the fangraphs projections in! here's a minimal snippet:

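  # install the development version from GitHub (requires devtools)
  library(devtools)
  devtools::install_github('almartin82/projprep')
  library(projprep)
  library(dplyr)  # for %>% and desc(), in case projprep does not re-export them

  # scrape the 2016 steamer projections, then compute z-scores and auction values
  ex <- projprep::get_steamer(2016, TRUE)
  pp <- projprep::proj_prep(ex)
  # show the top hitters by auction value
  pp$h_final %>% dplyr::arrange(desc(value)) %>% peek()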

will produce

   mlbid         fullname firstname    lastname position priority_pos projection_name
1 545361       Mike Trout      Mike       Trout       OF           OF         steamer
2 514888      Jose Altuve      Jose      Altuve       2B           2B         steamer
3 502671 Paul Goldschmidt      Paul Goldschmidt       1B           1B         steamer
4 519203    Anthony Rizzo   Anthony       Rizzo       1B           1B         steamer
5 457763     Buster Posey    Buster       Posey        C            C         steamer
6 621043    Carlos Correa    Carlos      Correa       SS           SS         steamer
   ab   r rbi sb  tb  obp r_zscore rbi_zscore sb_zscore tb_zscore obp_zscore
1 541 103 104 15 316 0.41     2.85     2.6077    0.5467      2.75       2.27
2 629  92  64 37 271 0.35     2.01    -0.0095    2.9794      1.53       2.20
3 538  92  92 14 286 0.40     2.01     1.8226    0.4362      1.94       2.02
4 552  92  99 10 287 0.37     2.01     2.2806   -0.0061      1.96       1.64
5 497  68  73  2 234 0.37     0.16     0.5794   -0.8908      0.53       0.77
6 572  80  83 20 262 0.34     1.08     1.2337    1.0996      1.29       1.12
  unadjusted_zsum replacement_pos adjustment_zscore final_zsum value hit_pitch
1            11.0              OF              -1.0       10.0    60         h
2             8.7              2B               1.2        9.9    59         h
3             8.2              1B               1.3        9.5    57         h
4             7.9              1B               1.3        9.2    55         h
5             1.2               C               7.0        8.1    49         h
6             5.8              SS               2.2        8.0    48         h
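A quick way to read the value columns, retyping the rounded figures printed above: in this output the five category z-scores sum to `unadjusted_zsum`, adding `adjustment_zscore` gives `final_zsum`, and `value` works out to roughly 6 per z-point. That reading is inferred from these six rows only, not from the projprep source, and the small gaps come from rounding in the printed table:

```r
# Figures retyped from the printed output (already rounded).
out <- data.frame(
  lastname          = c("Trout", "Altuve", "Goldschmidt", "Rizzo", "Posey", "Correa"),
  r_zscore          = c(2.85, 2.01, 2.01, 2.01, 0.16, 1.08),
  rbi_zscore        = c(2.6077, -0.0095, 1.8226, 2.2806, 0.5794, 1.2337),
  sb_zscore         = c(0.5467, 2.9794, 0.4362, -0.0061, -0.8908, 1.0996),
  tb_zscore         = c(2.75, 1.53, 1.94, 1.96, 0.53, 1.29),
  obp_zscore        = c(2.27, 2.20, 2.02, 1.64, 0.77, 1.12),
  unadjusted_zsum   = c(11.0, 8.7, 8.2, 7.9, 1.2, 5.8),
  adjustment_zscore = c(-1.0, 1.2, 1.3, 1.3, 7.0, 2.2),
  final_zsum        = c(10.0, 9.9, 9.5, 9.2, 8.1, 8.0),
  value             = c(60, 59, 57, 55, 49, 48)
)

# The five scoring categories (adjustment_zscore is deliberately excluded).
zcols <- paste0(c("r", "rbi", "sb", "tb", "obp"), "_zscore")

checks <- data.frame(
  lastname    = out$lastname,
  zsum_gap    = rowSums(out[, zcols]) - out$unadjusted_zsum,
  final_gap   = out$unadjusted_zsum + out$adjustment_zscore - out$final_zsum,
  value_per_z = out$value / out$final_zsum
)
print(round(checks[, -1], 2))
```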

if you take a look at the main fangraphs scrape, you'll notice that instead of scraping pos=all x 30 teams, I'm doing 6 hitting positions x 30 teams. that was the only way I could figure out how to get the position eligibility field - it isn't in the standard projections, so I had to extract it from the url.
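The position-by-team loop can be sketched roughly as below. The position codes and the query-string format are assumptions for illustration (the real ones live in the projprep fangraphs scrape); the point is only that each request carries its own `pos` value, which can then be parsed back out and attached to the scraped rows:

```r
# Assumed position codes and query format; not verified against fangraphs.
positions <- c("c", "1b", "2b", "ss", "3b", "of")
requests <- expand.grid(pos = positions, team = 1:30, stringsAsFactors = FALSE)
requests$query <- sprintf("pos=%s&stats=bat&team=%d", requests$pos, requests$team)

# Recover the position from the query string itself.
requests$position <- toupper(sub(".*pos=([^&]+).*", "\\1", requests$query))

nrow(requests)  # 6 positions x 30 teams = 180 requests
head(requests[, c("query", "position")])
```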

I'm pretty happy with this - the only thing I wish the scrape preserved is the hyperlink for each player, which contains their fangraphs id. Having all those names and fangraphs ids would be nice to write back to the player metadata, and would help solve some of the problems around players with duplicate names (issue #31). @drewgriffith15, do you have any thoughts about ways to extract those links as part of the scrape? the current readHTMLTable call loses those hyperlinks. maybe we could combine it with something else from rvest that makes getting those links easier?
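One possible approach, sketched with rvest on a small hand-written HTML table rather than a live fangraphs page. The table markup, the `playerid=` link pattern, and the ids are made up for illustration; the idea is to select the `<a>` nodes inside the table and pull out both the link text and the `href`, which a plain table read discards:

```r
library(rvest)  # xml2 is installed alongside rvest as a dependency

# Toy HTML standing in for a projections table (made-up links and ids).
html <- '<html><body><table>
  <tr><th>Name</th><th>HR</th></tr>
  <tr><td><a href="statss.aspx?playerid=111">Player A</a></td><td>30</td></tr>
  <tr><td><a href="statss.aspx?playerid=222">Player B</a></td><td>12</td></tr>
</table></body></html>'

page  <- xml2::read_html(html)
links <- rvest::html_nodes(page, "table a")

players <- data.frame(
  fullname = rvest::html_text(links),
  href     = rvest::html_attr(links, "href"),
  stringsAsFactors = FALSE
)
# Assumes the id appears as a `playerid=` query parameter in the link.
players$fg_id <- sub(".*playerid=([0-9]+).*", "\\1", players$href)

players[, c("fullname", "fg_id")]
```

The resulting name/id pairs could then be joined back onto the scraped stats table by player name within each team page.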

drewgriffith15 commented 8 years ago

I got the package to load from github. I was running a couple versions back and when I installed the new version, it worked!

drewgriffith15 commented 8 years ago

There is a dependency on the ensurer package when you run that snippet of code you sent me.

-Drew


almartin82 commented 8 years ago

ah, ok -- I have ensurer in suggests but maybe I will move all of those to imports so that they install cleanly. http://r-pkgs.had.co.nz/description.html
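For context, packages listed under Suggests are not installed by default along with the package, so code that uses them is normally guarded, whereas Imports are installed alongside it. A minimal sketch of such a guard; the helper name `require_suggested` is hypothetical and not part of projprep:

```r
# Hypothetical helper: fail with a clear message if a suggested package
# is missing, instead of erroring somewhere deep inside a function call.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this function. ",
      "Install it with install.packages('", pkg, "').",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

require_suggested("stats")  # ships with R, so this passes silently
```

Moving a package from Suggests to Imports removes the need for this kind of check, at the cost of a heavier install.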

almartin82 commented 8 years ago

@drewgriffith15 just pushed that fix to master so that you can re-install. if you want to contribute (please do) I would also suggest cloning the repo locally on your machine, which makes it easier to pull down these changes and use them locally.

I try to use the branch workflow (https://git-scm.com/book/en/v2/Git-Branching-Branching-Workflows) to manage these kinds of distributed projects.

drewgriffith15 commented 8 years ago

Cool. I've got notifications set up, so I saw it. I will do my best to contribute and to use the branches. I've never worked on a collaborative project on GitHub, so hopefully I can get it right the first time. I already noticed something else, though. This call throws an error:

  ex <- projprep::get_fangraphs(2016, TRUE)
  Error in names(df)[2] <- "fg_note" :
    'names' attribute [2] must be the same length as the vector [1]
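The message itself says that a names vector of length 2 is being assigned to an object of length 1, which is what happens when the second column of a one-column data frame is renamed. Why the scraped table would come back with a single column is not established in this thread; the sketch below only reproduces the error and shows a defensive check, with `rename_note_col` as a hypothetical helper:

```r
# A one-column data frame standing in for an unexpectedly narrow scrape result.
df <- data.frame(name = c("Player A", "Player B"))

# Reproduce the failure and capture its message.
msg <- tryCatch(
  { names(df)[2] <- "fg_note"; "no error" },
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical guard: only rename when the column actually exists.
rename_note_col <- function(df) {
  if (ncol(df) >= 2) {
    names(df)[2] <- "fg_note"
  } else {
    warning("expected at least 2 columns, got ", ncol(df), call. = FALSE)
  }
  df
}
```

A check like this would turn the hard failure into a warning that points at the page whose table did not parse as expected.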


almartin82 commented 8 years ago

Awesome -- happy to help w/ git collaboration. This set of Atlassian writeups is a great, super intuitive intro to different models of git collaboration. 'Feature branch workflow' is good bang for the buck - not nearly as complicated as the gitflow workflow, but you get like 80% of the benefits. The good thing about this is if you fork projprep and clone it locally, you'll be committing to your own copy of the project - there's literally no way that it could break anything in this repo. Once you have something you want to contribute back, you just create a pull request, and I can bring those changes back into the code base.

This is how I work even when I am the only contributor to a project -- if you look at the network of commits, you'll see a feature branch leave master, a bunch of commits happen, and then a pull request merges the changes back into master.

That has the advantage of keeping master stable - the bleeding edge changes live on a branch until they are ready for production. On projects where I have another lead collaborator (like mapvizieR with @chrishaid), we have established the convention of always submitting our branches to each other for code review and approval. I really like this workflow.