droher / baseball.computer

Create local database #15

Open kbreit opened 3 months ago

kbreit commented 3 months ago

I am interested in replicating this database locally but do not see documentation on how to do this. Please add information on how to create and annually update the database.

mdahlman commented 2 months ago

Agreed. I wrote a script that generated a CREATE TABLE like the following example. It would be really useful to include a script and the generated CREATE TABLE statements in the repo.
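
A generator of this kind might look like the minimal sketch below. It is not the script mentioned above: the `Column` tuple, the `sql_string` and `render_create_table` helpers, and the hard-coded column metadata are all assumptions made for illustration. The one detail taken from the example is that single quotes inside a comment are escaped by doubling them (`fielder''s`).

```python
from typing import Iterable, NamedTuple


class Column(NamedTuple):
    """Hypothetical container for one column's metadata."""
    name: str
    data_type: str
    comment: str


def sql_string(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def render_create_table(table: str, columns: Iterable[Column]) -> str:
    """Render a CREATE TABLE statement with one COMMENT clause per column."""
    lines = [
        f"    {col.name} {col.data_type} COMMENT {sql_string(col.comment)}"
        for col in columns
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


# Assumed input: in practice the metadata would be read from wherever the
# project keeps its column descriptions, not hard-coded like this.
example = render_create_table(
    "baseball.metrics_player_season_league_offense",
    [
        Column("season", "smallint", "Year, 4-digit integer"),
        Column(
            "hits",
            "integer",
            "(H) Number of times a batter reached base safely "
            "without an error or fielder's choice.",
        ),
    ],
)
print(example)
```

The inline `COMMENT` clause is not accepted by every SQL dialect, so the target engine would need checking before running the output. The statement generated for the offense metrics table follows.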

CREATE TABLE caddi_demo.baseball.metrics_player_season_league_offense (
    player_id varchar COMMENT 'Retrosheet 8-character person ID for a player, which consists of the first four characters of the last name, the first character of the first name, and then three digits to disambiguate between players with the same five characters.',
    season smallint COMMENT 'Year, 4-digit integer',
    league varchar COMMENT 'Abbreviation for the league associated with this entity. NULL indicates that a team is not part of a league (e.g. all-star games or Negro League barnstorming teams).',
    plate_appearances integer COMMENT '(PA) Number of times a batter came to the plate, including walks, hit by pitches, sacrifices, and at-bats.',
    at_bats integer COMMENT '(AB) Number of plate appearances that ended in either a hit or a non-sacrifice out.',
    hits integer COMMENT '(H) Number of times a batter reached base safely without an error or fielder''s choice.',
    singles integer COMMENT '(1B) Number of hits in which a batter reached first base.',
    doubles integer COMMENT '(2B) Number of hits in which a batter reached second base without the help of an error or an attempted play on another runner.',
    triples integer COMMENT '(3B) Number of hits in which a batter reached third base without the help of an error or an attempted play on another runner.',
    home_runs integer COMMENT '(HR) Number of hits in which a batter reached home plate without the help of an error or an attempted play on another runner (usually by hitting the ball out of the park).',
    total_bases integer COMMENT '(TB) Number of bases a batter reached safely without the help of an error or an attempted play on another runner. HR * 4 + 3B * 3 + 2B * 2 + (H - HR - 3B - 2B)',
    strikeouts integer COMMENT '(K, SO) Number of times a batter struck out. This includes plays in which a batter reached base on a dropped third strike.',
    walks integer COMMENT '(BB, occasionally W) Number of times a batter reaches base on called balls out of the strike zone (four balls for all of MLB history after the 1880s).',
    intentional_walks integer COMMENT '(IBB, occasionally IW) Number of times a batter was intentionally walked. This number may be missing or undercounted in earlier years, as intentional walks were not officially tracked until 1955.',
    hit_by_pitches integer COMMENT '(HBP) Number of times a batter was awarded first base after being hit by a pitch.',
    sacrifice_hits integer COMMENT '(SH) Number of times a batter performed a sacrifice bunt to advance another runner. A bunt may count as a sacrifice even if the batter reaches base safely. Unsuccessful attempts are not counted, nor are non-sacrifice bunts.',
    sacrifice_flies integer COMMENT '(SF) Number of times a batter hit a fly ball that resulted in an out (or error), but allowed a runner to score on the throw.',
    reached_on_errors integer COMMENT '(ROE) Number of times a batter reached base safely due to an error.',
    reached_on_interferences integer COMMENT 'Number of times a batter was awarded first base for being illegally hindered by a fielder, usually the catcher.',
    inside_the_park_home_runs integer COMMENT 'Number of times a batter hit a home run without the ball leaving the field of play.',
    ground_rule_doubles integer COMMENT 'Number of times a batter was awarded a double on a ball that went out of play after bouncing in fair territory.',
    infield_hits integer COMMENT 'Number of times a batter reached base safely on a hit that did not reach the outfield.',
    on_base_opportunities integer COMMENT 'Number of plate appearances in which a batter either did or did not reach base, as defined by the formula for on-base percentage. Note that this is very similar to but different from plate_appearances, as it does not include sacrifice hits or interference.',
    on_base_successes integer COMMENT 'Number of hits, walks, and hit by pitches that serves as the numerator in the formula for on-base percentage.',
    runs_batted_in integer COMMENT '(RBI) Number of runs that scored as a result of a batter''s plate appearance. This is usually the number of runs that scored on the play, but errors and other similar cases may cause some runs not to be credited to the batter. Scorekeeper discretion occasionally causes differences between the number in the database and the official MLB total.',
    grounded_into_double_plays integer COMMENT '(GIDP) Number of times a batter grounded into a double play. This is the conventional way to record a double play and as such is an important statistic on its own. Games without play-by-play accounts don''t have data on the trajectory of the double play, so this number is not populated for those games.',
    double_plays integer COMMENT '(DP) Number of times a play ended in two outs being recorded.',
    triple_plays integer COMMENT '(TP) Number of times a play ended in three outs being recorded.',
    batting_outs integer COMMENT 'Number of outs that "should" have been recorded as a result of a batter''s plate appearances. A batting out is still counted if no actual out is recorded on an error or a failed fielder''s choice. Outs on baserunners (including the batter trying to stretch a hit) do not count here. Grounded-into-double-plays count as two outs, but other types of double plays do not (the idea here is that the baserunner is responsible on other types of double plays). Unofficial stat, but designed here to be generally useful in determining rates of official stats.',
    balls_in_play integer COMMENT 'Number of plate appearances that resulted in a live ball on the field of play. The difference between this and balls_batted is that balls_in_play does not include out-of-the-park home runs. This distinction is important for calculating batting average on balls in play, a stat designed to isolate at_bats in which the defense was involved.',
    balls_batted integer COMMENT 'Number of plate appearances that ended in a fair ball or a foul flyout. This is equivalent to balls_in_play + home_runs, and may be a more useful denominator for "in play" stats depending on the context.',
    trajectory_fly_ball integer COMMENT '(FB) Number of plate appearances that ended in a fly ball. A fly ball is defined here as the subset of balls hit in the air that are neither line drives nor pop flies.',
    trajectory_ground_ball integer COMMENT '(GB) Number of plate appearances that ended in a ground ball, also called a grounder. A ground ball is defined here to only include swings, not bunts, although older data may not have a proper distinction. Ground balls tend to be easy to distinguish from other types of contact, but Statcast-era data defines it as balls with a launch angle under 10 degrees.',
    trajectory_line_drive integer COMMENT '(LD) Number of plate appearances that ended in a line drive, also called a liner. Line drives are distinguished from fly balls by some combination of angle and exit velocity. In the Statcast era, line drives are defined purely in terms of launch angle (10-25 degrees), but just about any colloquial definition involves hard-hitness as well. The line-drive-fly-ball distinction is by far the most arbitrary and subjective trajectory categorization. Many scorekeepers never included line drives, while others counted any successful in-play air-hit as a line drive. Nevertheless, the distinction is a crucial one, as line drives are by far the most likely type of batted ball to result in a hit (even when they are not hit very hard).',
    trajectory_pop_up integer COMMENT '(sometimes PU) Number of plate appearances that ended in a pop-up, also called a pop fly. Pop-ups are distinguished from fly balls in that they are hit at a higher angle, tend to be hit with less exit velocity, and (as a result) end up in the infield or shallow outfield. Before Statcast-era standardization, the distinction between a fly ball and a pop-up was a matter of scorekeeper judgement, and many unofficial scorekeepers did not distinguish between the two. Statcast determines pop flies exclusively by angle (> 50 degrees).',
    trajectory_unknown integer COMMENT 'Number of plate appearances ending in a batted ball whose trajectory was not recorded and cannot be reliably deduced from context. This number includes balls that we know were hit in the air, but do not know which kind of air ball (FB/PU/LD) they were (see `trajectory_broad_unknown` for a number that does not include those balls). The strong majority of batted balls prior to 1988 fall into this category, especially hits.',
    trajectory_known integer COMMENT 'Number of plate appearances ending in a batted ball whose trajectory was recorded or was reliably deduced from the context. An example of reliable deduction is an at-bat with the fielding play 6-3, which almost always is a ground ball fielded by the shortstop and thrown to the first baseman. See `calc_batted_ball_type` for the deduction logic.',
    trajectory_broad_air_ball integer COMMENT 'Number of plate appearances that ended in an air ball (a fly ball, line drive, or pop-up). Because it is much easier to deduce that a ball was hit in the air than it is to deduce the exact trajectory, this field is more reliably populated than any of its three constituent parts.',
    trajectory_broad_ground_ball integer COMMENT 'Same as `trajectory_ground_ball`.',
    trajectory_broad_unknown integer COMMENT 'Number of plate appearances ending in a batted ball whose trajectory was not recorded and cannot be reliably deduced from context, even to the extent of knowing whether it was a ground ball or an air ball. This will include a disproportionate number of hits, which are more likely to be missing trajectory data and harder to make deductions about.',
    trajectory_broad_known integer COMMENT 'Number of plate appearances ending in a batted ball whose ground/air status was recorded or reliably deduced from the context. This is the sum of `trajectory_broad_air_ball` and `trajectory_broad_ground_ball`. Outs in play have excellent coverage historically here, even for older games.',
    bunts integer COMMENT 'Number of plate appearances ending in an in-play bunt. This does not include strikeouts on foul bunts with two strikes.',
    batted_distance_plate integer COMMENT 'Number of plate appearances in which the ball was batted to the catcher''s area around home plate.',
    batted_distance_infield integer COMMENT 'Number of plate appearances in which the ball was batted to the infield (not including the catcher). All ground balls are included here, regardless of whether they made it through to the outfield.',
    batted_distance_outfield integer COMMENT 'Number of plate appearances in which the ball was hit on the fly to the outfield.',
    batted_distance_unknown integer COMMENT 'Number of plate appearances in which the ball was hit, but the distance was not recorded and cannot be reliably deduced from context.',
    batted_distance_known integer COMMENT 'Number of plate appearances in which the ball was hit, and the distance was either recorded or reliably deduced from context.',
    fielded_by_battery integer COMMENT 'Number of plate appearances in which the ball was fielded by the pitcher or catcher.',
    fielded_by_infielder integer COMMENT 'Number of plate appearances in which the ball was fielded by an infielder.',
    fielded_by_outfielder integer COMMENT 'Number of plate appearances in which the ball was fielded by an outfielder.',
    fielded_by_known integer COMMENT 'Number of plate appearances in which the ball was fielded by a player, and the player was recorded.',
    fielded_by_unknown integer COMMENT 'Number of plate appearances in which the ball was fielded by a player, but the player was not recorded.',
    batted_angle_left integer COMMENT 'Number of plate appearances in which the ball was batted to the left side of the field. This includes balls where the location was not recorded, but the fielder is on the left side (3B, LF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_angle_right integer COMMENT 'Number of plate appearances in which the ball was batted to the right side of the field. This includes balls where the location was not recorded, but the fielder is on the right side (1B, RF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_angle_middle integer COMMENT 'Number of plate appearances in which the ball was batted to the middle of the field. This includes balls where the location was not recorded, but the fielder is up the middle (P, 2B, SS, CF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_angle_unknown integer COMMENT 'Number of plate appearances in which the ball was batted, but we don''t have enough location data to determine the spray angle.',
    batted_angle_known integer COMMENT 'Number of plate appearances in which the ball was hit and we have enough location data to determine the spray angle.',
    batted_location_plate integer COMMENT 'Number of plate appearances in which the ball was batted to the catcher''s area around home plate.',
    batted_location_right_infield integer COMMENT 'Number of plate appearances in which the ball was batted to the right side of the infield. This includes balls where the location was not recorded, but the fielder is on the right side (1B). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_middle_infield integer COMMENT 'Number of plate appearances in which the ball was batted to the middle of the infield. This includes balls where the location was not recorded, but the fielder is up the middle (P, SS, 2B). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_left_infield integer COMMENT 'Number of plate appearances in which the ball was batted to the left side of the infield. This includes balls where the location was not recorded, but the fielder is on the left side (3B). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_left_field integer COMMENT 'Number of plate appearances in which the ball was batted to the left side of the outfield. This includes balls where the location was not recorded, but the fielder is on the left side (LF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_center_field integer COMMENT 'Number of plate appearances in which the ball was batted to the center of the outfield. This includes balls where the location was not recorded, but the fielder is up the middle (CF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_right_field integer COMMENT 'Number of plate appearances in which the ball was batted to the right side of the outfield. This includes balls where the location was not recorded, but the fielder is on the right side (RF). See `seed_hit_location_categories` and `seed_hit_to_fielder_categories` for more details.',
    batted_location_unknown integer COMMENT 'Number of plate appearances in which the ball was batted, but we don''t have enough location data to determine the specific location.',
    batted_location_known integer COMMENT 'Number of plate appearances in which the ball was batted and we have enough location data to determine the specific location.',
    batted_balls_pulled integer COMMENT 'Number of plate appearances in which the ball was pulled (the left side for right-handed batters, the right side for left-handed batters).',
    batted_balls_opposite_field integer COMMENT 'Number of plate appearances in which the ball was hit to the opposite field (the right side for right-handed batters, the left side for left-handed batters).',
    runs integer COMMENT '(R) Number of runs scored.',
    times_reached_base integer COMMENT 'Number of times a batter ended a plate appearance on base, even if it was through a fielder''s choice, error, etc.',
    stolen_bases integer COMMENT '(SB) Number of successful stolen bases.',
    caught_stealing integer COMMENT '(CS) Number of times a runner was caught stealing.',
    picked_off integer COMMENT '(PO, at risk of confusion with putouts) Number of times a runner was picked off.',
    picked_off_caught_stealing integer COMMENT '(POCS) Number of times a runner was picked off, but instead of going back to the bag, tried to run to the next base and was put out.',
    outs_on_basepaths integer COMMENT 'Number of outs recorded by a baserunner (this is not mutually exclusive with outs recorded by the batter in cases like failed advances or dropped third-strike putouts).',
    unforced_outs_on_basepaths integer COMMENT 'Number of outs recorded by a baserunner that were not the result of a force on the runner. "Unforced" is meant both in the literal sense of a force not being in play and in the figurative sense of the runner being responsible for the out. The latter may or may not be the best interpretation of any given play, but it is useful to assign responsibility to the runner by default in those contexts.',
    outs_avoided_on_errors integer COMMENT 'Number of times that a baserunner would have been out, but an error allowed them to remain on the basepaths (either staying put or advancing).',
    advances_on_wild_pitches integer COMMENT '(WP) Number of times a baserunner advanced on a wild pitch.',
    advances_on_passed_balls integer COMMENT '(PB) Number of times a baserunner advanced on a passed ball.',
    advances_on_balks integer COMMENT '(sometimes BK) Number of times a baserunner advanced on a balk.',
    advances_on_unspecified_plays integer COMMENT 'Number of times a baserunner advanced for an unspecified reason.',
    advances_on_defensive_indifference integer COMMENT '(DI) Number of times a baserunner advanced on defensive indifference. Defensive indifference is a judgement call by the official scorer that the defense did not try to stop the runner from stealing a base. This usually happens when the defense has a lead late in the game.',
    advances_on_errors integer COMMENT 'Number of times a baserunner advanced on an error.',
    plate_appearances_while_on_base integer COMMENT 'Number of plate appearances in which the baserunner started on 1st, 2nd, or 3rd base.',
    balls_in_play_while_running integer COMMENT 'Number of balls in play while either batting or on base.',
    balls_in_play_while_on_base integer COMMENT 'Number of balls in play in which the baserunner started on 1st, 2nd, or 3rd base.',
    batter_total_bases_while_running integer COMMENT 'Number of total bases accumulated by the batter while the baserunner was running, including the batter.',
    batter_total_bases_while_on_base integer COMMENT 'Number of total bases accumulated by the batter while the baserunner was on base, excluding the batter.',
    extra_base_advance_attempts integer COMMENT 'Number of times a baserunner tried to advance by a greater number of bases than the batter.',
    bases_advanced integer COMMENT 'Number of bases advanced by a baserunner, including the batter.',
    bases_advanced_on_balls_in_play integer COMMENT 'Number of bases advanced by a baserunner on a ball in play, including the batter.',
    surplus_bases_advanced_on_balls_in_play integer COMMENT 'Number of bases advanced by a baserunner on a ball in play minus the number of total bases accumulated by the batter on the same play. For example, if a runner goes from first to third on a single, this number is 1 (2 - 1). If a runner only goes from second to third on a double, this number is -1 (1 - 2).',
    outs_on_extra_base_advance_attempts integer COMMENT 'Number of times a baserunner was out attempting to advance by a greater number of bases than the batter. This includes batters who were put out trying to stretch a hit to the next base.',
    pitches integer COMMENT 'Number of pitches thrown.',
    swings integer COMMENT 'Number of pitches that were swung at.',
    swings_with_contact integer COMMENT 'Number of pitches that were swung at and made contact.',
    strikes integer COMMENT 'Number of pitches that were called or swinging strikes.',
    strikes_called integer COMMENT 'Number of pitches that were called strikes.',
    strikes_swinging integer COMMENT 'Number of pitches that were swung on and missed (mutually exclusive with `swings_with_contact`, which also count as strikes).',
    strikes_foul integer COMMENT 'Number of pitches that were fouled off.',
    strikes_foul_tip integer COMMENT 'Number of pitches that were fouled off and caught by the catcher for strike three.',
    strikes_in_play integer COMMENT 'Number of pitches that were swung on and batted into play.',
    strikes_unknown integer COMMENT 'Number of pitches that we know were strikes, but don''t know what kind.',
    balls integer COMMENT 'Number of pitches that were called balls.',
    balls_called integer COMMENT 'Number of pitches that were called balls (as opposed to automatic balls that were not actually thrown).',
    balls_intentional integer COMMENT 'Number of pitches that were called balls as part of an intentional walk.',
    balls_automatic integer COMMENT 'Number of pitches that were called balls on an automatic walk or a delay penalty.',
    unknown_pitches integer COMMENT 'Number of pitches that were thrown without any other information recorded.',
    pitchouts integer COMMENT 'Number of pitches that were thrown as pitchouts. A pitchout is a pitch that is thrown intentionally very far outside in order to make it easier for the catcher to throw out a baserunner who is likely to steal.',
    pitcher_pickoff_attempts integer COMMENT 'Number of times the pitcher attempted to pick off a baserunner.',
    catcher_pickoff_attempts integer COMMENT 'Number of times the catcher attempted to pick off a baserunner.',
    pitches_blocked_by_catcher integer COMMENT 'Number of pitches that were blocked by the catcher.',
    pitches_with_runners_going integer COMMENT 'Number of pitches that were thrown while a baserunner was on the move (as part of a steal or hit-and-run).',
    passed_balls integer COMMENT '(PB) Number of passed balls.',
    wild_pitches integer COMMENT '(WP) Number of wild pitches.',
    balks integer COMMENT '(BK) Number of balks.',
    left_on_base integer COMMENT '(LOB) At an individual level, the number of baserunners that a batter failed to advance during a plate appearance. At a team level, the number of baserunners remaining on base at the end of an inning. In order to count, baserunners must not have scored or been put out.',
    left_on_base_with_two_outs integer COMMENT '(LOB) At an individual level, the number of baserunners that remain on base (unscored and not out) after a plate appearance that ends with the third out recorded. At a team level, this is interchangeable with `left_on_base`.',
    stolen_bases_second integer COMMENT 'Number of successful steals of second base.',
    stolen_bases_third integer COMMENT 'Number of successful steals of third base.',
    stolen_bases_home integer COMMENT 'Number of successful steals of home.',
    caught_stealing_second integer COMMENT 'Number of times a runner was caught stealing second base.',
    caught_stealing_third integer COMMENT 'Number of times a runner was caught stealing third base.',
    caught_stealing_home integer COMMENT 'Number of times a runner was caught stealing home.',
    stolen_base_opportunities integer COMMENT 'Number of events in which a runner had an opportunity to steal a base as the lead basestealer OR the runner recorded a SB/CS in any situation. "Opportunity" is defined as a situation in which the next base was empty at the start of the event (not including the batter). No-play events excluded, but events without plate appearances are included.',
    stolen_base_opportunities_second integer COMMENT 'Number of opportunities to steal second base (see `stolen_base_opportunities` for detailed criteria).',
    stolen_base_opportunities_third integer COMMENT 'Number of opportunities to steal third base (see `stolen_base_opportunities` for detailed criteria).',
    stolen_base_opportunities_home integer COMMENT 'Number of opportunities to steal home (see `stolen_base_opportunities` for detailed criteria).',
    picked_off_first integer COMMENT 'Number of times a runner was picked off first base.',
    picked_off_second integer COMMENT 'Number of times a runner was picked off second base.',
    picked_off_third integer COMMENT 'Number of times a runner was picked off third base.',
    times_force_on_runner integer COMMENT 'Number of events that a force existed on the runner''s next base. The batter is counted as having a force on them (at first). No-play events excluded, but events without plate appearances are included.',
    times_lead_runner integer COMMENT 'Number of events that a runner was the lead runner on a play (the runner who is furthest along the basepaths). Batter is never counted as the lead runner. No-play events excluded, but events without plate appearances are included.',
    times_next_base_empty integer COMMENT 'Number of events that the runner''s next base was empty. The batter is counted on events where first base is empty. No-play events excluded, but events without plate appearances are included.',
    extra_base_chances integer COMMENT 'Number of hits where a runner on base had an opportunity to advance by a greater number of bases than the batter.',
    extra_bases_taken integer COMMENT 'Number of hits where a runner on base advanced by a greater number of bases than the batter.',
    batting_average double COMMENT '(AVG, BA) Hits divided by at bats. Historically speaking, the single most well-known hitting statistic. It retains much of its popularity and cultural significance today, if not its importance: the "batting title" goes to the player with the highest batting average. While batting average is rightly maligned for the limited picture it captures of a player''s offensive contribution, it''s still a nice object of study when learning about statistical inference.',
    on_base_percentage double COMMENT '(OBP) Measures a player''s ability to get on base. It is calculated as (H + BB + HBP)/(AB + BB + HBP + SF), which is confusingly a bit different from on-base events per plate appearance. We have `on_base_successes` and `on_base_opportunities` to make the OBP calculation simpler. The modern analogue to batting average, both in what it tries to measure and in its cultural significance.',
    slugging_percentage double COMMENT '(SLG) Total bases per at bat. More of an average than a percentage, but the name has stuck (though you''ll also hear "slugging average"). The simplest and most well-known measure of the "advancement factor" of a player''s offensive ability.',
    on_base_plus_slugging double COMMENT '(OPS) On-base percentage plus slugging percentage. Popularized in the 1980s, OPS is a simple way to combine the two aspects of a player''s hitting ability. In terms of its overall precision, it is easily surpassed by other metrics, but it remains hard to beat for its economy of calculation and expression.',
    isolated_power double COMMENT '(ISO) Slugging percentage minus batting average, or the average number of extra bases per at bat. An intuitive expression of a player''s raw power or ability to push runners around the bases. Less indicative of overall ability than slugging percentage, but more precise in what it tries to measure.',
    secondary_average double COMMENT '(SecA) A statistic invented by Bill James that measures a player''s ability to gain bases by means independent of batting average. Its formula is (TB - H + BB + SB - CS)/(AB). A good way to answer the question of which players would be most underrated by only looking at their batting averages. Also a good way to show how well you can understand a player''s offensive ability without taking their batting average into account.',
    batting_average_on_balls_in_play double COMMENT '(BABIP) A measure of the rate at which fieldable balls go for hits. It is calculated as (H - HR)/(AB - K - HR + SF). Pitchers are generally (but controversially) thought to have little control over balls in play, so a high BABIP is a good sign that they have been unlucky or have played in front of a poor defense. Most pitchers do have some "true" BABIP ability that is different from the league average, but the difference is usually much smaller than the year-to-year variance in their actual BABIP. In rare cases, pitchers can have such a strong influence on balls in play that their contribution is measurable in smaller sample sizes. For pitchers in the dead-ball era and earlier, it makes less sense to treat BABIP as luck because a much higher percentage of at-bats ended in balls in play, and pitchers were more focused on getting outs by inducing weak contact without having to worry about home runs. All of the above is also true for hitters, but to a much lesser extent, as hitters face different defenses, control their ability to beat out grounders for infield hits, and generally have more control over the quality of contact they make. BABIP can also be misleading for hitters because it excludes home runs, so Barry Bonds ends up having a lower BABIP even though there were occasional eyewitness reports of his making solid contact. BABIP was invented by Voros McCracken around the turn of the millennium, and it remains the most prominent example of a statistic that tracks player luck as opposed to skill or performance.',
    home_run_rate double COMMENT '(HR/PA) Home runs per plate appearance.',
    walk_rate double COMMENT '(BB/PA) Walks per plate appearance.',
    strikeout_rate double COMMENT '(K/PA) Strikeouts per plate appearance.',
    stolen_base_percentage double COMMENT '(SB%) The rate of stolen base attempts that are successful. Calculated as SB/(SB + CS).',
    event_coverage_rate double COMMENT '',
    known_trajectory_rate_outs double COMMENT 'Rate of outs in play for which we know the detailed trajectory of the batted ball.',
    known_trajectory_rate_hits double COMMENT 'Rate of hits for which we know the detailed trajectory of the batted ball.',
    known_trajectory_rate double COMMENT 'Overall rate of batted balls for which we know the detailed trajectory.',
    known_trajectory_broad_rate_outs double COMMENT 'Overall rate of outs in play for which we know whether the ball was hit in the air or on the ground. This is generally very high even for the oldest play-by-play data.',
    known_trajectory_broad_rate_hits double COMMENT 'Overall rate of hits for which we know whether the ball was hit in the air or on the ground.',
    known_trajectory_broad_rate double COMMENT 'Overall rate of batted balls for which we know whether the ball was hit in the air or on the ground.',
    known_trajectory_out_hit_ratio double COMMENT 'The ratio of `known_trajectory_rate_outs` to `known_trajectory_rate_hits`. This ratio is generally very high for years without complete batted ball data, which makes it useful for estimating quantities that would be affected by the selection bias.',
    known_trajectory_broad_out_hit_ratio double COMMENT 'The ratio of `known_trajectory_broad_rate_outs` to `known_trajectory_broad_rate_hits`. This ratio is generally very high for years without complete batted ball data, which makes it useful for estimating quantities that would be affected by the selection bias.',
    air_ball_rate_outs double COMMENT 'The rate of outs in play that were fly balls, pop-ups, or line drives.',
    ground_ball_rate_outs double COMMENT 'The rate of outs in play that were ground balls.',
    ground_air_out_ratio double COMMENT 'The ratio of `ground_ball_rate_outs` to `air_ball_rate_outs`. This is a useful metric by itself because it is unaffected by the higher percentage of missing data on hits, so it is probably a more accurate measure of overall ground ball rate than `ground_ball_rate` itself for most seasons.',
    air_ball_hit_rate double COMMENT 'The rate of hits in play that were fly balls, pop-ups, or line drives.',
    ground_ball_hit_rate double COMMENT 'The rate of hits in play that were ground balls.',
    ground_air_hit_ratio double COMMENT 'The ratio of `ground_ball_hit_rate` to `air_ball_hit_rate`. The difference between this and `ground_air_out_ratio` is potentially interesting, but will be noisy for older years.',
    fly_ball_rate double COMMENT 'Of all batted_balls for which we know the trajectory, the rate that were fly balls.',
    line_drive_rate double COMMENT 'Of all batted_balls for which we know the trajectory, the rate that were line drives.',
    pop_up_rate double COMMENT 'Of all batted_balls for which we know the trajectory, the rate that were pop-ups.',
    ground_ball_rate double COMMENT 'Of all batted_balls for which we know the trajectory, the rate that were ground balls.',
    coverage_weighted_air_ball_batting_average double COMMENT 'The batting average of batted_balls that were fly balls, pop-ups, or line drives, weighted according to the `known_trajectory_broad_out_hit_ratio`. This is an attempt to measure trajectory-specific BABIP even in years where trajectory data is rarely present for hits. While it handles that specific bias, it is still likely to be noisy because of the small sample size of known-trajectory balls.',
    coverage_weighted_ground_ball_batting_average double COMMENT 'The batting average of batted_balls that were ground balls, weighted according to the `known_trajectory_broad_out_hit_ratio`. This is an attempt to measure trajectory-specific BABIP even in years where trajectory data is rarely present for hits. While it handles that specific bias, it is still likely to be noisy because of the small sample size of known-trajectory balls.',
    coverage_weighted_fly_ball_batting_average double COMMENT 'The batting average of batted_balls that were fly balls, weighted according to the `known_trajectory_out_hit_ratio`. This is an attempt to measure trajectory-specific BABIP even in years where trajectory data is rarely present for hits. It is still vulnerable to the variance and arbitrariness with which historical scorekeepers differentiated fly balls from line drives and pop-ups.',
    coverage_weighted_line_drive_batting_average double COMMENT 'The batting average of batted_balls that were line drives, weighted according to the `known_trajectory_out_hit_ratio`. This is an attempt to measure trajectory-specific BABIP even in years where trajectory data is rarely present for hits. It is still vulnerable to the variance and arbitrariness with which historical scorekeepers differentiated line drives from other air outs, which was extremely high all the way up to the Statcast era.',
    coverage_weighted_pop_up_batting_average double COMMENT 'The batting average of batted_balls that were pop-ups, weighted according to the `known_trajectory_out_hit_ratio`. This is an attempt to measure trajectory-specific BABIP even in years where trajectory data is rarely present for hits. It is still vulnerable to the variance and arbitrariness with which historical scorekeepers differentiated pop-ups from other air outs.',
    known_angle_rate_outs double COMMENT 'Rate of batted-ball outs for which we know (or have a good proxy for) whether the ball was hit to the left, right, or middle of the field.',
    known_angle_rate_hits double COMMENT 'Rate of batted-ball hits for which we know (or have a good proxy for) whether the ball was hit to the left, right, or middle of the field.',
    known_angle_rate double COMMENT 'Rate of batted balls for which we know (or have a good proxy for) whether the ball was hit to the left, right, or middle of the field.',
    known_angle_out_hit_ratio double COMMENT 'The ratio of `known_angle_rate_outs` to `known_angle_rate_hits`. This ratio is generally very high for years without complete batted ball data, which makes it useful for estimating quantities that would be affected by the selection bias. Angle is generally better known than location itself, because when a batter gets a hit, we often know which outfielder fielded the ball even though we don''t know how it got there.',
    angle_left_rate_outs double COMMENT 'The rate of batted-ball outs that were hit to the left side of the field.',
    angle_left_rate_hits double COMMENT 'The rate of hits that were hit to the left side of the field.',
    angle_left_rate double COMMENT 'The overall rate of batted balls that were hit to the left side of the field.',
    coverage_weighted_angle_left_batting_average double COMMENT 'The batting average on batted balls that were hit to the left side of the field, weighted according to the `known_angle_out_hit_ratio`. This is an attempt to measure angle-specific BABIP even in years where angle data is rarely present for hits.',
    angle_right_rate_outs double COMMENT 'The rate of batted-ball outs that were hit to the right side of the field.',
    angle_right_rate_hits double COMMENT 'The rate of hits that were hit to the right side of the field.',
    angle_right_rate double COMMENT 'The overall rate of batted balls that were hit to the right side of the field.',
    coverage_weighted_angle_right_batting_average double COMMENT 'The batting average on batted balls that were hit to the right side of the field, weighted according to the `known_angle_out_hit_ratio`. This is an attempt to measure angle-specific BABIP even in years where angle data is rarely present for hits.',
    angle_middle_rate_outs double COMMENT 'The rate of batted-ball outs that were hit to the middle of the field.',
    angle_middle_rate_hits double COMMENT 'The rate of hits that were hit to the middle of the field.',
    angle_middle_rate double COMMENT 'The overall rate of batted balls that were hit to the middle of the field.',
    coverage_weighted_angle_middle_batting_average double COMMENT 'The batting average on batted balls that were hit to the middle of the field, weighted according to the `known_angle_out_hit_ratio`. This is an attempt to measure angle-specific BABIP even in years where angle data is rarely present for hits.',
    pulled_rate_outs double COMMENT '`angle_right_rate_outs` for lefty batters and `angle_left_rate_outs` for righty batters.',
    pulled_rate_hits double COMMENT '`angle_right_rate_hits` for lefty batters and `angle_left_rate_hits` for righty batters.',
    pulled_rate double COMMENT '`angle_right_rate` for lefty batters and `angle_left_rate` for righty batters.',
    coverage_weighted_pulled_batting_average double COMMENT '`coverage_weighted_angle_right_batting_average` for lefty batters and `coverage_weighted_angle_left_batting_average` for righty batters.',
    opposite_field_rate_outs double COMMENT '`angle_left_rate_outs` for lefty batters and `angle_right_rate_outs` for righty batters.',
    opposite_field_rate_hits double COMMENT '`angle_left_rate_hits` for lefty batters and `angle_right_rate_hits` for righty batters.',
    opposite_field_rate double COMMENT '`angle_left_rate` for lefty batters and `angle_right_rate` for righty batters.',
    coverage_weighted_opposite_field_batting_average double COMMENT '`coverage_weighted_angle_left_batting_average` for lefty batters and `coverage_weighted_angle_right_batting_average` for righty batters.',
    stolen_base_attempt_rate_second double COMMENT 'The rate of stolen base opportunities taken when the runner was on first base (trying to steal second). See `stolen_base_opportunities` for the definition.',
    stolen_base_attempt_rate_third double COMMENT 'The rate of stolen base opportunities taken when the runner was on second base (trying to steal third). See `stolen_base_opportunities` for the definition.',
    stolen_base_attempt_rate_home double COMMENT 'The rate of stolen base opportunities taken when the runner was on third base (trying to steal home). See `stolen_base_opportunities` for the definition.',
    unforced_out_rate double COMMENT 'The rate of appearances on the basepaths that ended in an unforced out. See `unforced_outs_on_basepaths` for the definition.',
    pitch_strike_rate double COMMENT 'The rate of pitches that were strikes.',
    pitch_contact_rate double COMMENT 'The rate of pitches where some kind of contact was made.',
    pitch_swing_rate double COMMENT 'The rate of pitches where the batter swung (and either made contact or missed).',
    pitch_ball_rate double COMMENT 'The rate of pitches that were balls.',
    pitch_swing_and_miss_rate double COMMENT 'The rate of pitches where the batter swung and missed.',
    pitch_foul_rate double COMMENT 'The rate of pitches that were fouled off.',
    pitched_called_strike_rate double COMMENT 'The rate of pitches that were called strikes.',
    pitch_data_coverage_rate double COMMENT 'The rate of plate appearances for which we have pitch-by-pitch data.',
    PRIMARY KEY (player_id, season, league)
)
COMMENT = 'Aggregate offensive statistics and averages for each player-season,
split if the player played in multiple leagues that year. Regular season only.';
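
For anyone sanity-checking a local copy against these definitions, the two formulas spelled out in the column comments (BABIP and SB%) can be reproduced from raw counts. Below is a minimal Python sketch, not the project's actual implementation. The function and parameter names are hypothetical, and returning `None` for a zero denominator is my assumption about how an undefined rate should be represented.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the rate is undefined.

    Using None for a zero denominator is an assumption of this sketch,
    not confirmed behavior of the database.
    """
    if denominator == 0:
        return None
    return numerator / denominator


def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sacrifice_flies: int) -> Optional[float]:
    """BABIP as given in the column comment: (H - HR) / (AB - K - HR + SF)."""
    return safe_ratio(hits - home_runs,
                      at_bats - strikeouts - home_runs + sacrifice_flies)


def stolen_base_percentage(stolen_bases: int,
                           caught_stealing: int) -> Optional[float]:
    """SB% as given in the column comment: SB / (SB + CS)."""
    return safe_ratio(stolen_bases, stolen_bases + caught_stealing)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(babip(hits=150, home_runs=20, at_bats=500,
                strikeouts=100, sacrifice_flies=5))
    print(stolen_base_percentage(stolen_bases=30, caught_stealing=10))
```

Comparing values like these against `batting_average_on_balls_in_play` and `stolen_base_percentage` for a few player-seasons would be a quick way to confirm that a local rebuild matches the published tables.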