BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense
1.08k stars 264 forks source link

SQL DB from NFLDB #54

Open iliketowel opened 10 years ago

iliketowel commented 10 years ago

Hi all, this is less of an "issue" and more of a question, but I'm not quite sure of the best place to put it. I created a query to include most plays/results of the games. But I wanted to confirm that I'm pulling the data the best way possible. This is the (rather long) query I've made.

You'll see that I pull from PlayPlayer, and then pull the other data from there. Is this the best place to get the most accurate information?

Thanks, Eric

SELECT P.player_id, P.profile_url, P.gsis_name, P.full_name, P."position",
CASE WHEN P.birthdate = '' THEN '1/1/1900' WHEN P.birthdate = '//' THEN '1/1/1900' Else P.birthdate End as "birthdate",
P.years_pro, G.gsis_id, G.gsis_ID || PP.play_id as "GameplayID", PP.team, PP.passing_att, PP.passing_cmp, PP.passing_yds, PP.passing_tds, PP.passing_int, PP.passing_cmp_air_yds, PP.passing_twopta, PP.passing_twoptm, PP.passing_sk, PP.passing_sk_yds, PP.passing_incmp, PP.passing_incmp_air_yds, CASE WHEN PL.description like '% left%' THEN 'Left' WHEN PL.description like '% middle%' THEN 'Middle' WHEN PL.description like '% right%' THEN 'Right' END as "play_direction", PP.receiving_tar, PP.receiving_rec, PP.receiving_yds, PP.receiving_tds, PP.receiving_yac_yds, PP.receiving_twopta, PP.receiving_twoptm, PP.rushing_att, PP.rushing_yds, PP.rushing_tds, PP.rushing_twopta, PP.rushing_twoptm, PP.rushing_loss, PP.rushing_loss_yds, G.start_time, G.week, G.day_of_week, G.season_year, G.season_type, G.home_team, G.home_score, G.away_team, G.away_score, PP.fumbles_tot, PP.fumbles_lost, PP.defense_qbhit, PP.defense_tkl_loss, PP.defense_tkl_loss_yds, PP.defense_sk, PP.defense_sk_yds, PP.defense_ffum, PP.defense_frec, PP.defense_frec_yds, PP.defense_frec_tds, PP.defense_int, PP.defense_int_tds, PP.defense_int_yds, PL.time, PL.pos_team, PL.yardline, PL.down, PL.yards_to_go, PL.first_down, PL.penalty, PL.penalty_yds, PL.description, CASE WHEN PL.description like '%(Shotgun)%' THEN 'Shotgun' END as "Shotgun", PL.Note FROM public.play_player PP, public.game G, public.player P, public.play PL WHERE PP.player_id = P.player_id AND G.gsis_id = PP.gsis_id AND G.gsis_id = PL.gsis_id AND PL.play_id = PP.play_id AND (PP.passing_att > 0 or PP.rushing_att > 0 or PP.receiving_tar > 0 or PP.passing_sk > 0 or PP.passing_twopta > 0 or PP.rushing_twopta > 0 or PL.note like 'TD' or PL.note like '%FUMBLE%' or PP.defense_qbhit > 0 or PP.defense_tkl_loss > 0 or PP.defense_tkl_loss_yds > 0 or PP.defense_sk > 0 or PP.defense_sk_yds > 0 or PP.defense_ffum > 0 or PP.defense_frec > 0 or PP.defense_frec_yds > 0 or PP.defense_frec_tds > 0 or PP.defense_int > 0 or PP.defense_int_tds > 0 or PP.defense_int_yds > 0) ORDER BY G.start_time, G.gsis_id, PL.time

BurntSushi commented 10 years ago

Could you describe your query in English? It looks like this could be much simpler if you use the nfldb.Query interface.

iliketowel commented 10 years ago

I was trying to pull in every statistic recorded, other than tackles. I was trying to tie every play to public.play_player so that I would have all recorded statistics, however based on what I've seen and played with, the public.play_player may not be 100% accurate. I'm had thought you had written before that there was another way to get more accurate statistical data.

On a football level... I have to mentally get over the Jets loss on Sunday... it was very... Jetlike.

BurntSushi commented 10 years ago

There is no way to freely access 100% accurate data. If you really need 100%, then you've got to pay Stats, Inc. for it.

I don't have a figure on the accuracy of nfldb's data, but a very rough guess is >95%.

There are some ideas I have for future work on increasing accuracy, but they will not happen any time soon.

I will try to respond again when I have more time to actually answer your question here. :-)