Closed derek-adair closed 6 years ago
maybe PULL_LATEST_SCHEDULE should just force update_sched.py --year
JEEZ that would eliminate a lot of issue submissions.
You're not wrong, but I think the main point of nflgame is to make usable the statistical data for the results of games that have happened/are happening.
The schedule in nflgame exists primarily to allow nflgame to know when a game is taking place. If it has happened in the past or present, the url for the .json data can be assembled and retrieved. If the game hasn't taken place yet, then don't retrieve the data since it doesn't exist yet.
maybe PULL_LATEST_SCHEDULE should just force update_sched.py --year
Our problem stems from the fact that the method by which we were determining <current_year> was telling us that it was still 2017 because we were relying on (what appears to be) a deprecated NFL.com .xml page.
Really, update_schedule.py --year #### (when working) only needs to be run 1 time throughout the season: at the very beginning of the season. It builds the full season schedule. After that, by default update_schedule.py automatically refreshes the current week.
I think it's easy to get bogged down in "what a project is and should be" at the expense of usability. At the end of the day it's my opinion that this project should provide any and all data that is available, without any manual intervention. This includes games in the future. This includes seamlessly updating game changes (I think this already happens but I'm not sure).
Really, update_schedule.py --year #### (when working) only needs to be run 1 time throughout the season: at the very beginning of the season. It builds the full season schedule. After that, by default update_schedule.py automatically refreshes the current week.
Having to run update_sched.py manually at all is the reason for these COPIOUS issues being submitted. I can't provide any precise number but it's over 100 total issues spread throughout the nfl* repos that are resulting from this lack of a user-friendly design.
I know I'm not alone in that my first experience with nflgame was to try and check out the game time/data for the upcoming season. This does not work out of the box and is something I have every intention of changing going forward.
I figured this config option being set to FALSE would be a good way to maintain backwards compatibility and not break anything that relies on it working this way.
I can't provide any precise number but it's over 100 total issues spread throughout the nfl* repos that are resulting from this lack of a user-friendly design.
And how many of those are from this more recent change in seasons? NFL.com changed from underneath us. There is absolutely nothing we could have done to prevent that. If the day comes that NFL.com overhauls how they populate the game center data, that will be a breaking change as well, and a far more catastrophic one than this in terms of the future of the library. We will always be at their mercy to maintain consistency.
I'm not disagreeing that the transition from season to season (and even from preseason to regular season to post season) is inconvenient. I am simply saying that it is a difficult problem to solve. The comment "TODO: How do we know if it is the post season" has been in live.py for almost 6 years.
I know I'm not alone in that my first experience with nflgame was to try and check out the game time/data for the upcoming season. This does not work out of the box and is something I have every intention of changing going forward.
I think you're mixing 2 issues here. Getting schedule.json updated properly when a schedule is released and being able to display the games that have not yet happened are separate topics.
Inspecting future games is possible (as long as they exist in schedule.json).
import nflgame

# Completed games come back as nflgame.game.Game objects.
past_games = nflgame.games(2018, 1)
print 'Past games:'
for game in past_games:
    print game

print '-'

# _games_in_week is a "private" helper; it returns schedule entries as dicts.
future_games = nflgame.live._games_in_week(2018, 1)
print 'Future games:'
for game in future_games:
    print '{} at {} on {}/{} {}'.format(game['away'], game['home'], game['month'], game['day'], game['time'])

Past games:
-
Future games:
ATL at PHI on 9/6 8:20
BUF at BAL on 9/9 1:00
PIT at CLE on 9/9 1:00
CIN at IND on 9/9 1:00
TEN at MIA on 9/9 1:00
SF at MIN on 9/9 1:00
HOU at NE on 9/9 1:00
TB at NO on 9/9 1:00
JAX at NYG on 9/9 1:00
KC at LAC on 9/9 4:05
WAS at ARI on 9/9 4:25
DAL at CAR on 9/9 4:25
SEA at DEN on 9/9 4:25
CHI at GB on 9/9 8:20
NYJ at DET on 9/10 7:10
LA at OAK on 9/10 10:20
If you're talking about making this accessible via a means other than the "private" _games_in_week function, then by all means. I'm in total agreement. But I'm failing to see how that relates to getting the schedule .json file updated properly during season transitions.
EDIT: I need to point out, though, that a game in past_games and a game in future_games are not compatible objects. A past_game is a nflgame.game.Game with drives, and players, and stats. A future_game is a dict. Try to perform the same analysis on the two and you quickly run into problems.
import nflgame

past_games = nflgame.games(2018, week=[2, 3], kind='PRE')
print 'Past games:'
for game in past_games:
    print game
    for player in nflgame.combine_max_stats([game]):
        # print player
        pass
    print player, player.formatted_stats()  # just to show 1 for each game

print '-'

future_games = nflgame.live._games_in_week(2018, week=[2, 3], kind='PRE')
print 'Future games:'
for game in future_games:
    print '{} at {} on {}/{} {}'.format(game['away'], game['home'], game['month'], game['day'], game['time'])
    # Fails here: each future game is a dict, not a nflgame.game.Game.
    for player in nflgame.combine_max_stats([game]):
        print player, player.formatted_stats()

Past games:
NYJ (13) at WAS (15)
C.McKinzy defense_tkl: 1, defense_tkl_loss: 1, defense_ast: 1, defense_tkl_loss_yds: 2, defense_sk: 0.0, defense_ffum: 0, defense_int: 0
PIT (34) at GB (51)
J.Frazier defense_ast: 1, defense_sk: 0.0, defense_tkl: 0, defense_ffum: 0, defense_int: 0
PHI (20) at NE (37)
B.Brown receiving_yds: 7, receiving_yac_yds: 4, receiving_tar: 1, receiving_rec: 1, receiving_tds: 0, receiving_lng: 7, receiving_twopta: 0, receiving_lngtd: 0, receiving_twoptm: 0
KC (28) at ATL (14)
J.Graham receiving_yds: 11, receiving_yac_yds: 0, receiving_tar: 1, receiving_rec: 1, receiving_tds: 0, receiving_lng: 11, receiving_twopta: 0, receiving_lngtd: 0, receiving_twoptm: 0
MIA (20) at CAR (27)
K.Anderson defense_tkl: 1, defense_sk: 0.0, defense_ffum: 0, defense_ast: 0, defense_int: 0
BUF (19) at CLE (17)
C.Boutte defense_tkl: 2, defense_ast: 1, defense_sk: 0.0, defense_ffum: 0, defense_int: 0
NYG (30) at DET (17)
T.Redding receiving_yds: 28, receiving_yac_yds: 16, receiving_tar: 2, receiving_rec: 2, receiving_tds: 0, receiving_lng: 26, receiving_twopta: 0, receiving_lngtd: 0, receiving_twoptm: 0
ARI (20) at NO (15)
G.Griffin receiving_twoptmissed: 1, receiving_twopta: 1
CIN (21) at DAL (13)
T.Beverette defense_tkl: 1, defense_sk: 0.0, defense_ffum: 0, defense_ast: 0, defense_int: 0
CHI (24) at DEN (23)
A.Simmons penalty: 1, penalty_yds: 0
SF (13) at HOU (16)
V.Bolden receiving_yac_yds: 18, kickret_yds: 32, receiving_rec: 1, receiving_yds: 32, receiving_tar: 1, kickret_ret: 1, kickret_lngtd: 0, kickret_lng: 32, receiving_tds: 0, kickret_tds: 0, receiving_lng: 32, kickret_avg: 32, receiving_twopta: 0, receiving_lngtd: 0, receiving_twoptm: 0
SEA (14) at LAC (24)
S.Richardson defense_qbhit: 1
OAK (15) at LA (19)
C.McElroy penalty: 1, penalty_yds: 6
JAX (14) at MIN (10)
T.Hoppes receiving_yds: 9, receiving_yac_yds: 5, receiving_tar: 2, receiving_rec: 1, receiving_tds: 0, receiving_lng: 9, receiving_twopta: 0, receiving_lngtd: 0, receiving_twoptm: 0
TB (30) at TEN (14)
J.Veasy receiving_tar: 1
BAL (20) at IND (19)
C.Lee penalty: 1, penalty_yds: 4
-
Future games:
NYJ at WAS on 8/16 8:00
Traceback (most recent call last):
  File "P:\Projects\Home Computer\Fantasy Football\nflgame\samples\future_games.py", line 19, in <module>
    for player in nflgame.combine_max_stats([game]):
  File "C:\Python27\lib\site-packages\nflgame\__init__.py", line 399, in combine_max_stats
    [g.max_player_stats() for g in games if g is not None])
AttributeError: 'dict' object has no attribute 'max_player_stats'
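One way to guard against this mismatch is to separate the two kinds of entries before running any stat analysis. The snippet below is only a sketch: `split_by_kind` is a hypothetical helper, not part of nflgame, and it assumes (as the traceback suggests) that played games expose a `max_player_stats` method while scheduled games are plain dicts. `FakeGame` stands in for nflgame.game.Game so the example runs without the library; it is written for Python 3, unlike the Python 2 snippets above.

```python
def split_by_kind(games):
    """Separate played games from schedule-only entries.

    Hypothetical helper: plain dicts are treated as schedule entries, and
    anything exposing max_player_stats() is treated as a played game.
    Everything else (for example None) is dropped.
    """
    played, scheduled = [], []
    for game in games:
        if isinstance(game, dict):
            scheduled.append(game)
        elif hasattr(game, 'max_player_stats'):
            played.append(game)
    return played, scheduled


class FakeGame(object):
    """Stand-in for nflgame.game.Game, used only for this illustration."""

    def max_player_stats(self):
        return []


mixed = [FakeGame(), {'away': 'NYJ', 'home': 'WAS'}, None]
played, scheduled = split_by_kind(mixed)
print(len(played), len(scheduled))
```

Only the `played` list would then be safe to pass to something like nflgame.combine_max_stats, while `scheduled` can be printed or filtered as dicts.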
I promise that I won't continue to argue with you, but I feel like I have to comment on this:
I think it's easy to get bogged down in "what a project is and should be" at the expense of usability.
The very first line of README.md (before the "unmaintained" line was added) is "nflgame is an API to retrieve and read NFL Game Center JSON data. It can work with real-time data, which can be used for fantasy football."
Games that are in the future have no game center JSON.
If you want to view upcoming games, by all means, add the functionality. But when I see your comment of "JEEZ that would eliminate a lot of issue submissions." I can't help but feel like you're throwing more than a bit of judgement on how the library was supported for the past 6+ years.
I can't help but feel like you're throwing more than a bit of judgement on how the library was supported for the past 6+ years.
Not at all. I cannot emphasize enough the appreciation I have for all of the work that's gone into this project. If anything it's the users' fault for not digging in like you and I did to grok what this project is set up to do and hey, RTFM... There is a TON of documentation in the code. So it's a matter of insulating the management of this project from low effort and/or lazy users.
Didn't realize we were arguing ;) But please don't stop. I really appreciate your opinion on these matters. I'm just trying to make the project easier to run.
If you're talking about making this accessible via a means other than the "private" _games_in_week function, then by all means. I'm in total agreement.
Yes, but in addition, going back to your original response....
Really, update_schedule.py --year #### (when working) only needs to be run 1 time throughout the season: at the very beginning of the season. It builds the full season schedule. After that, by default update_schedule.py automatically refreshes the current week.
This is what needs to be eliminated to get rid of these types of support requests. I should never have to run update_schedule.py manually. I really cannot speak to the implications or difficulty of this task but it's gotta happen.
Maybe this is really an issue about, "Expose future_games" via nflgame.games?
Right, so I wrote some peace-making stuff here but then you answered Derek and I hope that you guys are good now so I'll skip that part and jump to the other stuff ;)
My thoughts, and some answers to the thread:
Lots of posted issues regarding the schedule.
I haven't checked, but I think ochawkeye is right when saying that this peaked now because of the broken/deprecated URLs that have been used. I think, however, that this has also been a thing in the past, just maybe not as often as now. However, the latest schedule fix
datetime.
So, just to be clear, or rather please inform me if I'm wrong, but I believe that this is now a thing of the past, i.e. indeed solved:
I am simply saying that it is a difficult problem to solve. The comment "TODO: How do we know if it is the post season" has been in live.py for almost 6 years.
If it is not solved, then I have clearly misunderstood the application when I coded the fix. Please tell me if so :)
Right, given that I am correct in that the fix does solve this problem, that should mean a decrease of issues in this matter as no user intervention is needed (no need for workarounds and no need to switch URLs).
Adding a flag/not having to run the script at all
Like I've written in an issue earlier, I like the idea of nflgame being able to run in a service/daemon/background mode. In a "background" mode, I'd like it to handle all of these things, e.g. everything that results in updated/new json/json.gz-files. Then just use the library as normal in a script/interpreter.
To have automatic schedule updates in "foreground mode"/using nflgame as a library (like now), that would still require user intervention. Like, we'd have to hi-jack into another function(s), like when querying games or so. And, that could probably be done fairly neatly and with low execution cost. I'm thinking maybe just keep track of the scheduling update timestamp (could be read in the first time and then kept as a variable for the execution lifetime of the particular run) to see if we need to update the schedule, according to x number of rules that save both calls to NFL.com and execution cost. I'll try to think of some smart way to solve this but right now it's too late in Sweden so I gotta catch some sleep.
This is what needs to be eliminated to get rid of these types of support requests. I should never have to run update_schedule.py manually.
Was thinking about this a bit more and I just want to make sure we're talking about the same thing here. If someone is using nflgame regularly, the only time they should ever need to perform a manual update_schedule.py is at the beginning of the season to populate the season's initial schedule.
A manual schedule update is not mandatory as each week's schedule of games is refreshed during the week they are played when import nflgame is called (once per day).
import nflgame
> import nflgame.sched # noqa
> _create_schedule()
> nflgame.update_sched.update_week(d, year, phase, week)
> week_schedule(year, stype, week)
> update
In season, the only reason a manual schedule update would be required would be if 1) the NFL changed the schedule and 2) the user did not use nflgame at any point during the week that the games affected by the schedule change were played.
In an active repository where completed game .json files are pushed to the repo, this issue is greatly minimized.
Your point still remains: clearly this has proven to be a non-ideal first user interaction. Knowing when to manually update the schedule can be confusing. But I do believe that this is a much smaller issue than purported to be.
@ochawkeye, part of the reasons that I want to contribute to this project is to learn more Python and I had no idea that _create_schedule() was run just because the schedule module is imported. Thanks for that learning experience.
This part:
A manual schedule update is not mandatory as each week's schedule of games is refreshed during the week they are played when import nflgame is called (once per day).
Do you mean that a) the schedule will be updated if a user calls import nflgame (e.g. most likely starts whatever application they have that's using nflgame), and it will be updated max once per day because of the if (datetime.datetime.utcnow() - last_updated).total_seconds() >= day: code, or do you mean, which I find unlikely but I'm just going to check, b) if you run import nflgame and then let your application run, potentially forever, then the schedule will be updated once a day? If b) is true then I'm really missing some stuff here.
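For reference, the once-per-day guard quoted above boils down to comparing a stored timestamp with the current UTC time. A minimal sketch, assuming a hypothetical `needs_refresh` helper and a `DAY_IN_SECONDS` constant (the real check lives inline in nflgame's schedule code and may differ in detail):

```python
import datetime

DAY_IN_SECONDS = 60 * 60 * 24


def needs_refresh(last_updated, now=None):
    """Return True when at least one day has passed since last_updated.

    Both arguments are naive UTC datetimes; `now` can be injected for testing.
    """
    if now is None:
        now = datetime.datetime.utcnow()
    return (now - last_updated).total_seconds() >= DAY_IN_SECONDS


stamp = datetime.datetime(2018, 8, 16, 12, 0, 0)
print(needs_refresh(stamp, now=stamp + datetime.timedelta(hours=23)))
print(needs_refresh(stamp, now=stamp + datetime.timedelta(hours=25)))
```

If the check only runs when the module is imported, as option a) suggests, a long-running process would evaluate it once and never again; option b) would need the check to be called periodically.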
Also, since you have very good insight of the code, I'm just kindly asking again if you think that the schedule fix (#8) did indeed solve this:
I am simply saying that it is a difficult problem to solve. The comment "TODO: How do we know if it is the post season" has been in live.py for almost 6 years.
...or if you think that I missed something/that this solution won't work under conditions x, y and z. If so, please tell me and I'll see if I can figure out something.
All in all, I think that if we agree/conclude that that particular schedule issue seems fixed, that should take things back to when the old URLs worked, but with the added benefit of not having to switch URLs between reg and post. This might be a bit more user-friendly than before, possibly eliminating some issue posts in the future.
Okay, given my new amazing knowledge of import nflgame just doing the schedule magic (this was what I meant in my prior comment by hi-jacking, didn't know it was already done - obviously) then I see stuff in a different light.
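My suggestion:
- Keep the once-per-day-limit checking, more often really isn't necessary.
- After a season switch, (still, max once a day), try to get schedule data for PRE0, PRE1, REG1 (I think that these three are published at different times, could be wrong.). Like just see if we get a 404 or not. If PRE0 is present, store it. If PRE1 is present, store it and try to get PRE2, keep going like that until PRE4. Same logic with REG, and eventually POST (after last game of REG17). Will take quite some time when the schedule is released, but just checking and receiving the 404's should be a pretty fast thing.
- Once we have the schedules in place, use the application as today, e.g. just update the current week. See my next section on why I don't think we should check more than that.
- Using the timestamp of schedule.json and which week we're in/before/after, we can fill up "holes". If a user hasn't run the application since last super bowl, hasn't done a git pull and we're now in REG4, we can tell the program to fetch PRE0..PRE4, REG1...REG4 and then of course the weeks after that point.

Right, so this is what I think would be a nice tradeoff between not having to run any scripts if you don't like but at the same time being able to run them if you know about schedule changes and want them now. The suggestions above shouldn't take up too much execution time. However, checking too much ahead and downloading each remaining schedule every time import nflgame is run, for instance, that would take too much time IMO.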
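The "filling holes" idea can be sketched as a pure function that lists which weeks lie between the last week known locally and the current one. Everything here is an assumption for illustration: the `weeks_to_fill` name, the `SEASON_WEEKS` table, and the season layout (PRE0-PRE4, REG1-REG17, POST1-POST4) are not taken from nflgame.

```python
# Assumed season layout for illustration: PRE0-PRE4, REG1-REG17, POST1-POST4.
SEASON_WEEKS = (
    [('PRE', w) for w in range(0, 5)]
    + [('REG', w) for w in range(1, 18)]
    + [('POST', w) for w in range(1, 5)]
)


def weeks_to_fill(last_seen, current):
    """List the (phase, week) pairs after last_seen, up to and including current.

    last_seen may be None when nothing has been fetched for this season yet.
    """
    start = 0 if last_seen is None else SEASON_WEEKS.index(last_seen) + 1
    end = SEASON_WEEKS.index(current) + 1
    return SEASON_WEEKS[start:end]


# A user who has not run anything since the last Super Bowl and is now in REG4.
missing = weeks_to_fill(None, ('REG', 4))
print(missing[0], missing[-1], len(missing))
```

Because the function only computes a list, the caller stays free to decide how fast to fetch those weeks, which matters if the goal is to avoid hitting NFL.com too hard.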
My ideal solution however, and I'm sorry that I'm mentioning it again, but it is however a separation of concerns. Like, let one process - nflgame.run() or something like that, run in whichever way the user like, handle all updating of data in the background (this way, we can scatter for instance schedule requests over longer time periods). Let the scripts remain for those who do not want this solution but rather want to do this manually - add flags/functionality if needed. Let nflgame when run for querying data do only that. E.g. no external calls at all, just an API towards the data that you have on disk. I do realize though, that this is a big change and that it may be advanced for a lot of users, so maybe I'll just do this on a branch on my fork and see how it works. Thanks people for reading my novel...
@ochawkeye, part of the reasons that I want to contribute to this project is to learn more Python and I had no idea that _create_schedule() was run just because the schedule module is imported. Thanks for that learning experience.
We're coming from the same place. nflgame was/is the project that has really allowed me to sink my teeth into Python. Way more fun than some online Python tutorial! This is one that I didn't really know either. I suspected it since I don't recall ever manually updating my schedule last season. And one can see how it happens. games, last_updated = _create_schedule() isn't a line that is nested in an if __name__ == '__main__': block, so it's going to be run anytime sched.py gets imported.
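The import behaviour described here can be reproduced without nflgame. The sketch below writes a throwaway module (`demo_sched`, a made-up name that only mimics the shape of sched.py) and imports it; the top-level call runs during the import, while the `__main__` branch does not.

```python
import importlib
import os
import sys
import tempfile

MODULE_SOURCE = '''
calls = []

def _create_schedule():
    calls.append('created')
    return {}, None

# Top-level statement: runs as soon as the module is imported.
games, last_updated = _create_schedule()

if __name__ == '__main__':
    # Only runs when the file is executed directly as a script.
    calls.append('run as script')
'''

# Write the demo module into a temporary directory and make it importable.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'demo_sched.py'), 'w') as handle:
    handle.write(MODULE_SOURCE)

sys.path.insert(0, tmpdir)
demo_sched = importlib.import_module('demo_sched')
print(demo_sched.calls)
```

Importing the module a second time in the same process would not run the top-level code again, since Python caches imported modules in sys.modules.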
Do you mean that...
This one I'm not 100% positive on. I believe it is option "a". I added a print statement here and here and performed some random API calls and really only saw those appear once which leads me to believe the import of sched.py is the only thing triggering that.
I'm just kindly asking again if you think that the schedule fix (#8) did indeed solve this:
I have to applaud your logic, commit commenting, and code. I never even realized that the season always starts the week after Labor Day. Brilliant! The only gotcha' I anticipate is that the NFL can sometimes be inconsistent with their Super Bowl week. I know that in 2014, they referred to Super Bowl week as week 5 of the post season when all previous seasons it had been week 4.
[to be continued...have to get kid ready for school...]
[...continuation]
- Keep the once-per-day-limit checking, more often really isn't necessary.
Agreed
- After a season switch, (still, max once a day), try to get schedule data for PRE0, PRE1, REG1 (I think that these three are published at different times, could be wrong.). Like just see if we get a 404 or not. If PRE0 is present, store it. If PRE1 is present, store it and try to get PRE2, keep going like that until PRE4. Same logic with REG, and eventually POST (after last game of REG17). Will take quite some time when the schedule is released, but just checking and receiving the 404's should be a pretty fast thing.
I guess there's not a lot of harm there, especially if after failing to find PRE0 the process aborts. This season's preseason schedule was released on April 11th. The regular season schedule was released on April 19th. (Those were their press release dates... no idea when http://www.nfl.com/ajax/scorestrip?season=2018&seasonType=REG&week=1 showed up for the first time)
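The probing described in the quoted suggestion, with the early abort mentioned in the reply, might look roughly like this. It is a sketch under assumptions: `probe_weeks` is a hypothetical name, and `fetch` is a caller-supplied function that returns the schedule data or None on a 404. A fake fetcher replaces the HTTP call so the example runs offline.

```python
def probe_weeks(fetch, kind, weeks):
    """Fetch consecutive weeks of one phase, stopping at the first miss.

    `fetch(kind, week)` is supplied by the caller (hypothetical here) and
    returns the schedule data, or None when the server answers 404.
    """
    found = {}
    for week in weeks:
        data = fetch(kind, week)
        if data is None:
            break  # abort early so unreleased weeks cost a single request
        found[(kind, week)] = data
    return found


# Fake fetcher standing in for the HTTP call: only PRE0-PRE2 are "published".
requests_made = []


def fake_fetch(kind, week):
    requests_made.append((kind, week))
    return {'games': []} if week <= 2 else None


result = probe_weeks(fake_fetch, 'PRE', range(0, 5))
print(sorted(result), len(requests_made))
```

Stopping at the first miss keeps the number of wasted requests per phase at one, which speaks to the concern about hitting nfl.com excessively.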
Once we have the schedules in place, use the application as today, e.g. just update the current week. See my next section on why I don't think we should check more than that.
I'm with you.
Using the timestamp of schedule.json and which week we're in/before/after, we can fill up "holes". If a user hasn't run the application since last super bowl, hasn't done a git pull and we're now in REG4, we can tell the program to fetch PRE0..PRE4, REG1...REG4 and then of course the weeks after that point.
This is where we need to be careful. Say it's REG1 and I'm developing a short script. (Assume my schedule is good for PRE0-PRE4). Every day I execute my script (with import nflgame at the top) am I attempting to fetch all 17 weeks that appear in REG1-REG17? Probably not a big deal, just clarifying. Maybe worth investigating if a comparison to the schedule.json stored in the repo is newer than the local copy and pulling that one instead. Just brainstorming...
The suggestions above shouldn't take up too much execution time.
Regenerating the future schedule probably isn't even something that needs be done daily, so if we're talking a weekly execution time hit that's acceptable (within reason). But I'm more sensitive to hitting nfl.com excessively than I am to adding execution time.
My ideal solution however, and I'm sorry that I'm mentioning it again, but it is however a separation of concerns. Like, let one process - nflgame.run() or something like that, run in whichever way the user like, handle all updating of data in the background (this way, we can scatter for instance schedule requests over longer time periods). Let the scripts remain for those who do not want this solution but rather want to do this manually - add flags/functionality if needed.
I'm probably being shortsighted here, but I'm just not grasping what a constantly running .run() would buy me that a Windows scheduled task or Linux cron job can't. My scripts loop when I want them to (NFL games are really only being played ~15 hours/week, 17 weeks per year: Thursday evenings, almost all day Sunday, and Monday nights) and go away when they're not needed.
Let nflgame when run for querying data do only that. E.g. no external calls at all, just an API towards the data that you have on disk. I do realize though, that this is a big change and that it may be advanced for a lot of users, so maybe I'll just do this on a branch on my fork and see how it works.
I'm usually of the camp that a separation of duties is a good thing, but I'll have to noodle over this one a bit. I primarily use nflgame for live data. So for me, it's near impossible to wrap my head around separating the data retrieval from the ability to query that data. I'm gathering this data pretty much every 3 minutes every time there's at least 1 game being played in the NFL. For someone that is more interested in historical data, then I can certainly see the argument. The bulk of the data that they want to query is already in their possession and the collection of new data might not be as important to them.
Thanks people for reading my novel...
Glad to know there's a community of us that are still passionate about this wonderful tool.
Your point still remains: clearly this has proven to be a non-ideal first user interaction. Knowing when to manually update the schedule can be confusing. But I do believe that this is a much smaller issue than purported to be.
I just like to file issues and I've been a bit cranky due to my living situation / a big move coming up so... ya it's really a minor issue that I was probably being dramatic about. ;)
@mickeelm You can see all of the magic that nflgame does when importing by looking at __init__.py
Let me be clear that I would not want any changes to be made that would break things or slow them down. Fetching the whole season every import does seem silly. @mickeelm has the right idea. HOWEVER, I think we can possibly minimize the requests sent even further by:
1) Assume that if PRE0 is present the rest will be as well. Same with REG/POST season. This will, I believe, populate the json files as the season transitions to at least have data for future games.
2) Each time CURRENT_WEEK changes we simply grab the next week.
The consequence would be slightly less accurate games in the future. It's very possible this is being too cautious w/ the nfl data stream, but it's probably better safe than sorry.
I'm probably being shortsighted here, but I'm just not grasping what a constantly running .run() would buy me that a Windows scheduled task or Linux cron job can't.
This is another thing that relates to us insulating the project from low-level requests and makes it easier to use. Really nothing more. It's a seriously minor usability win, but these script kiddies can't open the damn source code and read Andrew's AMAZING comments.
So, thanks for all the feedback and a great thread - and thanks a lot for the kind words regarding the scheduling code. I hope that we can rely on it and that the NFL doesn't get any ideas. There is lots of stuff that you guys have written here and I'm not gonna quote all and respond to it so here's a sum up:
Regarding run mode/background mode
I'll create a branch on my fork for this, experiment a bit. We'll see what happens. When I'm looking at how I'm gonna use this, like my vision for it, a systemd-service that updates the data and just keeps running and running will suit my setup best. I think it's worth pointing out that I'm gonna run nflgame on a 24/7 remote server, not a local computer.
Just to try and clarify what I mean (so answering your comment about live data @ochawkeye) is that I'd like the background process to update during live games (and update schedules, and players, all with "smart" rules on when to get stuff and when not to. Like, in May and June the service will just be idling, waiting for hall of fame week) and save this data (as json, in a database, somewhere). So like, if a game has started and I do this an hour into the game
>>> import nflgame
nothing happens except the library being imported. No external calls - because these are already made, and are continuously being made, by the background process, because the background process knows that a game is being played and now is the time to do it. Meaning, that you will have access to the latest live data at all times - just that this is being made by another process and you won't see the output of it. But once you start querying, you will get the latest.
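A rough sketch of that split, assuming the two sides share data through JSON files on disk: the background process calls a writer, and the querying side only ever reads. `save_snapshot` and `load_snapshot` are hypothetical names, and the temp-file-plus-rename step is one possible way to keep a reader from seeing a half-written file (Python 3).

```python
import json
import os
import tempfile


def save_snapshot(path, data):
    """Write data as JSON via a temp file so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with os.fdopen(fd, 'w') as handle:
        json.dump(data, handle)
    os.replace(tmp_path, path)  # rename over the old file in one step


def load_snapshot(path):
    """Read the latest snapshot from disk; no network access involved."""
    with open(path) as handle:
        return json.load(handle)


# Background side: store what was just downloaded (made-up sample data).
data_dir = tempfile.mkdtemp()
snapshot = os.path.join(data_dir, 'game.json')
save_snapshot(snapshot, {'home': 'PHI', 'away': 'ATL', 'qtr': 2})

# Query side: read whatever is currently on disk.
print(load_snapshot(snapshot)['qtr'])
```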
I'm not saying that it is better though, just that it's gonna fit my purposes better :D Or, this is how my current application has been working (it uses the now deprecated score strip so that's why my interest in nflgame is very high right now). So, I'll do this on my fork anyway and we'll see where to go from there, if anything will be synced back here.
Getting future schedule logic
Right, I think that I might have been unclear here.
So I was probably unclear regarding "filling holes" in the schedule.
This is where we need to be careful. Say it's REG1 and I'm developing a short script. (Assume my schedule is good for PRE0-PRE4). Every day I execute my script (with import nflgame at the top) am I attempting to fetch all 17 weeks that appear in REG1-REG17? Probably not a big deal, just clarifying
This is not what I meant, sorry for being unclear. I also changed my mind a bit to make it even more unclear. Summed up, I want this functionality, when update schedule is run:
schedule.json for the current season.
One last thing
Maybe worth investigating if a comparison to the schedule.json stored in the repo is newer than the local copy and pulling that one instead. Just brainstorming...
Was going to suggest exactly that before :) Don't know how it would be done but if we have good routines on keeping player, schedule and game data updated in the repo it's of course a good thing if as many users as possible can pull these changes asap, essentially limiting a lot of traffic to NFL.com.
Forgot to write, if you think that the sum-up of the scheduling logic sounds good/reasonable. I can create a new issue for it, making it a bit clearer what we actually agreed upon. I'll assign myself (if I can, I think so) if nobody else wants to take it on, and get working.
Sounds fantastic. I'm pretty swamped w/ other projects to keep the lights on at the moment, so any headway that can be made before the start of the season would probably be very advantageous to this fork.
I WILL have time to clean up some of the documentation as well as get it published on my personal domain. Stay tuned, probably in the next couple days...
A very common question throughout all nfl projects is, "why can't I see the next season's schedule??".
A way to eliminate this is to just have nflgame provide the latest schedule given a flag PULL_LATEST_SCHEDULE. This would allow new users to query future seasons' games without manually running update_sched.py --year 2018. I suspect the manual step was chosen to prevent accidental abuse of the nfl.com json stream, so some planning/research is needed to make sure this won't set off any red flags for nfl.com.
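A minimal sketch of how such a flag could behave, assuming it defaults to off so existing setups keep working. Only the PULL_LATEST_SCHEDULE name comes from this issue; `get_schedule`, `read_local` and `fetch_latest` are placeholders for illustration and are not nflgame functions.

```python
# Proposed flag; off by default for backwards compatibility.
PULL_LATEST_SCHEDULE = False


def get_schedule(read_local, fetch_latest, pull_latest=None):
    """Return the schedule, refreshing it first only when the flag is set.

    read_local and fetch_latest are placeholder callables for illustration.
    """
    if pull_latest is None:
        pull_latest = PULL_LATEST_SCHEDULE
    if pull_latest:
        return fetch_latest()
    return read_local()


def local():
    return {'season': 2017}  # stands in for the schedule.json on disk


def latest():
    return {'season': 2018}  # stands in for a fresh pull from nfl.com


print(get_schedule(local, latest)['season'])
print(get_schedule(local, latest, pull_latest=True)['season'])
```

With the flag left at its default, nothing is requested from nfl.com, so any rate limiting or scheduling rules would only need to apply to the opt-in path.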