Clidus / gwl

Video game collection, wishlist and backlog tracker.
https://www.gamingwithlemons.com/
MIT License

General Database "Refresh" Question #95

Closed: crunchprank closed this issue 8 years ago

crunchprank commented 8 years ago

Heyo @Clidus! I just had a quick question regarding how the local database works when there is a change to a game on GiantBomb's end.

For example: I add a game to my collection, and at the time there's no box art image, so none gets added to my site / database. A couple of months down the road, though, an image gets added on Giant Bomb's end and I want that image added to my database for that game.

Is this done automatically? Is there a manual refresh option? Or once a game gets cataloged in the local database is that how it stays?

Thanks!

PS: Any word on usernames in the URL instead of their ID number?

Clidus commented 8 years ago

Hi @crunchprank. Data syncing is a very good question.

Calling http://www.gamingwithlemons.com/cron/update will update the oldest game in the database based on the LastUpdated field in the games table. An error will be recorded in the Error field if the update failed.

There is a python script in the project that can be scheduled in cron: https://github.com/Clidus/gwl/blob/master/cron/updateGameCache.py

All the python script does is make the above web request, so you could do this any way you wanted (a batch file and Windows Task Scheduler, for example). On the production server I have this run every five minutes so that the database is (very slowly) refreshed. I keep it slow because Giant Bomb doesn't like being scraped and refreshing the data is a low-priority task. Feel free to run it more often, but you do so at your own risk of being blocked.
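For anyone rolling their own scheduler, here's a minimal sketch of what the script boils down to. The helper names are mine, and the domain is only an example (point it at your own install):

```python
from urllib.request import urlopen


def update_url(base_url: str) -> str:
    """Build the /cron/update URL for a given GWL install."""
    return base_url.rstrip("/") + "/cron/update"


def refresh_oldest_game(base_url: str) -> str:
    """Hit /cron/update, which refreshes the game with the oldest
    LastUpdated value and records any failure in its Error field."""
    with urlopen(update_url(base_url), timeout=30) as resp:
        return resp.read().decode("utf-8")


# Usage (makes a real HTTP request against your GWL install):
#   print(refresh_oldest_game("http://your-gwl-domain.example"))
```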

Clidus commented 8 years ago

Regarding usernames in URLs, no update yet! My current priorities are to reduce the dependency on Giant Bomb (making good progress on this) and to sync with Steam.

crunchprank commented 8 years ago

Awesome! Thanks for that detailed explanation. Very cool to see you already have a solution in place too. I won't need to run it much at all, since right now the platform is only being used by myself and a couple of friends, so I'll probably just set up a weekly cron job.

Glad to hear you're making progress on not being so reliant on Giant Bomb. Thanks again for your help and effort in this project!

Clidus commented 8 years ago

No problem!

It's worth noting that one reason you could get an error recorded against a game (besides the API being down) is that GB has deleted it. This has happened a couple of times when duplicates were found (the Flash and Steam versions of The Stanley Parable were initially separate titles, for example).

This doesn't come up very often, so there is no automated process for dealing with it. Currently I manually change the game in people's collections to the correct version and manually delete the duplicate from the GWL database.

Clidus commented 8 years ago

There have been some significant changes to how this all works in the latest version. I've documented it here: https://github.com/Clidus/gwl/wiki/Giant-Bomb-database-cache-and-updater

crunchprank commented 8 years ago

Heyo @Clidus again, sorry to post in a closed ticket but didn't want to open a new one just for this.

Anyway, I've had the cron job running for a while now and am curious whether it's actually updating the database. One thing that makes me wonder is that nothing is being written to output_logs.op, whether I run updateGameCache.py manually or from the cron job. Another is the offset: when I go to /cron/update in the browser, it states something like "Next offset: 2100". I would think that if I get 2100 in the browser, then run the python script, then go back to the browser, the next offset would be 2300, but instead it's 2200 - hopefully that makes sense. Also, shouldn't I see some new entries in the apiLog table every time this script runs?

Thanks for the help!

Clidus commented 8 years ago

Hi @crunchprank. You are correct, if you hit /cron/update and see 2100, run the python script and then hit the page again, it should be at 2300. If you're getting 2200 it suggests that the python script isn't actually executing.

Regarding the apiLog table, does anything appear in it when you hit /cron/update directly? Also, may I ask which version of GWL you are running?

Clidus commented 8 years ago

Also did you modify the python script? It just occurred to me that it's currently hard coded to hit gamingwithlemons.com :sheep:

crunchprank commented 8 years ago

Welp. Yeah. My brain doesn't work sometimes.

The problem was indeed the fact I left gamingwithlemons in the script. So, totally my fault haha. Thank you for pointing that out.

Another problem I ran into, which I only found after correcting the script to point at my own /cron/update, was that the cron job wasn't running. This isn't a universal problem and differs per OS / user configuration, but cron runs without a login environment and therefore without a good PATH variable, so I just needed to change python3 to /usr/local/bin/python3 and everything was fine. Again, this depends on the user's OS environment, so plain python3 may very well work for others, but I thought I'd mention it in case anyone else runs into the same problem.
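For anyone hitting the same thing, here's a quick way to find the absolute interpreter path to put in the crontab entry. The schedule and paths in the comments are only examples, not the project's recommended setup:

```python
import shutil
import sys

# cron runs with a minimal environment, so give it an absolute path.
print(sys.executable)           # absolute path of the interpreter running this script
print(shutil.which("python3"))  # what "python3" resolves to in your shell, if anything

# Example weekly crontab entry using an absolute path (adjust both paths):
# 0 3 * * 0 /usr/local/bin/python3 /path/to/gwl/cron/updateGameCache.py
```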

So yeah, we're all good. Just me not using my head like I should haha. Thank you!

Edit: Also, I'm running the latest GWL as of last night. I had to make a few additions to the database, but other than that everything went fine. The apiLog table is being populated as it should. However, I'm running into a problem where just that table is almost hitting 1GB now, so I ran a TRUNCATE on it to clean it up a bit. But that's another thing entirely that I may open an issue on.

Clidus commented 8 years ago

Glad you solved the problem. The hard-coded domain is really my fault. I will have to rethink that, or at least document that it needs to be changed.

Regarding the size of the apiLog table, you need to run a second cron job which processes the log. I documented how it works over on the wiki: https://github.com/Clidus/gwl/wiki/Giant-Bomb-database-cache-and-updater
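If it helps, both maintenance scripts can be wired up from one place. This sketch only builds the two commands a scheduler needs to run; the install path and interpreter are assumptions on my part:

```python
import sys


def maintenance_commands(python_bin: str, gwl_root: str) -> list:
    """Commands for the two cron jobs: refresh the oldest game's
    cache, then process (and shrink) the apiLog table."""
    return [
        [python_bin, gwl_root + "/cron/updateGameCache.py"],
        [python_bin, gwl_root + "/cron/processAPILog.py"],
    ]


if __name__ == "__main__":
    # Example paths; substitute your interpreter and GWL checkout.
    for cmd in maintenance_commands(sys.executable, "/srv/gwl"):
        print(" ".join(cmd))
```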

Let me know if you have any questions :)

crunchprank commented 8 years ago

I totally missed the processAPILog.py. You're awesome! Thanks.

Clidus commented 8 years ago

No problem :+1: