Arcana / dotabank-web

dotabank.com

GC Handler: Do some sanity-checks before spending a GC matchRequest #27

Closed rjackson closed 9 years ago

rjackson commented 9 years ago

Our GC-cluster server became unresponsive late last night, leaving a backlog of GC requests in the matchDetails queue. Our auto-repair scripts detected a fault with those replays (because data was missing) and attempted to repair the replay entries automatically by re-queueing them to the matchDetails queue, which they did about 5 times (the auto-repair attempt limit).

This greatly inflated the number of entries in the matchDetails queue, many of them duplicates. Our existing GC cluster script was not written to accommodate this, so it is spending a matchRequest (of which we only have 150 per GC account, because Valve) for every single entry in the queue. This has caused us to use up 1364 out of our daily limit of 1500 matchDetails requests (our milestone of rewriting the GC scripts in Go will go far in increasing this limit, but that's probably a few months away from done).

I have suspended the GC cluster for now (with about 700 entries left in the matchDetails queue). It will remain suspended for a few hours until I have a spare bit of time to modify the GC cluster scripts to do some checks before spending a matchDetails request, the goal being that duplicate entries in the queue won't submit duplicate matchRequests to the Dota GC.
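
As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the actual GC cluster script: `process_queue`, `lookup`, `spend_match_request` and `budget` are hypothetical names, and the only thing borrowed from the project is the idea that a non-empty `gc_done_time` means a replay has already been processed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Replay:
    # Hypothetical stand-in for the real replay record.
    id: int
    gc_done_time: Optional[float] = None  # None = not yet processed by the GC


def process_queue(
    queue: Iterable[int],
    lookup: Callable[[int], Optional[Replay]],
    spend_match_request: Callable[[int], None],
    budget: int,
) -> List[int]:
    """Spend at most `budget` requests, skipping duplicates and done replays."""
    seen: Set[int] = set()
    spent: List[int] = []
    for replay_id in queue:
        if len(spent) >= budget:
            break  # request allowance exhausted; leave the rest for later
        if replay_id in seen:
            continue  # duplicate queue entry, do not pay for it twice
        seen.add(replay_id)
        replay = lookup(replay_id)
        if replay is None or replay.gc_done_time is not None:
            continue  # unknown replay, or already processed
        spend_match_request(replay_id)
        spent.append(replay_id)
    return spent
```

With a queue such as `[1, 1, 2, 3, 2, 4]`, where replay 3 already has a `gc_done_time`, this sketch would spend requests only for replays 1, 2 and 4.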

rjackson commented 9 years ago

Sanity checks are now in place (and have been for 12 hours or so). We're just stuck waiting for Valve's rate limit to end so we can continue processing replays.

I will clear the entire GC queue soon and requeue any unprocessed replays, so there is less junk the GC has to deal with when it starts processing replays again.

The webfront's Replay.add_gc_job method needs updating to blank the gc_done_time column in replays, which is used as an indicator of whether we have already processed a replay.
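
A minimal sketch of what that change might look like, assuming a simplified in-memory `Replay` class and a plain list standing in for the matchDetails queue. The real method's signature, persistence and queue backend are not shown in this issue, so everything except the names `Replay`, `add_gc_job` and `gc_done_time` is hypothetical.

```python
from typing import List, Optional


class Replay:
    # Simplified stand-in; the real model is presumably database-backed.
    def __init__(self, replay_id: int, gc_done_time: Optional[float] = None):
        self.id = replay_id
        self.gc_done_time = gc_done_time

    def add_gc_job(self, queue: List[int]) -> None:
        """Re-queue this replay for GC processing.

        Blank gc_done_time first, so a check that treats a set
        gc_done_time as "already processed" does not skip the new job.
        """
        self.gc_done_time = None
        queue.append(self.id)  # hypothetical queue; a list for illustration
```

Without the reset, a re-queued replay that had been processed before would look finished to the sanity check and would never be fetched again.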