GraylinKim closed this issue 11 years ago.
One wrinkle to consider when architecting this.
Consider a plugin A that reads through the entire replay and then sets some variables on various objects. And then a plugin B that also reads through the entire replay and uses the variables set by plugin A.
Rewriting those plugins to use the event loop may be impossible, in particular if the values that plugin B needs are based on values that are only available when plugin A is done consuming all the events of the replay.
It's effectively a look-forward dependency.
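A minimal sketch of that look-forward dependency, using illustrative stand-in classes (`PluginA`, `PluginB`, `last_activity` here are hypothetical, not sc2reader's actual API):

```python
# Illustrative stand-ins for the A/B plugin relationship described above.

class Event:
    def __init__(self, frame):
        self.frame = frame

class Replay:
    def __init__(self, events):
        self.events = events

class PluginA:
    """Reads the whole replay, then annotates it."""
    def run(self, replay):
        for event in replay.events:
            pass  # consume every event
        # This value is only knowable AFTER the full pass:
        replay.last_activity = max(e.frame for e in replay.events)

class PluginB:
    """Needs a value PluginA can only produce after its full pass."""
    def run(self, replay):
        return replay.last_activity

replay = Replay([Event(10), Event(250), Event(99)])
PluginA().run(replay)
print(PluginB().run(replay))  # 250
```

If both plugins were naively interleaved in a single event loop, PluginB's per-event handlers would run before `last_activity` exists.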
You could respond to this challenge in one of three ways:
This is not a hypothetical problem; I believe that BaseTracker and the upcoming ProtossMacroTracker have this A/B relationship. BaseTracker uses last_activity, which is not known until the replay is done processing. But ProtossMacroTracker would be consuming events in order.
When you say the new approach would be much faster, it would help to put that in context by measuring what % of total replay processing time is taken up by plugins. I don't have good benchmarks for that, but before pouring a bunch of time into this (at least for ggtracker's sake) it would be important to know the theoretical max improvement in processing time. For comparison: we're not using pypy on the server yet, and that would be a ~2x speedup for maybe a day's worth of sysadmin work, because Python processing is only about 35% of the overall ggtracker processing latency.
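One way to get that number is to time the parse stage against the plugin stage and compute the plugin share; `parse_replay` and `run_plugins` below are stand-ins for the real pipeline stages, not ggtracker's actual functions:

```python
# Rough Amdahl-style bound on the possible win from speeding up plugins.
import time

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def parse_replay(path):      # stand-in for sc2reader's parsing stage
    return list(range(100000))

def run_plugins(events):     # stand-in for the sequential plugin passes
    return sum(events)

events, parse_s = timed(parse_replay, "replay.SC2Replay")
_, plugin_s = timed(run_plugins, events)
plugin_share = plugin_s / (parse_s + plugin_s)
# A 20% plugin speedup cuts total time by at most plugin_share * 20%.
print("plugins: %.0f%% of processing" % (100 * plugin_share))
```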
All of that said, from a pure software aesthetics perspective I endorse this approach.
I wouldn't suggest doing this solely on the grounds of performance gains. I expect we'll pick up between 15% and 25% on processing time, which for larger replays (4v4s and pros with 400 APM) can be over half of the total time to load a replay. I used to have some numbers but I can't seem to find them at the moment.
Regarding the look-ahead dependency, I agree that it would be a problem, but I think it can be mitigated in practice. If we assume that plugins have a `setup` and `cleanup` hook called for pre- and post-processing, then any look-ahead dependencies could be deferred into the post-processing step (where cleanups would be called in registration order), while all interactive elements of the processing can happen inside the event loop.
I prototyped such an engine a few months back. Just made a gist for it.
This is almost done. Just need to document and do backwards compatibility checks.
Done!
Did you have a chance to check the performance impact? I presume it would be neutral or positive, just curious.
I'll run some tests for python 2.7, 3.2, and pypy. Hopefully tonight.
Also, does this mean as far as you're aware we can safely switch ggpyjobs from the upgraded branch to the master branch? It looks like it to me, just confirming that you agree.
Yeah, I already did it. ^^
Not sure what you mean by "I already did it".
If you look at https://github.com/ggtracker/esdb/tree/master/vendor you'll see it still refers to the upgraded branch of ggpyjobs.
I could use a refresher on what you mean by the highlighted text below.
*ggtracker/ggpyjobs* I also upgraded ggtracker/ggpyjobs this morning. I also moved this
Just not sure what that actually means in concrete terms. Is there some official branch registered somewhere?
We should do this in email, not on the ticket. I'll respond there.
Processing right now is performed by registering a series of plugins that run over the replay sequentially. In an event loop, processing would instead happen by registering callbacks for different event types and looping through the callbacks on each event before moving to the next.
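A sketch contrasting the two dispatch styles (illustrative event and function names, not the exact API):

```python
class UnitBornEvent: pass
class PlayerLeaveEvent: pass

events = [UnitBornEvent(), UnitBornEvent(), PlayerLeaveEvent()]

# Current style: each plugin makes its own full pass over the replay.
def count_units(events):
    return sum(1 for e in events if isinstance(e, UnitBornEvent))

def run_sequentially(events, plugins):
    return [plugin(events) for plugin in plugins]   # N plugins => N passes

# Proposed style: a single pass, callbacks keyed by event-type name.
def run_event_loop(events, callbacks):
    for event in events:
        for cb in callbacks.get(type(event).__name__, []):
            cb(event)

print(run_sequentially(events, [count_units]))      # [2]

counts = {"units": 0}
def on_unit_born(event):
    counts["units"] += 1
run_event_loop(events, {"UnitBornEvent": [on_unit_born]})
print(counts["units"])                              # 2
```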
I can see a few reasons why this is better:
`load_context` methods on events could be factored out into an initially-enabled plugin. This would create a stronger divide between the code that loads the replays and the code that processes them. So we're doing it. People that don't like it can pipe the replay through the existing plugins themselves.