GraylinKim / sc2reader

A python library that extracts data from various Starcraft II resources to power tools and services for the SC2 community. Who doesn't want to hack on the games they play?
http://sc2reader.readthedocs.org
MIT License

Switch to an event loop #111

Closed GraylinKim closed 11 years ago

GraylinKim commented 11 years ago

Processing is currently performed by registering a series of plugins that run over the replay sequentially. With an event loop, processing would instead happen by registering callbacks for different event types and looping through the callbacks on each event before moving on to the next.

I can see a few reasons why this is better:

  1. It is much faster to run through callbacks like this than to have each plugin do its own event loop/filtering.
  2. It enables plugins to pass messages by inserting new events into the event loop. This opens up the possibility that high-level events like NewExpansion can be raised for other downstream plugins to digest.
  3. The load_context methods on events could be factored out into an initially enabled plugin. This would create a stronger divide between the code that loads the replays and the code that processes them.

So we're doing it. People that don't like it can pipe the replay through the existing plugins themselves.
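
Roughly the kind of dispatch I have in mind (a minimal sketch only; GameEngine, register, and the callback signature are illustrative names, not a final API):

```python
from collections import defaultdict, deque

class GameEngine(object):
    """Illustrative dispatcher: plugins register callbacks per event type
    and the engine makes a single pass over the replay's events."""

    def __init__(self):
        self.callbacks = defaultdict(list)

    def register(self, event_type, callback):
        self.callbacks[event_type].append(callback)

    def run(self, replay):
        # A deque lets callbacks feed new, higher-level events (e.g. a
        # hypothetical NewExpansion) back in for downstream plugins.
        queue = deque(replay.events)
        while queue:
            event = queue.popleft()
            for callback in self.callbacks[type(event)]:
                for new_event in (callback(event, replay) or ()):
                    queue.append(new_event)
```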

dsjoerg commented 11 years ago

One wrinkle to consider when architecting this.

Consider a plugin A that reads through the entire replay and then sets some variables on various objects. And then a plugin B that also reads through the entire replay and uses the variables set by plugin A.

Rewriting those plugins to use the event loop may be impossible, in particular if the values that plugin B needs are based on values that are only available when plugin A is done consuming all the events of the replay.

It's effectively a look-forward dependency.

You could respond to this challenge in one of three ways:

This is not a hypothetical problem; I believe that BaseTracker and the upcoming ProtossMacroTracker have this A/B relationship. BaseTracker uses last_activity, which is not known until the replay is done processing. But ProtossMacroTracker would be consuming events in order.
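
To make the dependency concrete, here is a toy version of the A/B relationship (hypothetical plugins, not the real BaseTracker or ProtossMacroTracker code):

```python
class PluginA(object):
    """Old-style plugin: makes its own full pass, then records a summary value."""
    def __call__(self, replay):
        last_frame = 0
        for event in replay.events:
            last_frame = max(last_frame, event.frame)
        replay.last_activity = last_frame  # only meaningful after the full pass

class PluginB(object):
    """Also makes a full pass, but needs PluginA's summary while consuming events."""
    def __call__(self, replay):
        for event in replay.events:
            # In a single shared event loop this value would not exist yet.
            if event.frame >= replay.last_activity:
                pass  # e.g. treat this as end-of-game noise
```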

When you say the new approach would be much faster, it would be helpful to put that in context by measuring what % of total replay processing time is taken up by plugins. I don't have good benchmarks for that, but before pouring a bunch of time into this (at least for ggtracker's sake) it would be important to know the theoretical max improvement in processing time. We're not using pypy yet on the server, and that would be a ~2x speedup for maybe a day's worth of sysadmin work, because Python processing is only about 35% of the overall ggtracker processing latency.
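
For what it's worth, the back-of-the-envelope on the pypy figure, under the assumption that the ~2x speedup applies only to the Python share of latency:

```python
python_share = 0.35   # fraction of overall ggtracker latency spent in Python
pypy_speedup = 2.0    # assumed speedup on the Python portion only

remaining = (1 - python_share) + python_share / pypy_speedup
print(remaining)      # 0.825, i.e. roughly a 17.5% cut in overall latency
```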

dsjoerg commented 11 years ago

All of that said, from a pure software aesthetics perspective I endorse this approach.

GraylinKim commented 11 years ago

I wouldn't suggest doing this solely on the grounds of performance gains. I expect we'll pick up 15-25% on the processing time, which for larger replays (4v4s and pros with 400 APM) can be over half of the total time to load a replay. I used to have some numbers but I can't seem to find them at the moment.

Regarding the look-ahead dependency, I agree that it would be a problem, but I think it can be mitigated in practice. If we assume that plugins have setup and cleanup hooks called for pre- and post-processing, then any look-ahead dependencies can be deferred to the post-processing step (where cleanups would be called in registration order), while all the interactive parts of the processing happen inside the event loop.

I prototyped such an engine a few months back. Just made a gist for it.
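
Roughly the shape I'm picturing (an illustrative sketch, not the contents of the gist; the hook names are placeholders):

```python
class Plugin(object):
    """Base interface: per-event handling plus pre/post processing hooks."""
    def handle_begin(self, replay):         # setup, before the event loop
        pass
    def handle_event(self, event, replay):  # called once per event
        pass
    def handle_end(self, replay):           # cleanup, in registration order
        pass

def process(replay, plugins):
    for plugin in plugins:
        plugin.handle_begin(replay)
    for event in replay.events:
        for plugin in plugins:
            plugin.handle_event(event, replay)
    # Any look-ahead dependency lives here: by the time cleanups run, every
    # plugin has seen the whole replay, so summary values are available.
    for plugin in plugins:
        plugin.handle_end(replay)
```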

GraylinKim commented 11 years ago

This is almost done. Just need to document and do backwards compatibility checks.

GraylinKim commented 11 years ago

Done!

dsjoerg commented 11 years ago

Did you have a chance to check the performance impact? I presume it would be neutral or positive, just curious.


GraylinKim commented 11 years ago

I'll run some tests for python 2.7, 3.2, and pypy. Hopefully tonight.

dsjoerg commented 11 years ago

Also, does this mean that, as far as you're aware, we can safely switch ggpyjobs from the upgraded branch to the master branch? It looks like it to me; just confirming that you agree.


GraylinKim commented 11 years ago

Yeah, I already did it. ^^

dsjoerg commented 11 years ago

Not sure what you mean by "I already did it".

If you look at https://github.com/ggtracker/esdb/tree/master/vendor you'll see it still refers to the upgraded branch of ggpyjobs.

I could use a refresher on what you mean by the highlighted text below.


*ggtracker/ggpyjobs*

I also upgraded ggtracker/ggpyjobs this morning. I also moved this repository back to master.

Just not sure what that actually means in concrete terms. Is there some official branch registered somewhere?


GraylinKim commented 11 years ago

We should do this in email, not on the ticket. I'll respond there.