Decouple LogParser and PacketTree

jleclanche commented 8 years ago

In trying to make LogParser more usable for the HSReplay parser, I'm discovering that there are actually three distinct parts to our parser:

The Power.log parser
The Packet tree
The Entity tree

Currently, the Packet tree and the Power.log parser are tightly coupled. This is why it intuitively feels easy to make HSReplay output a packet tree, but it really isn't.

The reason they are coupled is because in order to set the entity attributes on the packets, those entities need to exist as actual entity objects, so we continuously update the entity tree as we read the log. Part of the reasoning behind that, IIRC, was to have semi-reliable Player entities that would have their name on all packets by the end of the game.

To decouple all this, here is what I propose:

Change all entity attributes on the various packets to only be integers, except for players. This will be a tough change, which will require updating the XML dumper as well.
Change Player entities to a lazy int so that it can be treated as an int, but it looks for its value in a known place.
Add a mechanism to export a packet tree (or part of a packet tree?) to an entity tree.
Remove the immediate updates on the game tree, replace it with the export
Rework LogWatcher to base itself on the packet tree's export/update mechanism

Then, as long as we can export HSReplayDocument to a packet tree, we can use LogWatcher alongside it.

jleclanche commented 8 years ago

Alright, progress, finally.

As of 8bf464c5d46cac56413b1326331f001915ae941c:

Added a LazyPlayer class: a pseudo-namedtuple with a name and an id. int(LazyPlayer) will return the id, or raise a RuntimeError if the id is not available.
PacketTree gained a manager attribute, which is a PlayerManager object. That manager allows registration of players names and IDs. The way the manager works is by keeping lookup dicts of LazyPlayer by ID, name and player ID. When the Player packet is received, we register a new player ID. Later, when we have a name available (from mulligan or the old ENTITY_ID trick), we register the name to that ID. Registration updates all tracked instances of LazyPlayer objects.
PacketTree no longer continuously updates an internal game state - that game state has been completely removed. (This is no longer needed as I removed all stateful checks)
A ton of registration hacks have been removed. Player registration now happens either by ENTITY_ID or by mulligan.
The name attribute on the CreateGame.Player packet has been removed, as it was never part of the packet in the first place. If you want the player's name, talk to the manager.

Here's what we lost:

Name re-registration (eg. when The Innkeeper or an UNKNOWN HUMAN PLAYER gets their name updated). Can easily be recovered.
CURRENT_PLAYER-based player name detection. This is harder to recover as it's stateful. Hurts quite a bit, maybe we're losing all games which don't start at the beginning. We can add _currentplayer state tracking inside of the parser (or even inside the manager), if we want this back.
Any on-the-fly state checks. LogWatcher is broken, game metadata (turn count, num entities, etc) is broken... This is step 3.

jleclanche commented 8 years ago

Exporting PacketTree to an entity tree is done. I'm starting to test the parser with existing logs now.

Something really annoying I just realized: Mulligan-based player name registration no longer works, because it relies on CONTROLLER. Urgh. We could track a simplified entity tree (one that doesn't update, just registers all entities and their controllers) but that's more complexity just for the sake of player names.

I'm thinking an easier solution is to do player name registration completely out of band. With the LazyPlayer system, we no longer need to do name registration on the fly. We used to, because we were buffering packets, which meant the parser would accumulate a bunch of packets until the player name was ready, and then flush it all out at once. Not ideal, especially for streaming

Right, so that's the other thing: As-is, we lost streaming capabilities. I think we can restore them pretty easily, there's just more work involved, as users need to maintain both the packet tree and the entity tree themselves. I'm happy not to have a helper for it (since we don't use this at all), as long as it's capable of doing it.

jleclanche commented 8 years ago

Thinking about this more, player names definitely need to be populated synchronously. Otherwise there's too much chicken-and-egg: You do an export of the game state, and that export needs player names. But if you need player names you need game state. ...

So, we'll have to track just enough for this. We'll keep an entity_id/controller map and wipe it once the player names are available.

jleclanche commented 8 years ago

Awesome!! PlayerManager is finished and working like a charm.

I reimplemented the ENTITY_ID hook using the controller lookup map. Super reliable. I also replaced the CURRENT_PLAYER hook with a LAST_CARD_PLAYED. It's less hacky, easier to follow and reuses the same controller map. A ton more reliable as CURRENT_PLAYER would not always update in the correct order. I also implemented an "aggressive" name registration hook, which, if there is a name already registered, will automatically guess which player a name belongs to without ever running any of the other hooks! It can do that because there can only be two players in a game. I'm not sure on its reliability since AI names can change throughout a game, so I've also added an ai_player attribute to the PlayerManager which we can look up.

With this, the new parser is complete, I think. I ran it over the entire hsreplay-test-data repo and I only got positive diffs (fixed bugs)!

The two diffs that jumped out:

When re-registering a player name, it uses the most up-to-date name as the final one
The aggressive name registration fixes packets which previously retained old names

I'll try to run both parsers on failed uploads from HSReplay.net and see how they both fare overall. Hoping to push it live tomorrow.

jleclanche commented 8 years ago

And with those last few commits, I am done.

Documentation of the new system:

I'll be testing it and pushing the system live tonight when it's a bit more quiet. Super proud of it.

HearthSim / python-hearthstone

Decouple LogParser and PacketTree #8