GraylinKim / sc2reader

A python library that extracts data from various Starcraft II resources to power tools and services for the SC2 community. Who doesn't want to hack on the games they play?
http://sc2reader.readthedocs.org
MIT License

UnicodeDecodeError for LotV beta replay #185

Open StoicLoofah opened 9 years ago

StoicLoofah commented 9 years ago

From this replay of Puck in a PvZ for the new LotV beta

http://lotv.spawningtool.com/4/download/

Python 3.4.0 (default, Apr 11 2014, 13:05:18) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sc2reader
>>> replay = sc2reader.load_replay('replays/lotv.SC2Replay')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/home/kevin/sc2reader/sc2reader/resources.py", line 271, in __init__
    self._read_data(data_file, self._get_reader(data_file))
  File "/home/kevin/sc2reader/sc2reader/resources.py", line 601, in _read_data
    self.raw_data[data_file] = reader(data, self)
  File "/home/kevin/sc2reader/sc2reader/readers.py", line 33, in __call__
    ) for i in range(data.read_bits(5))],
  File "/home/kevin/sc2reader/sc2reader/readers.py", line 33, in <listcomp>
    ) for i in range(data.read_bits(5))],
  File "/home/kevin/sc2reader/sc2reader/decoders.py", line 252, in read_aligned_string
    return self._buffer.read_string(count, encoding)
  File "/home/kevin/sc2reader/sc2reader/decoders.py", line 108, in read_string
    return self.read_bytes(count).decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 33: invalid start byte

Currently working on fixing it, and I'll try to post a fix if I can figure things out (probably not). Just thought I would share so we could work on these things together.
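As a side note on the error itself: byte 0xfc can never start a UTF-8 sequence, so the decoder fails as soon as it reaches one. The sketch below uses made-up bytes (not data from the replay) to reproduce the same class of error. It also shows why passing `errors="replace"` would only hide the problem, assuming the real cause is that the reader is now looking at the wrong bytes:

```python
# Made-up payload for illustration; not bytes taken from the LotV replay.
payload = b"Puck\xfcrest"

try:
    payload.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # Same error class as in the traceback above.
    error_message = str(exc)

print(error_message)

# errors="replace" avoids the exception but swaps the bad byte for U+FFFD,
# so a misaligned read would silently produce garbage instead of failing.
lossy = payload.decode("utf-8", errors="replace")
print(repr(lossy))
```

If the format really did change, a loud failure like this is arguably more useful than a lossy decode, since it points at the first field that no longer lines up.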

StoicLoofah commented 9 years ago

Alright, I'm under the impression that we just need Blizzard to tell us what has changed in the format to be able to fix this. Or maybe we just need someone smarter than me looking at this.

GraylinKim commented 9 years ago

I mean, it would certainly be helpful if they just told us what changed. They have no record of providing information on non-official builds though so I wouldn't hold your breath. I'll try to take a look at it later this week. If things haven't changed too dramatically I can probably spot the difference.

dsjoerg commented 9 years ago

@koalaling just posted a new s2protocol for LotV: https://github.com/Blizzard/s2protocol/commit/2c1d7fff4fadee17314a8960f21321ff6e654267

StoicLoofah commented 9 years ago

What a champ! I'll take a stab at it, though I'm not actually good at this stuff, so if you don't hear more from me, it is because I failed

GraylinKim commented 9 years ago

I've started doing work for this; you can follow my progress in the LotV branch. It isn't quite able to parse the replays yet, and it's not clear how many non-parsing-related changes will be required to make for a good read.

You should check out the changes in the readers file. Pretty interesting, I'd say. Maybe this will also parse Heroes of the Storm replays?

EHadoux commented 9 years ago

I guess hero_talent_tree_selection_panel_toggled_event is pretty clear to me!

StoicLoofah commented 9 years ago

Wow, quick work! I'm playing around with it a bit to see what I can find.

One more thing I'll note for your TODO list is to set the expansion https://github.com/GraylinKim/sc2reader/blob/722ae4f16430c023bafa32fa69a1773909b1f867/sc2reader/resources.py#L344

Thanks for your continued effort to maintain this library!

StoicLoofah commented 9 years ago

So this probably won't make you feel really comfortable, but I'm using this branch in production right now since I want to run more replays through it. So far, so good! I'll report any issues I come up against as they pop up.

For others who are building on top of sc2reader, you can check out my branch of spawningtool where I'm going through the types that Blizzard is using

https://github.com/StoicLoofah/spawningtool/tree/32-lotv

StoicLoofah commented 9 years ago

I haven't dug much into it, but I got an archon mode replay that isn't quite working correctly

http://lotv.spawningtool.com/81/download/

As a sample of what this looks like:

>>> replay.players
[Player 1 - Amshel (Zerg), Player 2 - DrennoC (Zerg), Player 3 - Filleriste (Terran), Player 4 - Sudhish (Terran)]
>>> replay.players[0].units[-10:]
[Zergling [A240005], Zergling [9540004], Zergling [C840004], Zergling [1BC0007], Zergling [A440004], Zergling [C380004], Zergling [C100007], Zergling [C040002], Zergling [BA00005], Larva [4840002]]
>>> replay.players[1].units[-10:]
[Marine [3980003], Marine [6040003], Marauder [A4C0004], Marine [D00003], Marine [AE00004], Marine [A240004], Marauder [A540009], Marine [AA40003], Marine [AAC0002], Marine [AE80006]]

So the 2 zerg players are on the first team, but the units are associated with those 2 players rather than with the players on the 2 teams respectively.

This possibly should be addressed separately, and I haven't looked deeply into why this is happening, though my gut says that Blizzard got this working through some creative indexing within the lobby that is throwing us off. I am as useless as ever in actually understanding how to fix these things, but I am happy to dig more or provide better reporting as desired.

EHadoux commented 9 years ago

It may seem obvious, but what is the size of the players list? If it is 4 it's awkward, but if it's 2, the two players of each team may have been merged. In the former case, is players[0].units equal to players[2].units? I cannot test it myself ATM.
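The two checks asked about here can be sketched as a small helper. It only relies on the `players` and `units` attributes already shown in the session above. The stand-in objects are hypothetical and are there just so the snippet runs without a replay file; with sc2reader you would pass the loaded replay instead.

```python
from types import SimpleNamespace


def summarize_unit_ownership(replay):
    """Return (player count, units per player, index pairs with identical unit lists)."""
    players = list(replay.players)
    unit_counts = [len(p.units) for p in players]
    shared = [
        (i, j)
        for i in range(len(players))
        for j in range(i + 1, len(players))
        # Ignore empty lists so two unit-less players do not count as "merged".
        if players[i].units and players[i].units == players[j].units
    ]
    return len(players), unit_counts, shared


# Hypothetical stand-in for an archon replay with 4 players,
# where only the first two players have units attached.
fake_replay = SimpleNamespace(players=[
    SimpleNamespace(units=["Zergling", "Larva"]),
    SimpleNamespace(units=["Marine", "Marauder"]),
    SimpleNamespace(units=[]),
    SimpleNamespace(units=[]),
])

print(summarize_unit_ownership(fake_replay))
```

An empty list of shared pairs together with zero units on players 2 and 3 would suggest the units were attached to the wrong players rather than duplicated.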

StoicLoofah commented 9 years ago

Another fun fact that isn't too systemic in sc2reader but is probably a big assumption for others is that there are 16 frames per second. One change was that Blizzard slowed down the game clock so that it matches real-time. I verified the differences by comparing unit build times between

https://docs.google.com/spreadsheets/d/1JtL6Wd9q5Qxm3KEewnOXSz1KxXp6d6OF6KmrgO183Lw/edit#gid=410665689

https://github.com/StoicLoofah/spawningtool/blob/43c2ca0574c9dac3ce365853419a634962afa2ff/spawningtool/constants.py

Doing the math, it seems like they sped it up by roughly 1.38, which naively is around 22 frames per second. Of course, I have no idea whether they actually implemented it that way since I haven't easily been able to find a replay and matching VOD (and don't have beta access to test it), but that may be something we have to integrate into our thinking as well.
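The arithmetic behind the "around 22 frames per second" figure can be written out as below. This is only a sketch of the naive model described in the comment: the 16 frames per game second and the 1.38 factor come from the thread, while the helper name is made up, and whether LotV actually records frames this way was not verified.

```python
BASE_FPS = 16          # frames per game second, as assumed in the thread
FASTER_FACTOR = 1.38   # empirical speed-up quoted above

# Naive model: on "Faster", 16 * 1.38 frames elapse per real-time second.
effective_fps = BASE_FPS * FASTER_FACTOR


def frames_to_real_seconds(frames, fps=effective_fps):
    """Convert a frame count to real-time seconds under the naive model."""
    return frames / fps


print(round(effective_fps, 2))
print(round(frames_to_real_seconds(22080), 1))
```

Anything that divides frames by 16 to display a clock would therefore run roughly 38% ahead of real time under this model.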

StoicLoofah commented 9 years ago

@EHadoux currently the size of players is 4

EHadoux commented 9 years ago

I'm pretty sure your maths are right as 1.38 was the speed factor between real time and fastest speed (is that the right name? I have the game in French) AFAIK. I watched some streams and it seems like the clock went slower.

GraylinKim commented 9 years ago

Okay @StoicLoofah, I pushed a couple of small changes. I fixed an attribute mapping that was causing a parse failure and fixed the replay.expansion attribute. Please forward all errors that pop into your logs to my email so I can diagnose.

If you guys can fill-in the correct values for this table we can fix the game timings for LotV as well:

GAME_SPEED_FACTOR = {
    'Slower':   0.6,
    'Slow':     0.8,
    'Normal':   1.0,
    'Fast':     1.2,
    'Faster':   1.4
}

It seems that I have a choice between looking into archon mode, adding data for the new units/abilities, and wrapping data for the new events. Anyone have a preference?

GraylinKim commented 9 years ago

If anyone wants to help out right now they can:

1) Send me replays
2) Do some research on the new initData flags (fb285c6). Which game settings on bnet flip which flags, if any?

I am guessing that the tandem_leader_user_id is related to archon mode. I am hoping the other flags are used so that we can better differentiate game modes in replays now.

EHadoux commented 9 years ago

Here is the table for game speed: http://wiki.teamliquid.net/starcraft2/Game_Speed However, the page notes that the values can sometimes be a tiny bit different.

Speed    Time Factor
Slower   0.599
Slow     0.83
Normal   1
Fast     1.21
Faster   1.38

StoicLoofah commented 9 years ago

@EHadoux's numbers for game speed look right. I don't really have a good sense for what Blizzard did to get the time adjusted. It would seem strange to me that there should be a non-integer number of frames per second, and it could be a pain to deal with, but I don't have a good way to verify that externally.

@GraylinKim I have not yet fully tested out this branch, but at least the last batch of commits appeared to have been stable in production, so no errors there, yet!

For how you spend your time, archon mode is probably the most applicable for me, but I can see the value of looking into the new events as well. For the data for new units/abilities, that might be too much in flux to spend too much time there. But of course, do what works best for you.

Looking at the new initData flags, I actually think those might be for Heroes of the Storm. "practice" and "cooperative" are definitely in Heroes, and the other 3 could map to other modes (amm = quick match, ranked = hero league, competitive = team league?). I couldn't say, but since all LotV games are unranked or custom, I doubt we will see variance there.

Most of the replays I have are probably uninteresting (since they all work), but if you like, I can send you a zip or list of URLs or whatever else works for you. Let me know.

Again, I really appreciate that you continue to maintain this package!

EHadoux commented 9 years ago

Those ratios seem to be empirical, measured from the MULE. After reading the Liquipedia page and this thread (http://us.battle.net/sc2/en/forum/topic/628076627), the theoretical ratios are more likely to be 0.6x, 0.8x, 1.0x, 1.2x and 1.4x. With those figures, seconds are round for every speed.
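To get a feel for how much the choice between the two sets of ratios matters, here is a rough comparison. The factors are the ones quoted in this thread; the function names and the 20-minute game length are only illustrative assumptions.

```python
# Factors quoted in the thread: measured values vs. the suspected round values.
EMPIRICAL = {"Slower": 0.599, "Slow": 0.83, "Normal": 1.0, "Fast": 1.21, "Faster": 1.38}
THEORETICAL = {"Slower": 0.6, "Slow": 0.8, "Normal": 1.0, "Fast": 1.2, "Faster": 1.4}


def real_seconds(game_seconds, factor):
    """Real-time duration of a span measured in game seconds."""
    return game_seconds / factor


def drift(game_seconds, speed):
    """Difference in real seconds between the empirical and theoretical factor."""
    return (real_seconds(game_seconds, EMPIRICAL[speed])
            - real_seconds(game_seconds, THEORETICAL[speed]))


# Illustrative game length: 20 minutes of game time.
for speed in THEORETICAL:
    print(speed, round(drift(1200, speed), 1))
```

On "Faster" the two factors disagree by a bit over ten real seconds across such a game, which is small for display purposes but visible when lining up a replay against a VOD.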

StoicLoofah commented 9 years ago

Just ran through the latest version of the code, and I ran into some issues with TrackerEvents not having units associated with them. For example, there were dicts that looked like this

{'upkeep_pid': 1, 'unit_id_recycle': 1, 'unit_upkeeper': Player 1 - Pseudorandom (Zerg), 'unit_type_name': 'Drone', 'location': (158, 30), 'control_pid': 1, 'second': 19, 'unit_id_index': 241, 'unit_controller': Player 1 - Pseudorandom (Zerg), 'x': 158, 'name': 'UnitBornEvent', 'unit_id': 63176705, 'unit': None, 'frame': 305, 'y': 30}

that presumably should have had a unit, but did not. This particular one manifested in http://lotv.spawningtool.com/5/download/ and I saw it on UnitBornEvent, UnitInitEvent, and UnitTypeChangeEvent.

Also relevant is that this was working as of 5da22b473b8a77390d825a75e9963cd2342cce26 . I didn't see anything in the changes since then that would indicate what the problem was. My only guess is that if you tie the types to the expansions, then maybe the fix is to generate that data?

Now that I read my comment, I'm getting the suspicion that this is exactly what was on your TODO list and that you know all of this. I completely accept my ignorance should this be the case

StoicLoofah commented 9 years ago

@GraylinKim : I'm going to continue to be "that guy". I am running into a regression on this branch compared to master: as of a few days ago, it looks like HotS replays aren't parsing correctly on the lotv branch. For example,

http://spawningtool.com/22312/download/

>>> import sc2reader
>>> sc2reader.load_replay('replays/fightnight.SC2Replay')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/home/kevin/sc2reader/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/home/kevin/sc2reader/sc2reader/resources.py", line 271, in __init__
    self._read_data(data_file, self._get_reader(data_file))
  File "/home/kevin/sc2reader/sc2reader/resources.py", line 602, in _read_data
    self.raw_data[data_file] = reader(data, self)
  File "/home/kevin/sc2reader/sc2reader/readers.py", line 174, in __call__
    ) for p in details[0]],
KeyError: 10

I tried a few replays from the past few days, and they're exhibiting the same behavior but are otherwise working fine on master.

This is present on the HEAD of lotv, but as a heads up, I have been using 5da22b4 in production because of the issues with unit types as noted above.

Also let me know if you want to talk through this offline. I wish I could be more helpful and independent here in addressing these, but the fact is that I'm just not knowledgeable enough to do that.

StoicLoofah commented 9 years ago

Here's the patch I have for it locally. It seems like it works, but I question everything

diff --git a/sc2reader/readers.py b/sc2reader/readers.py
index b922dd6..ac99933 100644
--- a/sc2reader/readers.py
+++ b/sc2reader/readers.py
@@ -170,7 +170,7 @@ class DetailsReader(object):
                 observe=p[7],
                 result=p[8],
                 working_set_slot=p[9] if replay.build >= 24764 else None,
-                hero=p[10] if replay.build >= 34784 else None,
+                hero=p[10] if replay.build >= 34784 and 10 in p else None,
             ) for p in details[0]],
             map_name=details[1].decode('utf8'),
             difficulty=details[2],

GraylinKim commented 9 years ago

That patch looks fine to me, I'll apply it when I get home tonight. I'm sure the issue is that "hero" isn't written out unless you are playing Heroes right now. Interesting to see a divergence in what is written based on the game though.
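For readers following along, the guard in that patch boils down to the pattern below. It is a sketch that assumes each player struct is a dict keyed by integer field index, which is consistent with the `KeyError: 10` above; the function name and sample dicts are made up for illustration.

```python
def read_hero(player_struct, build):
    """Return field 10 only when the build supports it and the field was written."""
    if build >= 34784:
        # dict.get returns None when key 10 is absent, matching the
        # "34784 and 10 in p" condition used in the patch.
        return player_struct.get(10)
    return None


# Hypothetical structs: a replay without the hero field and one with it.
without_hero = {0: b"Player", 8: 1, 9: 0}
with_hero = {0: b"Player", 8: 1, 9: 0, 10: b"Hero"}

print(read_hero(without_hero, 34784))
print(read_hero(with_hero, 34784))
```

Using `.get` and the `10 in p` membership test are equivalent here; the membership test keeps the change to a single line in the existing comprehension.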

re: The unit being missing, you are exactly right. All unit types, and therefore units, are tied to a datapack. When I fixed the expansion to read LotV, it could no longer find a datapack and could no longer resolve units. In the past I generated the datapack automatically by scraping the RAM of the running game. Since I don't have beta access, I can't do that now. It would be possible to hand-build that datapack if you wanted to take on that task; it is just a simple CSV format mapping in-game codes to names.
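As a rough idea of what hand-building such a mapping involves, here is a sketch of loading a code-to-name CSV. The two-column `code,name` layout, the sample codes, and the function name are assumptions for illustration only; the real sc2reader datapack files may be laid out differently.

```python
import csv
import io

# Hypothetical sample data; the codes are invented, not real in-game ids.
SAMPLE = "1,Zergling\n2,Marine\n3,Marauder\n"


def load_unit_names(csv_file):
    """Build a {code: name} dict from a two-column CSV file object."""
    mapping = {}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        code, name = row[0], row[1]
        mapping[int(code)] = name
    return mapping


unit_names = load_unit_names(io.StringIO(SAMPLE))

# A code missing from the mapping cannot be resolved to a name,
# loosely analogous to the 'unit': None entries reported above.
print(unit_names.get(2, "Unknown"))
print(unit_names.get(99, "Unknown"))
```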

GraylinKim commented 9 years ago

To restore the previous behavior I could just copy the HotS datapacks into LotV and write the glue code. There are certainly problems with that approach but maybe it is better than nothing for now.

StoicLoofah commented 9 years ago

@GraylinKim That would be a good solution for now. If you're busy, I'm happy to take a shot at it myself, though I may not get around to it this weekend. I at least have taken a look at the types in the process of putting together

https://github.com/StoicLoofah/spawningtool/blob/32-lotv/spawningtool/lotv_constants.py

You will see a PR from me if I manage to get something working

StoicLoofah commented 9 years ago

Anyone have any thoughts or any success in handling archon mode? I see they added that new archon_leader_id, but I'm not sure how that would get integrated into the system. Again, I would tentatively be interested in helping out, but I have no idea where to start

GraylinKim commented 9 years ago

I'd have to look into things to see how it was laid out, but my best guess is as follows:

If I am right, it is just a matter of pulling all the information for the teams from the archon_leader_id players. Not much else should need to change for most use cases.

I haven't looked into it myself yet though so this is all guess work. If you could poke around the events for an archon game a little bit and confirm or refute that would be a good start.
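To make that guess concrete, here is a sketch of grouping players under their archon leader. Everything about the data shape is hypothetical: the `user_id` and `leader_id` keys and the sample slots are invented, and the actual meaning of the leader field has not been confirmed in this thread.

```python
from collections import defaultdict


def group_by_leader(slots):
    """Group user ids under the leader that controls their shared units."""
    teams = defaultdict(list)
    for slot in slots:
        # A player with no leader is treated as leading themselves.
        leader = slot["leader_id"] if slot["leader_id"] is not None else slot["user_id"]
        teams[leader].append(slot["user_id"])
    return dict(teams)


# Hypothetical 2v2 archon lobby: users 0 and 1 share one army, 2 and 3 the other.
slots = [
    {"user_id": 0, "leader_id": 0},
    {"user_id": 1, "leader_id": 0},
    {"user_id": 2, "leader_id": 2},
    {"user_id": 3, "leader_id": 2},
]

print(group_by_leader(slots))
```

If the guess holds, units would be attributed to each leader key and then shared with every user id listed under it.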

StoicLoofah commented 9 years ago

Sorry for being slow to respond here. I took a look at another archon mode replay, and the situation is roughly the same as discussed in https://github.com/GraylinKim/sc2reader/issues/185#issuecomment-89498669

The biggest issue is that the units are associated with the wrong users: they're associated with sid 0 and 1, but they should be associated with sid 0 and 2. It looks like the teams otherwise are properly configured.

I wonder if it might be related to https://github.com/GraylinKim/sc2reader/issues/176 where we're not mapping lobby slots into in-game ids correctly.

Do you have any suggestions for where I might start digging into this?

StoicLoofah commented 9 years ago

There's a new protocol out https://github.com/Blizzard/s2protocol/commit/2827bad0986c791201423603dac61ea29a3c3886

And here's an example of a replay http://lotv.spawningtool.com/903/download/

@GraylinKim I would be willing to help out with these things when they come up, but I literally have no idea what I'm looking at. If it's easy for you to do and you don't mind, that's cool. If you want to offload it, let me know

StoicLoofah commented 9 years ago

Anyone have any other thoughts on how we can start tackling archon mode? I have been getting more requests to support it, and I'm still a little lost on where to start with it

EHadoux commented 9 years ago

This is beyond my knowledge. However, I can tell you that, even on Blizzard's side, it's not really clear. In game, I always toggle the button (the second or third one to the right of the minimap) so that I am green and all the enemies are red. However, when I do that, my enemies are red but so am I. Only my friend is in the right color. It is almost (?) always the same problem for the colors in the graph screen after each game. In other words, there are still some bugs, so I don't know if it is really worth the time, as everything may change at any time.