ggtracker / sc2reader

Extracts gameplay information from Starcraft II replay files
http://pypi.python.org/pypi/sc2reader
MIT License
145 stars 144 forks source link

AttributeError when trying to load replay #61

Closed stefanpantic closed 5 years ago

stefanpantic commented 6 years ago

I get this error whenever I try to load a Replay. I tried reinstalling and installing directly from GitHub but the error persists. Any ideas on what could be causing this?

  File "<stdin>", line 1, in <module>
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/factories/sc2factory.py", line 85, in load_replay
    return self.load(Replay, source, options, **new_options)
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/factories/sc2factory.py", line 137, in load
    return self._load(cls, resource, filename=filename, options=options)
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/factories/sc2factory.py", line 146, in _load
    obj = cls(resource, filename=filename, factory=self, **options)
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/resources.py", line 312, in __init__
    engine.run(self)
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/engine/engine.py", line 177, in run
    for new_event in (event_handler(event, replay) or []):
  File "/home/stefan/Documents/ML/env/lib/python3.6/site-packages/sc2reader/engine/plugins/context.py", line 57, in handleTargetUnitCommandEvent
    self.last_target_ability_event[event.player.pid] = event
AttributeError: 'NoneType' object has no attribute 'pid'
dsjoerg commented 6 years ago

Is this a standard ladder replay? If not, that could be the problem. I'm sure there are types of replays that sc2reader needs to be tweaked to be able to handle.

StoicLoofah commented 6 years ago

Yeah, I have run into issues like this before, though it does tend to be replay-specific. Can you provide a download link for an example of a problematic replay?

stefanpantic commented 6 years ago

I've tried several from here and they all gave the same error. http://blzdistsc2-a.akamaihd.net/ReplayPacks/4.1.2.60604-20171220_014559-1.zip

I forgot to mention my client version (4.1.2), and Python version (3.6.5).

StoicLoofah commented 6 years ago

@stefanpantic I tried opening the file linked, and it's requesting a password to open the zip file. Any help on that?

stefanpantic commented 6 years ago

My bad i forgot to include that. It's"iagreetotheeula".

StoicLoofah commented 6 years ago

Okay, so I was able to diagnose where we're not getting the data we need. However, I have no idea how to fix it.

On the replay that I checked, the event.player == None, but event.pid == 1. When I checked, both replay.player and replay.players were empty.

Those are supposed to be populated here https://github.com/ggtracker/sc2reader/blob/cc364ffa7ed675cf907e1d62b1bb6e9497007401/sc2reader/resources.py#L384-L386

but replay.details does not appear in self.raw_data.

that is supposed to be loaded at https://github.com/ggtracker/sc2reader/blob/cc364ffa7ed675cf907e1d62b1bb6e9497007401/sc2reader/resources.py#L274

which ends up invoking this https://github.com/ggtracker/sc2reader/blob/cc364ffa7ed675cf907e1d62b1bb6e9497007401/sc2reader/utils.py#L138-L139

and there, file_data == None. It doesn't even hit the exception, though when I checked recovery_attempt() == None.

Note that there is real data here in replays that parse correctly (though it did hit the recovery_attempt case on the replay I tried), so I think this is the gap.

This is far beyond the parts of sc2reader that I have ever had to deal with, so I really have no idea what's going on here. Calling on @GraylinKim to see if he has any memory of what's going on here. Otherwise, I'm not sure where to go from here =(

cclauss commented 6 years ago

except Exception as e: is a bad practice so, would it be possible to replace Exception with ImportError (not likely to happen here) and then do a few trial runs and catch each exception that actually occurs in its own except block?

try:
    file_data = archive.read_file(data_file, force_decompress=True)
except AError as e:
    file_data = recovery_attempt()
    if file_data is None:
        raise
except BError as e:
    file_data = recovery_attempt()
    if file_data is None:
        raise
except CError as e:
    file_data = recovery_attempt()
    if file_data is None:
        raise

This way we can understand what is actually going wrong before we propose that the solution is file_data = recovery_attempt(). Once you understand the problem and have an effective solution, you can condense it back down to:

try:
    file_data = archive.read_file(data_file, force_decompress=True)
except (AError, BError, CError) as e:
    file_data = recovery_attempt()
    if file_data is None:
        raise

But we should avoid sweeping helpful information under the rug with except Exception as e:.

StoicLoofah commented 6 years ago

@cclauss definitely good practice if we can narrow in on the exception types, and would definitely encourage anyone to take a shot at it!

For this specific issue, though, i'll clarify that we aren't even hitting the exception: the function is actually running without an error. The problem is that it's returning None rather than any data. As to how it's coming to that, I have no idea =(

Gusgus01 commented 5 years ago

@StoicLoofah I dug into it a bit further :)

If you load those replays up in Starcraft 2, the players don't have names. I'm not sure if at some point they had names and even the Starcraft client can't parse them now or they never had names.

Also peeking into the MPQArchive using mpyq, I see that the usual replay.details file is missing which would explain why replay.details does not appear in self.raw_data. However, there is a replay.details.backup file in the archive, but it would appear to be missing the player name. Also, the current BitPackedDecoder available in decoders.py manages to mostly parse the data, but not fully.

Should the package support using replay.details.backup? If so I can dig into this further and try to figure out the parsing more, but it appears there won't be a player name either way and it looks to be missing other data as well. A better course of action if we want to support these replay files might be to make it fall back to using the pid instead or some default string for each player.

Necdilzor commented 5 years ago

but it appears there won't be a player name either way and it looks to be missing other data as well. A better course of action if we want to support these replay files might be to make it fall back to using the pid instead or some default string for each player.

These replays were anonymized by Blizzard, removing every player's specific data. When you load them using the sc2client-api you can't see their names either and it defaults to "1" and "2" as player names.

cclauss commented 5 years ago

Can we please consider something along the lines of #62?

StoicLoofah commented 5 years ago

@Gusgus01 thanks for looking more deeply into this! My short answer is that I would love to have a good solution for this but don't have a strong opinion on how to do it.

In the general case, it sounds like we won't stumble across replays with this problem in the wild: they just happen to be how Blizzard treated this particular data set, but since they are official, that makes them pretty relevant.

As I mentioned above, I really don't understand how the archive reading logic works: most of my tweaks to sc2reader start assuming that it was unpacked properly, then working to massage that into the python class structure. I would be happy to look at any PRs that manage to avoid crashes. From there, we can talk more about how we want to fill in details e.g. using pids or placeholder data.

lfricken commented 5 years ago

So I wanted to try solving this myself (I really want to use this library for those replays by blizzard). I had resources.py use replay.details.backup and replay.initData.backup and it successfully loaded some stuff, including players. But there were some problems.

replay.tracker.events doesn't exist, so I can't seem to find any UnitBornEvent's

cclauss commented 5 years ago

When replay.tracker.events doesn’t exist, would it be possible for you to create a dummy event with string values set to UnitBornWorkaround or BlizzardWorkaround and numeric values set to zero?

Gusgus01 commented 5 years ago

I've been toying with some ways to fall back to the backup files when the main ones aren't available, but ran into similar problems as @lfricken (as well as some more cosmetic issues like some print statements messing up).

I think he is saying the replays are a lot less useful without the TrackerEvents, since they are most of the more interesting things like UnitBorn, UnitDied, UpgradeFinished. Is there another source for these events that I didn't see?

There might be some things you could accomplish like tracking the usage of abilities like MorphDrone and such in replay.game.event, but that seems like a large undertaking.

Also, looks like someone we know (you may recognize that name from issue #63) has asked this question on the blizzard forums: https://us.battle.net/forums/en/bnet/topic/20769677762 So looks to be an issue with most (if not all) replays that blizzard serves up via their replay-api.

lfricken commented 5 years ago

As above, the script didn't crash, but unit data is what I want from these replays, so they are otherwise useless. replay.players(0).units is empty because there were no UnitBornEvents.

So I imagine blizzard tried to compress these replays as much as possible. (Since there are still 44gb worth, and hence their small size and lack of data). The fact that you can view these replays in sc2 means it must be possible to derive all information from these replays though.

Would it be possible to run these replays through sc2 or the api, then tell it to save the replay again, and get the expanded version?

Also noteworthy, someone on the Facebook page for StarCraft 2 ai linked me this and claimed it could get the unit data from these replays. When I viewed the pre computed database though, there was no unit data anywhere even though MSC has code for handling it, and a lot of documentation is about units.

https://github.com/wuhuikai/MSC

Gusgus01 commented 5 years ago

The removal of data is part of the anonymization process that Blizzard does to the replays for AI and ML use. Not sure the exact reason replay.tracker.events gets removed as part of that, possibly as you mentioned for size reasons.

I've seen MSC before. It uses s2clientprotocol (https://github.com/Blizzard/s2client-proto) and pysc2 (https://github.com/deepmind/pysc2). The former being the protocol to interact with Starcraft 2 and the latter does a lot of work to make it easier to interact with Starcraft 2 in a research oriented way. It would appear that MSC uses pysc2 to launch the Starcraft client and load up a replay from which it gets the data to analyze.

Just thinking about it briefly, I can think of a few options. If the analysis you want to do is simpler, some of it could be logic'd from the GameEvents (like did they start a specific upgrade in each of these replays, etc). You could also look at MSC and see how they retrieve the actions from replays using pysc2. If you really wanted to use SC2Reader, you could see if you can save replays with all that file once you have ran through the replay using pysc2.

Gusgus01 commented 5 years ago

Created a PR to fallback to the backup files when the main files aren't there. Let me know what you guys think. I did not attempt to do anything about the lack of replay.tracker.events.

lfricken commented 5 years ago

I had tried telling pysc2 to save the replay again, but it pretty explicitly doesn't want that. It said something along the lines of "can't save a replay of a replay". And either way, doing that for 100k replays isn't really feasible since it has to replay the whole thing in 8x time at best. If I could process a replay per minute (unlikely) that would still take 70 days for 100k replays 🙁

Gusgus01 commented 5 years ago

Looking at MSC, it looks like that is exactly what they did. They use multiprocessing and threads to do multiple at once.

StoicLoofah commented 5 years ago

Just merged in #69 . 2 questions for thread participants:

  1. do you guys want a new tagged release version for this?
  2. i think this technically fulfills the scope of this issue, though we still don't have the trackerevents. Would you prefer we keep this issue, or should I close and create a new one?
Gusgus01 commented 5 years ago

1) I'll let you decide on that one :)

2) I definitely think this fulfills this issue. I'm not sure how to go about dealing with the trackerevents bit since it looks like it will require using pysc2 (or sc2clientprotocol) and the sc2executable in order to replay the replay. I could see maybe adding a way to sideload data and then couple that with a tool that saves off the trackerevents from a replayed replay.

cclauss commented 5 years ago

My vote would be 1) a tagged release because this is a significant improvement and 2) close this issue and open a new one.

Great work here @Gusgus01 !!

Gusgus01 commented 5 years ago

Thanks :)

StoicLoofah commented 5 years ago

Created #74 as a followup. Thanks @Gusgus01 !