ggtracker / sc2reader

Extracts gameplay information from Starcraft II replay files
http://pypi.python.org/pypi/sc2reader
MIT License
146 stars 143 forks source link

Ability list for recent builds is incorrect #18

Closed frugs closed 7 years ago

frugs commented 7 years ago

The the current new_units.py and the shell commands listed in the HOWTO do not preserve the command index of the ability when populating ability_lookup.csv. This means that the abilities with the same ability link but different command indices end up mapping to the same ability id.

This can be easily verified by loading a replay (with game events) where the player builds a supply depot and a barracks. The supply depot command event will have the correct ability name, but the barracks command event will not.

ccopland94 commented 7 years ago

As of 6/7/17 this is still an issue. Is this an issue with the ability_lookup.csv being incorrect, the _abilities.csv being incorrect, or another issue altogether? If it is the _abilities.csv, how were these files generated? Wouldn't mind putting in the leg work if there's a way to generate these files other than guess and check.

dsjoerg commented 7 years ago

I have never really understood this part of sc2reader. @GraylinKim do you have any remaining brain dumps about this that you could share with us?

frugs commented 7 years ago

It has been a while since I last looked at this I recall the primary fault is the shell command detailed in HOWTO: find Balance\ Data -print0 | xargs -0 grep -h '<ability' | sort | uniq | tr '="' ' ' | awk -v OFS=',' '{print $5, $3}' | sort -n This generates a csv which only contains a mapping of ability links to ability ids, but ignores command indices. Ideally, it should be a mapping of ability link and a command index to an ability id, and this information should be preserved when transformed in subsequent steps.

As an example, here is an extract from SCV.xml in the balance data for protocol 48258:

        <unit id="CommandCenter" ability="128" index="0">
            ...
        </unit>
        <unit id="SupplyDepot" ability="128" index="1">
            ...
        </unit>
        <unit id="Refinery" ability="128" index="2">
            ...
        <unit id="Barracks" ability="128" index="3">
            ...
        </unit>

And the matching generated lines in 48258_abilities.csv:

128,Barracks
...
128,CommandCenter
...
128,Refinery
...
128,SupplyDepot

Notice that the command indices have been dropped, so it is no longer possible to perform a correct lookup.

ccopland94 commented 7 years ago

I see that. I am currently looking at the xml files that you get from following the HOWTO instructions. Each unit xml file has abilities within it. Further, each ability has a "command index" that tells what command within that ability is triggered.

Judging from the current build, it appears that it is only taking into account the ability ID, not the command ID (as you said). That said, it seems that the information is available and possible to retrieve. I am just not exactly sure that could fit into the current reader.

frugs commented 7 years ago

I believe that, eventually, there is some facility for combining an ability link and a command index into one value using some bit shifting (here is an extract from sc2reader/data/__init__.py, starting from line 372):

        int_id_base = int(int_id_base, 10) << 5

        abils = ABIL_LOOKUP[str_id]
        real_abils = [(i, abil) for i, abil in enumerate(abils) if abil.strip() != '']

...

        for index, ability_name in real_abils:
...

            build.add_ability(
                ability_id=int_id_base | index,
...

If we could propagate the command index information to this point, I expect that this will fix this issue.

ccopland94 commented 7 years ago

Hmm, I see that. That seems relatively convoluted. Maybe the format has changed, but even those "abilities" seem off that you listed above (given from the output of the shell command). It would appear to me that those indices are really pointing to an ability that all of those units share, such as BuildInProgress (which is index 127 in the most recent build for the buildings you listed above). Even fixing this error in labeling might help.

ccopland94 commented 7 years ago

After digging in the files output from the game, I think I see a reason for the confusion and potentially the error. In the xml files, there are two possible formats for "abilities", at least according to what is in the csv files.

The first is within the abilites tags that many units have:

       <ability id="BuildInProgress" index="127">
            <command id="Cancel" index="0">
                <meta name="71" icon="btn-command-cancel" hotkey="72" type="command" tooltip="73"/>
            </command>
        </ability>
        <ability id="TimeWarp" index="116">
            <command id="Execute" index="0">
                <meta name="631" icon="btn-ability-protoss-protoncharge-color" hotkey="632" type="command" tooltip="633"/>
                <cost cooldown="5"/>
            </command>
        </ability>
        <ability id="que5Passive" index="277">
            <command id="CancelLast" index="0">
                <meta name="202" icon="btn-command-cancel" hotkey="91" type="command" tooltip="203"/>
            </command>
        </ability>
        <ability id="RallyNexus" index="105">
            <command id="Rally1" index="0">
                <meta name="93" icon="btn-ability-terran-setrallypoint" hotkey="94" type="command" tooltip="95"/>
            </command>
        </ability>

Notice here how each ability has a different index, and within each ability there is a command. Each command has an index unique to its ability.

Then, we have the second, which is in the format you provided above. This applies to building/creating units, it seems. In this, each unit has an ability attribute that is not unique, but their indices are unique within that ability number. The index ability_lookup seems to be where it factors in the indices, but it only has one file (not one per build), so I am not sure how this can be correct. My guess is that the first column of each row is representative of the "id" field for each ability, while the subsequent columns represent the names that they get depending on their index.

frugs commented 7 years ago

Ah, I think I may have something further to add!

I believe there is only one ability_lookup.csv because of an underlying assumption that abilities won't have their string ids changed, and that the command indices of commands on particular abilities will remain constant. This means that if you join the information contained in [build]_abilities.csv and ability_lookup.csv on the string ability id, you should be able to work out the command index and ability index for every command.

An example would be the following:

[build]_abilities.csv

128,builds
38,move

ability_lookup.csv

builds,CommandCenter,SupplyDepot,Refinery,Barracks...
move,Move,Patrol,HoldPos

This gives us the information we need to call

build.add_ability(
                ability_id=int_id_base | index,
                name=ability_name, # probably best rename this to command_id
                is_build=bool(unit_name),
                build_unit=getattr(build, unit_name, None),
                build_time=build_time
            )

in __init__.py

As an example, for the Patrol command we would want to do the following:

build.add_ability(
    ability_id=(38 << 5) | 1,
    name='Patrol',
    is_build=false,
    build_unit=None,
    build_time=0)

The complications would be parsing the xml to allow us to interpret the <ability> tags as well as the <trains> and <builds> tags (I am not sure if there are other alternatives, but I don't believe so).

ccopland94 commented 7 years ago

So that makes sense to me. It seems that example only works for regular abilities. For the "train" abilities, which is really just training units, they have the same ability lookup depending on what builds them. So I am not quite sure how it decides which instance of 128 (using the example a couple comments up) it chooses.

frugs commented 7 years ago

Having a quick look at CommandCenter.xml, Barracks.xml and Starport.xml, it looks like the <unit> tags inside the <trains> tags have different ability ids. If we treat the <trains> tag as an ability with the ability name CommandCenter_trains, Barracks_trains and Starport_trains respectively, there should be no problem.

So, an example again:

[build]_abilities.csv

154,CommandCenter_trains
158,Barracks_trains
160,Starport_trains

ability_lookup.xml

CommandCenter_trains,SCV
Barracks_trains,Marine,Reaper,Ghost,Marauder
Starport_trains,Medivac,Banshee,Raven,Battlecruiser,VikingFighter,Liberator

This will allow us to do something along the lines of:

for ability_id, ability_name in build_abilities:
    commands = ability_lookup[ability_name]

    for command_index, command_name in enumerate(commands):
        build.add_ability(ability_id << 5 | command_index, command_name, ...)
frugs commented 7 years ago

There is some confusing nomenclature here with respect to ids, names and indices that could probably be cleared up here too.

ccopland94 commented 7 years ago

It seems that would certainly be more clear than what is happening. Unfortunately, for whatever reason, this is not always the case. For example, the Nexus can train a probe (ability 175) and a mothership core (ability 313). I put together a python snippet that should compile the list (and unit list) and is more straighforward than the shell command. It looks for abilities, trains, builds, and researches tags. I didn't see any more that have "ability" attributes. Other than the last three.

frugs commented 7 years ago

I just considered backwards compatibility; this could require a bit more thought.

ccopland94 commented 7 years ago

Yeah I wonder if the same format is kept, and the new csv files are added/updated, if it will fix the issue.

ccopland94 commented 7 years ago

My guess is that the index is "preserved" by the label's position on the ability_lookup csv, but I am not sure.

frugs commented 7 years ago

Yeah I wonder if the same format is kept, and the new csv files are added/updated, if it will fix the issue.

I believe so, but it will require ensuring that the ability name in the older [build]_abilities.csv files have their commands preserved in ability_lookup.csv in order to ensure backwards compatibility with older replays.

My guess is that the index is "preserved" by the label's position on the ability_lookup csv, but I am not sure. I believe that this was the original intention based on the following line in __init__.py:

real_abils = [(i, abil) for i, abil in enumerate(abils) if abil.strip() != '']

Here, the index of the column (ignoring the first column, as the first column is used as a key) that the command features in ability_lookup.py is being used as the command index.

frugs commented 7 years ago

Having had a quick look at WOL/18092_abilities.csv, I notice that the ability names for the train commands appear to be CommandCenterTrain, BarracksTrain, LarvaTrain etc. Following this format should ensure backwards compatibility.

It seems that would certainly be more clear than what is happening. Unfortunately, for whatever reason, this is not always the case. For example, the Nexus can train a probe (ability 175) and a mothership core (ability 313). I put together a python snippet that should compile the list (and unit list) and is more straighforward than the shell command. It looks for abilities, trains, builds, and researches tags. I didn't see any more that have "ability" attributes. Other than the last three.

Looking this time at HotS/23925_abilities.csv, it looks like there are three unique train abilities that do not fit the usual pattern: NexusTrainMothership, NexusTrainMothershipCore, and TrainQueen. We may need to write some custom handling for these ability names in the new csv generation script.

It also looks like the Hatchery, Lair and Hive research abilities are keyed under the LairResearch ability name, so this will probably also require some custom handling.

ccopland94 commented 7 years ago

Yeah I looked back at it, and some of the "train" ones aren't working, even after I created a new CSV file for the newest build. I think the "train" and "research" are good options. And it preserves backwards compatibility because these changes should only take effect if the build is new.

ccopland94 commented 7 years ago

I find it hard to believe that there will be no instances of name changes or index changes. There should really be a different ability_lookup for each new build or it should be an automated process. Like you said, the command indices are not automatically preserved, so right now you would have to go back and do it yourself for every possible command index, which makes it highly likely that will make a mistake.

frugs commented 7 years ago

There should really be a different ability_lookup for each new build

I completely agree with this, but I believe this should be an improvement made after resolving this issue.

the command indices are not automatically preserved

I'm not sure what you mean by this. Are you referring to the current csv generation pipeline? My suggestion is to fix the csv generation so that the command indices are preserved, in the form that the code in __init__.py appears to expect them to be preserved (namely, in ability_lookup.csv).

ccopland94 commented 7 years ago

Yes. What I mean is that if you just use "CommandCenter" instead of "TerranBuild", it has an index that is not preserved, you have to manually do it yourself in the ability_lookup file (I'm agreeing with you that the names should be changed back to what they were before). Further, there should be a programmatic way to do it so it is never done incorrectly, because I think that is what happened in recent builds.

frugs commented 7 years ago

OK, then I think we are largely agreed on how we think the resolution of this issue should be progressed. I can start working on a PR for a script which will will generate correct [build]_abilities.csv from the balance data xml (i.e. a replacement for the now-defunct shell command described in HOWTO).

The next step would be to add a script which will be able to add any new commands found in new balance data xml files with the correct command indices to ability_lookup.csv.

Finally, updating the HOWTO with instructions on how to use these new scripts should resolve this issue.

It would then probably be wise to consider further improvements to the system overall (e.g. tie command indicies to balance versions, tie the data contained in train_commands.json to balance versions, combine [build]_abilities.csv and ability_lookup.csv into one table etc.)

frugs commented 7 years ago

Given that I know no way of exporting older versions of the balance data, I will not aim for backwards compatibility for the csv generation scripts with them.

I will aim for the following situation:

frugs commented 7 years ago

@dsjoerg @ccopland94 First PR: https://github.com/ggtracker/sc2reader/pull/22

dsjoerg commented 7 years ago

Outstanding @frugs!! Now that you've resolved this, does this help you with your own sc2reader-based project, or did you dig into this issue due to some problem related to the ggtracker website?

dsjoerg commented 7 years ago

^^ Same question for @ccopland94

frugs commented 7 years ago

@dsjoerg I was running into issues with 6 months ago when I was working on my own sc2reader project intending to plot a player's "macro commands" in a one dimensional scatter plot against game time. The idea was to help help players compare where their macro starts faltering compared to a high-level player, and allow them to focus on concentrating on ensuring they perform "macro actions" more consistently.

Once this is fixed, hopefully I can reboot this project.

ccopland94 commented 7 years ago

Yeah my usage was with a non-ggtracker project.

I reviewed the changes and I think the new file will serve it well. I think this solves the issue short term, but it would be great if it could 1.) put the csv files in the proper directories and 2.) either update the currently existing unit and ability lookups or use versioning for those too.

dsjoerg commented 7 years ago

@frugs @ccopland94 excellent thanks!

frugs commented 7 years ago

@ccopland94 putting the csv files in the proper directories is a little tricky as I'd have to start assuming the directory from which the script is executed, which could be a fragile assumption.

I could perhaps make the sc2reader project root path a required input parameter to the script; would that be better?

ccopland94 commented 7 years ago

That is true. I had assumed it would just be run from where you had it placed in the project directory.

frugs commented 7 years ago

@ccopland94 I have incorporated your suggestion 1) into https://github.com/ggtracker/sc2reader/pull/24

frugs commented 7 years ago

This issue can now be closed.