yaml/received/filtering/logic: More item groups, consolidated

MatthewMarinets commented 5 months ago

What feature would you like to see?

After #159, item groups are now used in two places: the yaml options, and /received filters. There should be more groups, they should be intuitive, and they should be documented. Logic and filtering should also tap into item groups for some kinds, such as AA, competent comp / core unit, heavy artillery, etc., rather than redefining these lists independently.

Additionally, it may be good to move ItemData.type and itemData.origin to use enums or flagEnums instead of strings or sets of strings, and incorporate those things into groups. Once origin is an item group the associated exclude-by-origin / exclude-by-type yaml options can be removed as they can be accomplished by adding the groups to the normal item exclude lists (and we have enough option bloat already).

[x] Have a big brainstorming session on what groups people actually want
[x] Add those groups to ItemGroups.py
[x] Document the groups -- see https://matthewmarinets.github.io/ap_sc2_icons/itemgroups
[ ] ItemData.origin uses a FlagEnum
- [x] Remove nco_items, ext_items, bw_items yaml options
[x] ItemData.type / type_flaggroups replaced with an enum
[x] Port item filtering to reference item groups
[ ] Port logic to reference item groups

Salzkorn commented 4 months ago

Overall I'm a fan, but I have a few comments:

Add those groups to ItemGroups.py

As an alternative, we could also have a StrEnum of group names and tag items with their respective groups. I'm not sure which way is better, it feels better to me to tag items and have the groups be auto-magic, but if anyone goes digging them for, ItemGroups.py would probably be the place they look first. I don't know if one would be more prone to errors over the other.

ItemData.origin uses a FlagEnum Remove nco_items, ext_items, bw_items yaml options

I'm overall in favor of this, but want to stress that origin should remain a set, and also that I'm not sure this is a simple change. They don't seem to be used in a lot of places, but they're carrying the functionality of having items that are only conditionally progressive. (get_item_quantity in PoolFilter.py & allowed_quantity in __init__.py) Their other use (that I saw) is that items check that their origins intersect the world's available item sets to determine if the item should appear in the seed. (allowed_quantity) For both of these cases, moving from simple option check/small list intersection logic to vastly expanding the excluded item list and relying on that seems like it could be a notable performance regression, particularly when excluded items are used & checked in a lot of other places.

Port item filtering to reference item groups

What is item filtering referring to here?

Port logic to reference item groups

I'm guessing the idea is to have groups representing logic functions, but this seems really complicated, just looking at Terran anti-air as an example:

Basic anti-air
- Also counts competent anti-air
- Has special logic for advanced tactics
Competent anti-air
- This counts ground-to-air and air anti-air and has no units of its own
Air anti-air
- Has special logic for advanced tactics

What would groups for these look like? Is the plan to go all-out and make basically every combination of used options a group? (This sounds scarier than I mean it, in practice it would mostly just be "X group" & "advanced X group") How does the difference between has_any and has_all get handled? As an example, I don't know how I feel about tagging Swarm Hosts as any kind of anti-air just because Pressurized Glands lets them shoot up. Do we care that users would have to effectively learn the differences between all the races' anti-air logic?

On paper I'm in favor of this, but I can't really imagine what the final result would look like. The plan I had in mind was to tag things with their use case ("if it shoots up, it's Anti-Air") and let users figure out the rest themselves, which I don't believe is better, but would certainly be simpler.

MatthewMarinets commented 4 months ago

As an alternative, we could also have a StrEnum of group names [...]

I really want to move the item groups away from strings as much as possible. In fact, we specify them as strings only to immediately map them to non-overlapping ints in item_flaggroups` at the base of the file. These indices then get used by the client to determine what number the item's bitfield value lives in. My question is, why have strings involved in this process at all? Just go from enum name -> int, and only translate to string in ItemGroups.py to expose things to yaml users (if we even want to expose that).

want to stress that origin should remain a set

A FlagEnum should be even more performant than a set; under the hood it's an integer bitfield, meaning functionally it's a set of enum options. Every possible set can be expressed in an integer with number of bits equal to the number of members of the enum. Checking intersection is as quick as a bitwise and. Plus it gains all the type safety and advantages of enums -- no risk of someone typoing an option somewhere and getting a silent bug.

So yes, functionally like a set, but in actually a flag enum (code name enum.IntFlag) is what we want.

What is item filtering referring to here?

PoolFilter.py; as you pointed out with the item origins, it has to deal with some certain item groups. Item parentage / relatedness is also more complicated than just checking the parent item nowadays, as some upgrades can impact multiple units (e.g. zealot upgrades, immortal upgrades), some upgrades affect units that can be obtained multiple ways (spider mines, darchons), etc.I

If we need some extra logic to determine item relatedness / parentage beyond checking ItemData.parent, then I think that should be captured in ItemGroups.py as much as possible / reasonable.

How does the difference between has_any and has_all get handled?

To be honest, I hadn't thought too hard about that. I think that may have to be decided in the big brainstorming session, with some yaml creators / users present. In my mind it should be sufficient to capture any or-groups, and if that means having separate "X group" and "X group advanced" groups, I don't see that as too big a cost. I expect ItemGroups.py to become rather large with many hand-crafted groups, which is why I pushed for it to be a separate file. If it's somewhat user-facing as yaml creators check it to see what groups they can use, it's easier for them to spot and report bugs if things are enumerated.

For and relationships, though, I'm not sure. I think it's good to move in the direction of pushing stuff into ItemGroups.py, but limit it to stuff within reason. If it complicates things the using code, it's not worth being a hardliner on this.

Overall, really good comments and things to think about, though.

Salzkorn commented 4 months ago

I really want to move the item groups away from strings as much as possible. In fact, we specify them as strings only to immediately map them to non-overlapping ints in item_flaggroups` at the base of the file. These indices then get used by the client to determine what number the item's bitfield value lives in. My question is, why have strings involved in this process at all? Just go from enum name -> int, and only translate to string in ItemGroups.py to expose things to yaml users (if we even want to expose that).

Sorry, the part I quoted maybe didn't make it clear what I was referring to: I meant item name group names, not flag group names. I'm fully on board with flag groups becoming something better than strings, but item name groups must be strings as a requirement on Archipelago's end, and so a StrEnum would be our best bet for those I think, similar to how ItemNames.py works, but restricted to only "correct" values.

Anyway, this all assumes we would even want sets of those on items at all instead of listing them by hand in ItemGroups.py, where the name would only occur the one time. Then again, if the plan is to use those groups for logic, I suppose we need a way of "forcing" them correct either way.

If we need some extra logic to determine item relatedness / parentage beyond checking ItemData.parent, then I think that should be captured in ItemGroups.py as much as possible / reasonable.

Big agree on this!

MatthewMarinets commented 4 months ago

but item name groups must be strings as a requirement on Archipelago's end, and so a StrEnum would be our best bet for those I think, similar to how ItemNames.py works, but restricted to only "correct" values.

I'm not convinced we want to expose the raw flaggroups. afaik sc2-wise we only really use them for determining bitfield indices when communicating with the game. To turn it into a yaml or search-facing group, we'd want to process it a little (combine the armory categories, combine the armory categories, sort progressive stuff into other categories, etc). We don't need the input groups to be strings for that, only the output group names. So if we have to do processing anyways to reach AP item groups, I'd say keep the internal-use stuff as enums and only translate to strings when we have to and when we have to process anyways.

Salzkorn commented 4 months ago

I'm not convinced we want to expose the raw flaggroups. afaik sc2-wise we only really use them for determining bitfield indices when communicating with the game. To turn it into a yaml or search-facing group, we'd want to process it a little (combine the armory categories, combine the armory categories, sort progressive stuff into other categories, etc). We don't need the input groups to be strings for that, only the output group names. So if we have to do processing anyways to reach AP item groups, I'd say keep the internal-use stuff as enums and only translate to strings when we have to and when we have to process anyways.

Oh, yeah, I should have made clear, I'm in agreement with turning flag groups into FlagEnums, regardless of how the groups turn out. My only concern with not exposing them is that we would need duplicate data somewhere, eg. units would need both the unit flag and be in a unit group somewhere.

MatthewMarinets commented 4 months ago

Fair. My expectation is that unit groups will be based on those flags by e.g. filtering the item list:

item_name_groups["units"] = [name for name, item in get_full_item_table().items() if item.type in (ProtossItemTypes.UNIT, ProtossItemTypes.UNIT_2, ZergItemTypes.UNIT, TerranItemTypes.UNIT)]

Directly concatenating these categories won't always work, though. Consider how most progressive upgrades will have to be manually split off into other groups (armory for stimpack, SOA for Proxy Pylon, buildings for PF upgrades, etc)

Salzkorn commented 4 months ago

Consider how most progressive upgrades will have to be manually split off into other groups (armory for stimpack, SOA for Proxy Pylon, buildings for PF upgrades, etc)

For what it's worth, it's simple enough to append those special instances to the relevant groups, but having less exceptions is also always a good thing.

Salzkorn commented 4 months ago

It came up in the beta async's chat and I didn't know about it, so I'll echo it here: It's possible to !hint for item groups, which I imagine will be a useful way to think about what item groups someone might want.

MatthewMarinets commented 2 months ago

As part of PR #192, I'm adding several new item groups:

class ItemGroupNames:
    TERRAN_UNITS = "Terran Units"
    ZERG_UNITS = "Zerg Units"
    PROTOSS_UNITS = "Protoss Units"

    BARRACKS_UNITS = "Barracks Units"
    FACTORY_UNITS = "Factory Units"
    STARPORT_UNITS = "Starport Units"
    NOVA_EQUIPMENT = "Nova Equipment"
    TERRAN_BUILDINGS = "Terran Buildings"
    TERRAN_MERCENARIES = "Terran Mercenaries"

    GATEWAY_UNITS = "Gateway Units"
    ROBO_UNITS = "Robo Units"
    STARGATE_UNITS = "Stargate Units"
    PROTOSS_BUILDINGS = "Protoss Buildings"
    AIUR = "Aiur"
    NERAZIM = "Nerazim"
    TAL_DARIM = "Tal'Darim"
    PURIFIER = "Purifier"

In that PR, I also updated the ItemData.type parameter to be an enum with a display string, which should automatically merge flag group-based item groups based on having the same name. I'm not sure we want those groups to be public facing, though (or even have those as proper Archipelago groups at all), in favour of semi-handrolled groups to account for those troublesome progressive items that go into a different flaggroup.

I've found a nice pattern to be something like:

item_name_groups[ItemGroupNames.NOVA_EQUIPMENT] = nova_equipment = [
    *[item_name for item_name, item_data in Items.item_table.items()
        if item_data.type == Items.TerranItemType.Nova_Gear],
    ItemNames.NOVA_PROGRESSIVE_STEALTH_SUIT_MODULE,
]

which allows us to specify an item group using flags, but possibly adding extra items manually. There's two = signs, ensuring that the "exported" group (ItemGroupNames.NOVA_EQUIPMENT = "Nova Equipment") matches an internal variable (nova_equipment) exactly. The internal name can be used for item filtering, for sanity-check asserts, and unit tests.

Ziktofel / Archipelago

yaml/received/filtering/logic: More item groups, consolidated #160

What feature would you like to see?