Investigate feasibility of using an ECS (Entity Component System) for Structures

OutpostUniverse / OPHD

OutpostHD - Open source remake of Sierra On-Line's Outpost

BSD 3-Clause "New" or "Revised" License

109 stars 20 forks source link

Investigate feasibility of using an ECS (Entity Component System) for Structures #824

Open rmtoth opened 3 years ago

rmtoth commented 3 years ago

This suggestion was brought up in #714 as a way of getting more convenient access to StructureClass-specific functionality, rather than having to typecast to a specific Structure subclass.

DanRStevens commented 3 years ago

Re-posting links for ease of reference:

So my understanding of this idea so far, is structure abilities would have associated types, with one instance for each structure that has that ability. The structure would contain all the fields needed to implement that ability. The associated types would be stored in their own collections, and there would be a mapping between the ability instance and the structure that has it. In database terms, this would be a one-to-many relationship (1 structure, has many abilities, with each ability being owned by exactly 1 structure).

For example, we might have abilities for FoodStorage, WarehouseStorage, PowerProduction, FactoryProduction, etc. A Command Center would have abilities for FoodStorage and PowerProduction. An Agridome would have only FoodStorage. If you built 1 Command Center and 1 Agridome, there would be 2 entries in the FoodStorage table, and 1 entry in the PowerProduction table.

The Structure class itself would be very minimal, or perhaps even non-existent? If all meaningful properties were components, then a Structure might simply be the ID that ties all the component fields together.

This sort of design necessarily means a lot of references. Each ability component would have a reference to the structure that owns it. Do we need mapping in the reverse direction? How would a structure know what components it has, or does it even care? We could have a collection of component references, though I'm not sure that has much value, since components are of different types, so there'd be no real purpose in iterating over such a collection. You could potentially have fields in the Structure to point to each of the possible components, of which there is at most one instance, though that might waste space since most Structures will have only 1 or 2 components. Or you could just scan all the component tables for references back to the structure. That would be much slower, but not waste memory. But do we even care? If what we really want is to iterate over specific abilities, and do processing that way, then maybe we don't actually care about the structure itself, or mapping from a structure to abilities. Maybe that's only relevant if a structure is destroyed, and we need to cleanup associated components. That should happen rarely enough that speed isn't of much concern. I'm guessing that is the design approach of choice.

Since there will be many links between components to the structure they are associated with, we should consider how that will be serialized. Initially I was thinking separate objects would be serialized in separate tables with links between them, which requires also serializing the linking information. Though that might not be the best approach. Since each structure has at most 1 component of any given type, and each component is only ever used by one structure, it makes sense to store component data as sub-nodes in the XML for the structure node. That way the link between component and structure is implicit, and there is no need to serialize linking IDs.

I think I'm kind of liking this idea. Let me know if my understanding of anything is wrong, or if there is additional detail we should add here.

rmtoth commented 3 years ago

Your understanding is spot on as far as I can tell.

Regarding the key, it really just needs to be a unique (at any given time) value per structure that doesn't change during the lifetime of the structure, so that it can can be used for table lookups. That said, it is common to make it more useful by making the key itself contain information. For instance, using a pointer to a struct containing very common data as key allows for O(1) access to that data from components. Or you can give some bits or bitfields in the key special meaning, such as "has warehouse component", so you can query some basic information about the object without even dereferencing a pointer.

I personally like the notion of having very common data in a Structure class, and use Structure as key. You can still design it such that Structure is a component if you want to retain the API of a pure ECS, it just happens to be particularly fast to do a lookup in that table (which can be backed by vector<Structure> for enumeration purposes, rather than a map<Structure,Structure> if you want to save space). I'd vouch for being open with the fact that the key is a Structure* though, since it makes referencing common data less verbose.

Every component table stores key-value pairs, so it's technically not necessary to store the key in the component itself. That said, it's easier to write code if you have access to the key from the component itself. Otherwise you'd have to pass around key-value pairs rather than just component pointers in a lot of places, just to be able to obtain the key of the object you're operating on. Plus, your code will be littered with ".second" to get the component out of your iterators...

As you pointed out, there is seldom a need for enumerating all the components belonging to a key. Destruction may be the only one. That depends on how you want to design other parts of the code though. For instance, consider a discovery query like the user clicking on a building and the GUI needing to find out how to react. You could either put the relevant information in the common Structure table (and use the key to directly dereference it), or you could enumerate all components and interrogate them with virtual functions like "OnClick". If you want to do the latter, you want some way of finding the relevant set of components, rather than searching for the key in every single table. The way I see it, there are three reasonable options:

don't store any per-structure component list. Enumeration must search in every table.
store a per-structure bitfield of which components it has. Enumeration must search the tables in which the key exist only.
store a per-structure array of component pointers. Enumeration doesn't need to search at all. I don't have a good grasp of the various systems at play in OPHD so can't really make any recommendation here.

rmtoth commented 3 years ago

A note on how this relates to the current class structure:

The Structure class can stick around as the common data subset, but parts of it should probably be broken out to new components.

The Structure-derived leaf building classes would be replaced by factory methods for constructing and initializing the structure with all its components. There is no need for Structure subclasses. In practice, this is just cut-and-pasting the constructor content to a factory function.

The formerly structure-specific code would be moved into new StructureComponent-derived classes. In practice, this is mostly just changing the base class of the existing classes from Structure to StructureComponent. Structures that behave the same but have different parameters, such as "small and big residence", will only have one component class but different factory functions. Similar StructureComponents (FoodStorage, OreStorage or whatnot) can use regular C++ inheritance to ease implementation.

Code that today typecasts from Structure* to something else need to perform a lookup in the desired component type's table instead.

Code that today iterates over Structure* to find the subset that has certain abilities will instead directly iterate over the desired component type.

StructureClass will become entirely obsolete and will be removed.

The StructureID enum still seems useful. It is a compact way of representing common information like which sprite to use, which would otherwise need to be serialized per structure. Logic can also do minor special-casing based on it without having to introduce new components or distinguishing fields to components, but that may admittedly be a slippery slope to undermining the ECS system.

DanRStevens commented 3 years ago

Sounds like the idea is fairly well thought out. I'm starting to see how this would work. I would be interested in proceeding with this. Should probably check in with Leeor first though before going too far with it.

If we were to proceed with this, what would be the first step? I would very much like to avoid a monster commit that changes everything all at once, takes forever to review, and may have many feedback iterations. Is there a way to chunk changes into small change sets that move towards such a design without requiring a massive overhaul all in one step?

Code that today typecasts from Structure* to something else need to perform a Get operation for the desired component type instead.

Perhaps the typecasts could be replaced by a get template operation, which for now just does the typecast internally. That would allow for cleanup of some of the client code, without committing to the full change.

rmtoth commented 3 years ago

I think we can add the ECS management functionality to StructureManager as a first step, with a test-function that uses it in various patterns to get a good feel what the syntax will look like.

We can then convert one structure type at a time in separate commits. See for instance this commit, which updates the Warehouse class to the prototype ECS. Would that be an acceptable granularity?

I'd suggest working in a branch until that it is in a non-frankenstein state. I'm still not convinced that the effort won't end up in the bin after hitting some unforeseen roadblock. We'll probably run into a bunch of issues along the way, such as how to best deal with the existing behavior of filling the storage capacity of Command Centers before filling any storage tanks. And I don't know how much benefit it will bring either, to be honest, since structure functionality in OPHD is mostly a tree and 90% of it could be expressed in a regular object-oriented design.

On the flip side, writing systems is fun, so it will at least have entertainment value even if it never gets used in the end.

DanRStevens commented 3 years ago

Seems like a pretty positive attitude to approach the work.

In terms of commit granularity, I generally like things to be pretty granular. Perhaps a little more so than the commit you referenced, though that one was within reason. I noticed there were some renames mixed in with the code changes, which I usually like to do in separate commits. Renames can cause a lot of noise in the diff. Keeping diff noise to a minimum means the real changes stand out better. And for reference, I really don't mind commits (or even entire PRs) that consist of only a 1 character change to fix a typo in a comment.

The one restriction on granularity that I try to live by, is to not break up changes into separate commits so that the intermediate commits don't compile or pass unit tests. (So far OPHD doesn't have unit tests, only NAS2D does).

If there is some code cleanup you can do that would make the work easier, but doesn't require the work to be complete, I sometime make a separate PR for the prep-work. That might include renaming variables, fixing documentation, refactoring code to reduce code bulk, abstracting away parts of the API so future changes can be done while touching less code, and so on. Such a PR can reference the issue the prep-work was meant to help with, though it won't mark the issue with closes.

ldicker83 commented 3 years ago

Hey guys, So generally speaking I like the ideas presented here. I sort of skimmed what's posted because I'm a bit burned out at the moment on code and need to take a short break (like a day or two, nothing major). Going to take some time this weekend to do some more work on things.

I'm a little hesitant to use pointers as keys -- mostly because I'm not sure how to implement that well. I'm not opposed I just fear creating a system that is needlessly complex. From the commit that doesn't look like this is going to be the case... quite the opposite, it looks like it will solve a lot of the complexity that already exists, particularly the casting. I never cared much for typecasting and yet the entire entity 'system' in place makes a lot of assumptions about what pointers to objects actually are which is prone to breaking. There ends up being a lot of special-case code as well, mostly for serialize/deserialize functions. something I've been wanting to eventually tackle.

You can see from the code that I tried three different approaches, none of which are well implemented and ends up just sort of being a mish mash of shitty code. It works, technically, but it's a chore to update and maintain since serialize/deserialize methods and responsibilities end up all over the place.

I generally like the idea of individual components being responsible for serializing/deserializing themselves... though I think I'm starting to get off topic. That's something that I suspect this sort of system will help with correcting down the line.

rmtoth commented 3 years ago

I've got something that I'm fairly happy with in my fork now (unfortunately entangled with conversions of some of the Structure subclasses into components - I'll disentangle that in a new branch).

A logical structure is identified by a structure key, which is represented with a new SKey class.
A new StructureComponent class is introduced that serves as base class for all components.
StructureManager tracks Structure and StructureComponent, and their relation to SKey.
There are new helper functions to obtain a specific StructureComponent subclass pointer/reference belonging to a given SKey.
There are new helper functions to iterate over all instances of specific StructureComponent subclasses.
There are new helper functions with identical syntax to iterate over all instances of specific Structure subclasses to ease transition.

SKey just wraps a Structure*, to allow quick access to common functionality. I think it makes most sense to use that pointer as key, but it's encapsulated to hide that internal detail. It can be replaced by something else without affecting more than a few locations.

StructureComponent doesn't really contain much - its most important feature is that it has a virtual destructor. I opted for storing the SKey in StructureComponent for convenience. It also has a structure() convenience function to get a reference to the Structure associated with the structure. When creating a StructureComponent subclass, you must define a static constexpr ComponentTypeID componentTypeID = ...; in the subclass if it's supposed to be queriable/constructable. When creating a structure, first create the Structure instance along with any StructureComponent subclasses it should have, and pass them to StructureManager::create to allocate an SKey and associate the various instances with it. This should all be done in a factory function, rather than in the constructor of a Structure subclass.

Like before, StructureManager owns the lifetime of structures. It keeps track of all the StructureComponent subclasses in maps. This requires dynamic lookup to obtain the right subclass table, similar to the existing StructureClassTable implementation. While that's not strictly necessary, it makes for less boilerplate than building something with compile-time known table locations for the individual component classes.

You can construct an SKey from a Structure* at any time. You obtain a component of an SKey using either GetComponent<SomeComponentType>(someKey) or TryGetComponent<SomeComponentType>(someKey) - the first gives a reference back (or fails dramatically) while the second gives back a pointer (or nullptr if the key didn't have the component). You can iterate over all instances of a component with GetComponents<SomeComponentType>().

GetComponents returns a wrapper object rather than the underlying storage container, hiding the fact that storage is in a std::map. It made it cleaner to write range-based for loops, which now look like this: for (auto& componentRef : GetComponents<ComponentType>()) {...}

I added an emulation wrapper as well that allows you to use GetComponents with the existing Structure subclasses as well. It requires adding static constexpr StructureClass typeStructureClass = ... to the class you want to convert to the new syntax, but allows you to write code like: for (auto& powerRef : GetComponents<PowerStructure>) {...}

A few trivial things should be added, such as template<typename T> T& StructureComponent::Get() and an equivalent TryGet to obtain one component from the reference of another. And some of the names can probably be improved. But overall I'm fairly satisfied with the system.

Where do we go from here? Can you create an ecs branch so that I can clean up an issue a series of bite-sized pull requests? Or do you want to discuss and refine the system some more before accepting any PRs? If you want to inspect the code in my fork, you should be looking at these three files: StructureComponent.h, StructureManager.h, and addStructure/removeStructure in StructureManager.cpp.

DanRStevens commented 3 years ago

I'm not sure if I really followed what you wrote too well. Perhaps I'm a bit too tired to review effectively tonight.

From the first part I kind of got the sense you were still refining the idea.

From the later part I got the sense you're looking to start picking out parts of the changes into branches that are suitable for review and merging.

I'll have to look again tomorrow.

ldicker83 commented 3 years ago

I think I have a pretty good grasp of what's been done here. I'll create an ECS branch as requested to let you submit PR's and commits to so we can make granular changes and refinements before pushing to master.

Can see the new branch here: https://github.com/OutpostUniverse/OPHD/tree/entityComponentSystem

Also want to apologize for being somewhat absent the last week or so -- was starting to feel a bit of burnout and needed a break. Getting back into the swing of it though!

DanRStevens commented 3 years ago

You know what, maybe we should invite @rmtoth to the Outpost Universe organization. Can work directly out of the main repository then, including creating branches (and working on them). It also means we can collaborate on PRs without have to continually pull from either other's forks, and create PRs to update PRs. It'll also probably help with the Continuous Integration (CI) builds, which don't tend to run for PRs from forked repositories.

Edit: Invitation sent.

rmtoth commented 3 years ago

Thanks. Still orienting myself in the github workflow... is the idea that I create a WIP branch (or several) and directly push my proposed changes there? And then to create a PR from that WIP branch into the entityComponentSystem branch that Leeor created, so that we can review the changes in managable chunks?

ldicker83 commented 3 years ago

That's the general idea. We usually break features down into smaller change sets but since this is effectively a rewrite of the structure system for the game I'd break it off into its own branch and work out of that one, merge in any changes that will inevitably get added to master, test and then merge. It's kind of a big set of changes so I think breaking it apart like this makes sense.

Also, agreed with the organization invite.

DanRStevens commented 3 years ago

Now that you're a part of the Outpost Universe organization you can clone the original repository directly and create your own branches. The branches are still submitted for review using PRs. Just that the PR branch is in the same repository as the merge target (generally master).

You can start from the branch Leeor created if you want, though it's already a few commits behind master. Feel free to start a new branch if you want to be more up-to-date, or to rebase the old branch to bring it up to date. Updating the old branch may require a force push. You are allowed to force push to a branch, particularly if you're the only person working on it. Please never use a force push against master though, and be cautious about force pushing to a branch that someone else has based work off of.

When you push changes to a branch, the CI builds will pick up the changes and compile them for all platforms. You can see the build status on the PR preview page (before opening the PR). You can also see the build status after the PR is opened, but be aware any pushes to an open PR will send notifications to everyone working on the project. If you're testing stuff out and don't want to bother other people with notifications, it may be better to do it on a branch without an open PR, and use the PR preview page to check on the build status. I generally only open the PR when I'm done working and it's ready for review. If I need to test something after opening a PR, sometimes I'll branch off of the branch, and test using the new branch with no associated PR. When things work, I can then update the original branch to include the new commits using a git reset followed by a git push.

As an example, here's the preview for a branch with no open PR: https://github.com/OutpostUniverse/OPHD/compare/master...updateCppStd

The preview gives a list of commits, followed by a diff of all the changes. You'll notice a red X next to the commit hash on the right side of the commit list. That indicates a failed build status. A yellow circle means it's building, and a green circle means the build has finished successfully. If you click on the icon you can see the status for each individual build, and follow the "Details" link to see the build logs or any error messages.

It probably won't be super obvious how to chunk up the work until you've got somewhat working results. It can help to learn how git rebase works so you can repackage experimental work into polished commits that follow a logical sequence. We might need to give some guidance on this. That would likely come after you've got something kind of working, and then we can discuss how to pick apart pieces of it that can be reviewed and merged in smaller chunks. I generally merge pieces directly to master. It seems the thinking on this one is the change set will necessarily be a large all or nothing change, hence a feature branch with small parts merged to the feature branch. I'm not sure that will actually be necessary though. I suspect we can merge small changes sets directly to master, without ever breaking master.

rmtoth commented 3 years ago

Now I see why you wanted the work to be done inside the Outpost Universe repo. I see that the change actually failed to build on Linux - I'll try to find the time to fix that during the weekend.

ldicker83 commented 3 years ago

Yup. Continuous Integration really helps to identify potential build errors or mistakes made in the code and ensure that we're always pushing code to master that builds on all supported platforms.

rmtoth commented 3 years ago

I'm a little hesitant to use pointers as keys -- mostly because I'm not sure how to implement that well. I'm not opposed I just fear creating a system that is needlessly complex.

@ldicker83 I tried encapsulating the key in the SKey class. But I'd rather just ditch it and use Structure* as key directly to reduce the friction in hooking everything up, since most code already deals with Structure pointers. But I'd like to hear your thoughts about it - is there any particular complexity you foresee? @DanRStevens already suggested the key encapsulation is of little value.

ldicker83 commented 3 years ago

I think I was concerned about writing pointers out to disk. I may have misunderstood something.

If you think it's a cleaner solution to use the pointers, by all means go ahead.