Ability to refer to entities by identifiers stored in an external data structure

Hey Piotr!

It's been about a year since I started using becsy and it's still working wonderfully for my use case; thanks again for your sustained effort on this awesome library 🙏 I hope for my app to go public soon enough so we can share it with you!

I've come with yet another odd use case, but one I'm hoping you'd have suggestions on 😄

I've been trying to build a VisibilitySystem which marks specific elements which are in or out of view as visible using a VisibleComponent. To determine visibility performantly, my system is making use of an external data structure to implement a map of tiles (leveraging tiling as the algorithm), and we're also considering a quadtree for spatial queries later onwards.

My question is: I'm struggling to find a fast way to lookup entities after I query for which ones are visible in my data structure. The problem is exacerbated by our world having a huge number of entities (sometimes upwards of 10000).

The system I'm trying to build is roughly like:

class VisibilitySystem extends System {
  entities = this.query((q) => q.addedOrChangedOrRemoved.with(IdComponent, PositionComponent, SizeComponent));
  camera = this.query((q) => q.addedOrChanged.with(CameraComponent).trackWrites);

  initialize() {
    this.tileMap = new TileMap();
  }

  execute() {
    for (const entity of this.entities.addedOrChangedOrRemoved) {
      const id = entity.read(IdComponent).id;
      const visible = this.tileMap.addElement(
        id, 
        entity.read(PositionComponent), 
        entity.read(SizeComponent)
      );

      // Works great for mutating a single entity when it changes
      if (visible) {
        entity.add(VisibleComponent);
      }
    }

    if (this.camera.addedOrChanged.length) {
      const camera = this.entities.addedOrChanged[0];
      const visibleElementIds = this.tileMap.visibleElements(camera);

      // Question: I want to avoid traversing through all ~10000 elements I have here to save on performance cost
      // and add a `VisibleComponent` to those things which have become visible (and do a lookup to see which
      // elements are no longer visible to remove the component, etc).
    }
  }
}

In such a situation I would typically err for a lookup table of sorts to easily lookup an entity by the id you see above, but I can appreciate that building threading primitives for that may be tricky.

One idea I had whilst thinking through my use case is adding an indexed query flavour to queries. It felt like it'd fit well into the way we're currently interpreting the API in our project, whilst also aligning with becsy's thread-safe vision. One way I could see that playing out is as below - supporting hashed lookup of entities based on user-defined components:

class MySystem {
  entities = this.query(
    (q) => 
      q
        .addedOrChangedOrRemoved
        .with(IdComponent, PositionComponent, SizeComponent))
        .indexed('entitiesById', e => e.read(IdComponent).id)
  );

  execute() {
    // later on, we are able to do a quick lookup & add the `VisibleComponent` to entities
    for (const entities of entitiesWhichAreNowVisible) {
      // O(1) lookup vs O(n) for thousands of elements
      const entity = this.entities.fromIndex('entitiesById', entity);
      entity.add(VisibleComponent);
    }
  }
}

In this way, we'd keep the API surface change pretty small, whilst also fitting into the pre-existing concept of query flavours.

Another pattern distinct to above may be to allow entities in a system to mark themselves as being indexed:

const entity = this.someQuery.current[0];
const entityId = entity.read(IdComponent).id;

entity.index(entityId);

// in another tick of the game loop
const entityId = someListOfIds[0];
const entity = this.queryIndex(entityId);

In either case: Just wanted to share those ideas as I've been thinking about this a little bit in the context of my project 😄 Keen for any and all suggestions from your side independent of these of course, including if we're not wielding the existing API surface of becsy well enough / missing something which would suit the above use case.

In short: We need a way to refer to performantly refer to entities by identifiers stored in a specialised data structure.

Thanks in advance for all your help!

So before I dive deep into the idea of indexes and such, I have a simple question: could the tile map store entities directly, rather than just IDs? That is, you'd do something like this:

const visible = this.tileMap.addElement(
        entity.hold(), 
        entity.read(PositionComponent), 
        entity.read(SizeComponent)
      );

And then:

const visibleElements = this.tileMap.visibleElements(camera);  // returns array of entities

I think the only caveat is that you must ensure you never access deleted entities but that should be pretty easy to handle with a removed query that wipes them from the tile map. (Or perhaps these entities live forever once created in your app?)

What do you think?

Thanks for the quick response!

That's an interesting idea; thanks for sharing 🙏 I'd definitely like to consider .hold() as an option, I think it would help with solving the problem I have. I do think there's a few things which are constraining me in my current project, though:

Architecturally we're trying to keep our data structures independent of our ECS. We've been applying this principle to other parts of our stack too (e.g. our rendering layer). In this specific case, we're eager to have the one virtualisation contract which can be extended and evolved over time with different strategies (e.g. the quadtree I mentioned above).
Over time, we want to reflect the virtualisation algorithm we choose and its domain model across our stack. I.e., we hope to move some parts of our tiling implementation to the server-side parts of our app, and to share stable IDs between client and server, with the data structures speaking to one another up-and-down the stack.

More generally to the last point: Since we have server-side infrastructure and ingest events from it, we have a few other use cases where indexed access (or some form of it) would be amazing 😄 Currently, due to this event-driven infra, we need to look through all entities in the world to synchronise a local ECS with what has happened in other parts of the stack / on other machines.

E.g., if an entity is deleted by person A or via a HTTP API call, we supply an ID to person B, and then person B has to sweep through all the entities in their ECS to finally delete the correct entity (based on an IdComponent lookup).

One last question I wanted to ask you regarding .hold(): Does holding every entity have any 'bad' consequences, in your view? My first instinct was to worry a bit about performance and memory, as I know becsy does a bit of heavy lifting on the storage compaction side of things; would holding every entity have an adverse impact on these two?

Thanks again for helping me out with this - really appreciate it!

I think the answer to both your concerns is the same: if you want or need to design your data structures in terms of raw IDs, then you just shift the translation to a separate mapping layer where you still use .hold() to target actual entities. This could be as simple as the following, though you might need to tweak it if you require new entities to be mapped as soon as they're added rather than just once per frame.

class IdMapper extends System {
  entities = this.query(q => q.added.removed.with(IdComponent));
  map = new Map();

  execute() {
    for (const entity of this.entities.added) this.map.set(entity.read(IdComponent).id, entity.hold());
    for (const entity of this.entities.removed) this.map.delete(entity.read(IdComponent).id);
  }

  lookup(id) {
    return this.map.get(id);
  }
}

class SomeOtherSystem extends System {
  mapper = this.attach(IdMapper);
  execute() {
    // ...
    entity = this.mapper.lookup(id);
  }
}

There are no material downsides to using .hold(). As mentioned previously, you do need to ensure that you don't keep using the entity handle after the entity has been deleted, but that's fairly straightforward to deal with (especially as you always have a full frame to update your data structures). There's some overhead in dev mode for Becsy to check this invariant, but it goes away in perf mode, where .hold() just creates a new lightweight object that captures the right internal ID. I guess if you create/drop held entities often you might have a bit of GC churn to contend with at worst...

I think the only case where built-in indexing would matter is in multi-threading situations, since attaching all systems to a single, thread-specific map would tank concurrency. I haven't completely figured out how to address this yet but there are two basic approaches:

Replicate the map structure per-thread and use Becsy's internal "log" system to ensure that changes are processed in all threads. This is how refs fields work. It's safe, fairly easy, and has great read performance, but write performance goes down with the number of threads since each one needs to redo all update work.
Design dedicated shared-memory index structures that create ephemeral entity handles on access. This would be more tricky to manage and get right, and read performance would likely be a bit worse due to the entity handle allocation / borrowing overhead, but it may be worth it in some situations.

I intend to dig more into these options once I get multithreading running, since it'll be hard to experiment with stuff before then and I don't want to commit prematurely to a bad design!

Gotcha!

I think the approach you've described with the holder map would indeed meet the use case of our existing project 👍 Super keen to see where you land with the indexing stuff when you get to it!

As a completely random idea / out of interest: I wonder if there is some overlap with the problems you're trying to solve with multi-threaded storage and making use of a light, thread-safe DB layer such as absurd-sql or sql-lite in the browser 😄 Not sure if you've already looked into it. I'd be surprised if they give you the perf & concurrency guarantees you're after, though.

On the topic of this specific issue though - thanks, this approach should unblock us for time being!

Ah, I hadn't considered using IndexedDB or one of the adapters built on top of it! It does give strong concurrency guarantees, but I think you're right that performance is likely to be poor, at least in comparison to direct shared array writes. Another constraint is that Becsy needs to run both in browser and in Node, and I didn't find any good IndexedDB polyfills for the latter -- not much demand, I guess, and they especially don't address concurrency issues. Probably best to stick to shared arrays...

Another constraint is that Becsy needs to run both in browser and in Node

Makes sense!

If you consider looking into it, some form of sqlite wrapper may suit that need, as it can run in both a browser-based environment and Node 😄 I can't confess to having used it a huge amount, but have found it gives reasonable performance with careful schema design.

LastOliveGames / becsy

Ability to refer to entities by identifiers stored in an external data structure #21