Closed jeanlemotan closed 1 year ago
Note that you're creating 500.000 entities, not 100.000 ;)
The RAM overhead is expected, as parenting splits up entities across tables. Names are stored in a hashmap to enable fast lookups, which also takes up space.
The query performance should be somewhat slower, but not as much as you're seeing. Try adding this to the query builder:
.instanced()
You can read more about that here: https://www.flecs.dev/flecs/md_docs_Queries.html#autotoc_md225
The tl;dr is that without instancing, if you're using iter
, each entity will be iterated by itself because the query contains shared fields (from the parent). With instancing enabled entities are iterated in bulk.
I should probably update the example with this 🤔 I could use each
here, which is instanced by default.
My bad, it's 500K entities indeed. I tried with instanced now and it does make a difference when there is no parenting. For the parenting case, almost no diff. Here are the new timings and mem reports, adjusted for the real 500K entities:
No names, no parents Memory Total: ~100MB Memory per entity: 0.2K Queries per second: ~3300 (up from 1500)
Names, no parents Memory: 280MB Memory per entity: 0.56K Queries per second: ~3300 (up from 1500)
No names, parents Memory: 670MB Memory per entity: 1.34K Queries per second: ~46 (up from 42)
Names & Parents Memory: 2700MB Memory per entity: 5.4K Queries per second: ~44 (up from 42)
And the code:
#include <chrono>
#include <hierarchy.h>
#include <iostream>
struct Position {
double x, y;
};
// Tags for local/world position
struct Local { };
struct World { };
int main(int, char* []) {
flecs::world ecs;
constexpr bool use_names = true;
constexpr bool use_parents = true;
// Create a hierarchy. For an explanation see the entities/hierarchy example
char name[32];
for (int i = 0; i < 100000; i++)
{
sprintf(name, "sun_%d", i);
auto sun = ecs.entity(use_names ? name : nullptr)
.add<Position, World>()
.set<Position, Local>({ 1, 1 });
sprintf(name, "mercury_%d", i);
auto mercury = ecs.entity(use_names ? name : nullptr)
.add<Position, World>()
.set<Position, Local>({ 1, 1 });
if (use_parents) {
mercury.child_of(sun);
}
sprintf(name, "venus_%d", i);
auto venus = ecs.entity(use_names ? name : nullptr)
.add<Position, World>()
.set<Position, Local>({ 2, 2 });
if (use_parents) {
venus.child_of(sun);
}
sprintf(name, "earth_%d", i);
auto earth = ecs.entity(use_names ? name : nullptr)
.add<Position, World>()
.set<Position, Local>({ 3, 3 });
if (use_parents) {
earth.child_of(sun);
}
sprintf(name, "moon_%d", i);
auto moon = ecs.entity(use_names ? name : nullptr)
.add<Position, World>()
.set<Position, Local>({ 0.1, 0.1 });
if (use_parents) {
moon.child_of(earth);
}
}
// Create a hierarchical query to compute the global position from the
// local position and the parent position.
auto q = ecs.query_builder<const Position, const Position, Position>()
// Modify terms from template to make sure the query selects the
// local, world and parent position components.
.instanced()
.term_at(1).second<Local>().self()
.term_at(2).second<World>()
.term_at(3).second<World>().self()
// Extend the 2nd query argument to select it from the parent
.term_at(2)
// Get from the parent, in breadth-first order (cascade)
.parent().cascade()
// Make term component optional so we also match the root (sun)
.optional()
// Finalize the query
.build();
const auto start = std::chrono::system_clock::now();
constexpr int iterationCount = 100;
for (size_t i = 0; i < iterationCount; i++)
{
// Do the transform
q.iter([](flecs::iter& it, const Position* p, const Position* p_parent, Position* p_out)
{
if (p_parent) {
if (it.is_self(2)) {
for (auto i : it) {
p_out[i].x = p[i].x + p_parent[i].x;
p_out[i].y = p[i].y + p_parent[i].y;
}
}
else {
const double px = p_parent->x;
const double py = p_parent->y;
for (auto i : it) {
p_out[i].x = p[i].x + px;
p_out[i].y = p[i].y + py;
}
}
}
else {
for (auto i : it) {
p_out[i].x = p[i].x;
p_out[i].y = p[i].y;
}
}
});
}
const auto duration = (std::chrono::system_clock::now() - start) / iterationCount;
std::cout << 1.0 / std::chrono::duration_cast<std::chrono::duration<float>>(duration).count() << std::endl;
}
I ran the test locally and confirmed the numbers. A few observations:
I did measure a few things that I cannot explain (removing cascade
improves performance, but it shouldn't make a difference). There may be some things I can do to improve.
Another thing you can do to significantly improve performance of systems that match with large numbers of hierarchies/other relationships is to use query groups. Query groups (see the queries/group_by_*
examples) let you group tables together, and select the tables in advance to query.
This would for example let you group tables by world cell, and only iterate the ones that are close to the camera. This is something I do in the Flecs hub renderer module: https://github.com/flecs-hub/flecs-systems-sokol/blob/master/src/modules/geometry/geometry.c#L891
There are a few storage improvements on the roadmap that could improve RAM performance, stay tuned 😊 Those improvements can also help with query performance, especially for queries that aren't interested in hierarchies and just want to iterate a plain list of components.
Thanks a lot for the investigation. The problem is that in my use-case (gaming) it's more common to have deep hierarchies rather than wide ones, mostly due to skeletal nodes. So it's very common to have lots of nodes with only one child. In such cases, even the classis OOP Node* approach yields better results, performance and memory wise.
I'd really like to keep using flecs as I love the API and the features it provides, and now also the support :) So ATM I'm rethinking to structure the data with a custom relationship component so that I can avoid the child_of call and fragment the tables. Is there any way to iterate efficiently or sort to that I can process the parents before the children? Any other suggestions?
To give you an idea of the kind of data we're dealing with, there are usually 15-30K nodes in a scene, and each of these has 2-3-10 child nodes (in a deep hierarchy) with some (characters) having up to 130-150 nodes in a biped skeleton (so also a deep hierarchy). A level has a total of 100-300K entities, sometimes more. The level can be partitioned/grouped in the game, but the tooling has to be able to efficiently query and iterate all the nodes for various operations in realtime (so in under single digit milliseconds). Then there's the memory problem, which easily goes in the hundreds of megabytes of overhead.
We're thinking to replace a classic Node* approach with flecs btw, and apart for this issue, everything else is faster, easier, better.
Thanks again for the effort!
Ahh right that makes sense! For skeletal nodes (especially if you have lots of them) the overhead of relationships is probably not worth it, I think I would use a more dedicated solution for that (like a component with a vector of nodes).
I think I'll be able to bring down the RAM footprint, possibly significantly, but almost certainly not below a simple OOP-style hierarchy. That is because of a few things:
Relationships work well when:
Your use case is kind of the opposite of that 😅 You're probably not interested in querying a specific part of a rig, skeletal hierarchies are static, and like you mentioned they're deep and not very wide.
This is an interesting use case, I've been thinking for a while on how to support static hierarchies like the one you describe, which wouldn't come with the overhead of separate tables/search indices etc. Storage wise that would be very similar to how you'd solve it if you didn't have relationships, which would be a component with child data.
Given what I know, I think that's how I would implement it :) You should still be able to use hierarchies/relationships for the top-level objects, but yeah for the skeletal nodes something else probably works better.
Thanks a lot for all the info, it clarifies everything. My use-case doesn't match, indeed. I'll look for alternative representations and hopefully still use flecs for the main structure. Thanks a lot!
Quick update- this issue got me thinking about a feature that lets you "flatten" a static hierarchy, which would be able to combine entities for different parents in the same table. I just got a prototype of it working, lmk if you'd be interested in trying it out :)
Sure, send it over, I'll try it in my sample.
The feature is available on this branch: https://github.com/SanderMertens/flecs/tree/flatten_hierarchy
I still need to do a bit more testing with it but it seems to work reasonably well so far. What it does is it turns a regular tree into a fixed tree, which combines entities from different parents into the same table. To turn a tree into a fixed tree, call:
ecs_make_fixed(world, ecs_pair(EcsChildOf, parent_of_tree));
Systems should still work as usual. There are a few limitations for now:
Let me know if this helps! I plan on merging it in the next few days.
The feature just merged: https://github.com/SanderMertens/flecs/blob/master/include/flecs.h#L2942
Hello
I am having a similar use case (but using is_a
as opposed to child_of
) to @jeanlemotan
So I've added 50k entities (some have been specified as flecs::Final
; no names just world.entity()
to keep things as lean as possible) to an empty world and then created 75k .is_a()
( flecs::IsA
) relationships between some of those entities.
Just adding the 75k relationships takes ~250ms on a beefy Ryzen 9 @4.2 GHz CPU. Is this to be expected?
Also I have to frequently destroy the world and start with a new one where the destruction of the current one takes more than 100ms. Is there a faster way to destruct a filled-in world (no special cleanup required as long as there are no memory leaks of course)?
@SanderMertens any tips on how to speed things up? I was hoping for something like 2-3ms (not 200-300ms) for adding all those relationships :)
@jeanlemotan have you come up with a more efficient representation for those hiearchies?
@alexconstant9108 Make sure that:
NDEBUG
or FLECS_NDEBUG
)-O3
on gcc/clang)Can you provide an example code snippet that shows what takes 250ms? For reference, I just created this quick example which finishes in ~5ms on an M1 laptop:
ecs_time_t t;
ecs_time_measure(&t);
ecs_entity_t base = ecs_new_id(ecs);
for (int i = 0; i < 75000; i ++) {
ecs_entity_t e = ecs_new_id(ecs);
ecs_add_pair(ecs, e, EcsIsA, base);
}
printf("t = %f\n", ecs_time_measure(&t));
It is possible that if you're creating many (thousands) different relationship pairs that this takes some time because of internal administration that's created. This administration is mainly for the benefit of being able to query relationships (though IsA
has a bunch of other features too). If you don't need specific features provided by relationships, you may be able to get better performance from just using a component with an entity value in it.
Additionally, if this is code that initializes a world, make sure to run it outside of a system. By default ECS operations inside systems are inserted as commands, which has additional overhead.
Shutting down a world in 100ms is not unusual, world creation/deletion is not expected to be fast.
Thanks for the tips!
FLECS_NDEBUG
(i.e. disabling the debug assertions) decreased the time taken by a third or so which was still suboptimal for my use case.-O0
on clang but will try -O3
as well.parent
, that is all is_a
relationships point to the same base entity while in my case the hierarchy looks more like a tree with many branches and say up to 30 levels deep. Could that be the reason for the discrepancy in execution time?I will do some more tests and tweaks and report back here.
@SanderMertens
Okay, so having specified -O3, -lto, FLECS_NDEBUG
Here are the results for 2 loops where loop 1, creates 50k entities and stores them into myMap
; loop 2 iterates over 75k pairs and creates an is_a
relationship. Also I am copying myMap[entityIdIntA] and myMap[entityIdIntB] into temporaries in loop 2 before the below line just to make sure that it's not the hashmap access that's slowing things down and also that the compiler won't optimize away the whole loop
Total measurements where the only difference is the following line being commented out or not in the second loop in micro seconds 9928 microseconds == 9.9ms when commented out. 125605 microseconds == 125ms when NOT commented out.
myMap[entityIdIntA].is_a(myMap[entityIdIntB]);
Just to mention: using flecs v3.2.4
Also, I am not using Systems at all, I only have -DFLECS_CUSTOM_BUILD -DFLECS_CPP
Additional question regarding Shutting down a world in 100ms is not unusual, world creation/deletion is not expected to be fast.
Can I reuse a world somehow as a performance improvement? E.g. clear all previous entities and relationships and insert the new one in the old world? Would that be faster tham deleting the old world and creating a new one from scratch?
@alexconstant9108 can you share the code you're benchmarking with? There is a range of things that could be contributing to the 125ms, if you show me exactly what you're doing I might be able to explain what's happening / suggest improvements. I'm not sure what you're doing in loop 2- you have 50k entities but 75k pairs? Showing the code would help a lot.
E.g. clear all previous entities and relationships and insert the new one in the old world? Would that be faster tham deleting the old world and creating a new one from scratch?
Yup it would be. Usually the quickest way to delete all entities is to use the delete_with
operation, if entities have a common component. If they're all in the same subtree, you can delete all entities by deleting the parent.
@SanderMertens here is a contrived example that should reproduce the issue. You'll have to uncomment the full contents of both arrays (note that that'll take a while to compile with full optimizations) for the full test or store the contents into files and read from those.
https://gist.github.com/alexconstant9108/bfee13d3d226634ed40e1beaf3206a7d
Ooops, there is a copy paste error: change the declaration of arrayB to
std::vector<std::pair<std::uint16_t, std::uint16_t>> arrayB
Ooops, there is a copy paste error: change the declaration of arrayB to
You can edit Gists. (It keeps the edit history. It's like a tiny git repository.)
Ooops, there is a copy paste error: change the declaration of arrayB to
You can edit Gists. (It keeps the edit history. It's like a tiny git repository.)
Generally yes, but this gist in particular is huge because @SanderMertens wanted a fully reproducible example
Looks like you have some issues in the code, when I run it (in debug mode) I get an assert:
fatal: id_record.c: 182: assert: !is_final cannot inherit from final entity (CONSTRAINT_VIOLATED)
Looks like this is because you're using explicit ids, and some of them overlap with Flecs ids. Trying to workaround it..
I thought I had already filtered out the few errors in the IDs where the RHS Id had been declared as final. Can you tell me which IDs violate the constraint?
Looks like this is because you're using explicit ids, and some of them overlap with Flecs ids. Trying to workaround it..
What do you mean? I intentionally do NOT specify my integer IDs as the argument to .entity()
since that was messing with flecs' internal IDs. I use just auto currentEntity = world.entity();
It asserts on the first one. You should be able to reproduce by running the code in debug mode.
It asserts on the first one. You should be able to reproduce by running the code in debug mode.
That's weird. Have you fixed the typo in the arrayB declaration? It should be: std::vector<std::pair<std::uint16_t, std::uint16_t>> arrayB
The problem was that you did
entityMap[pair.first].is_a(pair.second);
instead of
entityMap[pair.first].is_a(entityMap[pair.second]);
EDIT- nvm, fixing the typo fixed it.
Took 5 minutes to compile... and got another assert (same one).
Is there another way you could demonstrate the problem? This code is pretty difficult to iterate on.
Bit easier to work with, and I think this may demonstrate the issue you're seeing:
#include "flecs.h"
#define TARGET_COUNT (75000)
#define COUNT (50000)
int main(int argc, char *argv[]) {
flecs::world world;
world.import<flecs::monitor>();
std::vector<flecs::entity> entities;
std::vector<flecs::entity> targets;
for (int i = 0; i < COUNT; i ++) {
entities.push_back(world.entity());
}
for (int i = 0; i < TARGET_COUNT; i ++) {
targets.push_back(world.entity());
}
ecs_time_t t = {0};
ecs_time_measure(&t);
for (auto e : targets) {
int32_t index = rand() % COUNT;
entities[index].is_a(e);
}
printf("time = %f\n", ecs_time_measure(&t));
}
This code for me takes ~110ms to run, which is expected. The reason is that I'm creating 75k unique pairs, which in the storage also creates about as many tables. There's also a bit of additional logic that happens for IsA
relationships, to support things like event propagation and component overriding.
I don't think this could get much faster than this. If you wanted to do 75k operations in 5ms, you'd have ~60ns per operation. That's usually enough for regular component/add remove operations, but not when the storage also has to create the unique pairs & storage tables.
Perhaps, I should rather use a simple tag (e.g. IsALite
) instead of isA
(because in my case I don't actually need to support things like event propagation and component overriding)?
OTOH I actually need the transitive property of the isA
relationship though: if A IsALite B and B IsALite C, then A IsALite C
, so I should be able to ask flecs if A IsALite
C and get true as result.
Took 5 minutes to compile... and got another assert (same one).
Is there another way you could demonstrate the problem? This code is pretty difficult to iterate on.
Yeah, that's why I made so many typos. It's really hard to edit / compile such big inline defined arrays without the whole IDE crashing. You could instead put the contents of each array into two separate files and load them from there on program start
You can define your own relationship that is transitive.
When using entity names and parenting, memory usage increases a lot and performance when iterating with a query decreases significantly.
To repro, I used the
queries_hierarchy
example, modified to create 100K entities and time the execution of the query (not the construction, just the execution).Results:
No names, no parents (no child_of call). Memory Total: 92MB Memory per entity: 0.94K Queries per second: ~1500
Names, no parents Memory: 280MB Memory per entity: 2.87K Queries per second: ~1500
No names, parents Memory: 693MB Memory per entity: 7.1K Queries per second: ~42
Names & Parents Memory: 2700MB Memory per entity: 27.64K Queries per second: ~42
So using parents adds a significant memory overhead, much more than names. Names add a lot as well. Performance takes a significant hit when using parenting, around 3% of the performance of no parents.
The case with parents and names is particularly bad - with around 28K per entity.
Any idea what's going on? Any way this can be improved?
Here is the code, as modified in the
main.cpp
if thequeries_hierarchy
example. You can simply paste it in that file and run it to test it. There are 2 bool at the beginning of the main function to control names and parents.