Open iiYese opened 1 month ago
This is a follow-up to #1470. The other thing that this opens up is efficiently querying (using dynamic queries only, since enum variants aren't types) for enum variants. If we can store tables in a sorted form, requesting all `Status::Dead` entities becomes genuinely fast, rather than just sugar for a `.filter` operation.
First of all, thanks for the issue!
I'd like to go through the parts of your suggested solution:
- Can be done, see below.
- This is not doable by default. This would require mutable access to tables, i.e. `&mut Tables`, which is only obtainable by engine code with `&mut World` (see the `bevy::ecs::storage` docs). Even with interior mutability, this would still not work.
To mutate tables by default, you would need to have mutable access to tables whenever you can obtain a query. Since systems can have aliasing and disjoint queries, both within themselves and across parallel systems, this is not possible. Sort calls don't appear in function signatures, so the scheduler can't work around them either. To mutate a table/world you need an exclusive system or a `QueryData` impl that has complete, unique access to a table, which leads me to make this replacement suggestion:
A set of exclusive systems that expose the same API as the sorts implemented in #13417, without the `QueryIter` parameter, and commands that can run those systems.
These would no longer use transmute functionality internally, just sort tables directly. Some catches with this are that archetypes can share tables, and we'd need to sort sparse sets as well. This may lead to unexpected behavior on the user side. It'd also be less "table sorts" and more "archetype sorts"
With my suggestion, a user could manually order archetype sorts to happen in front of or after their systems, or spawn commands in their systems to do that.
For each of these archetypes we'd store some kind of "last_sorted_by key". Together with change detection this would allow exiting early. I don't see the need to have queries themselves cache the sorts with this.
Definitely perf improvements to be had here! However this will be heavily data set dependent, so good benchmarks would be needed.
Functionality-wise, `Iterator::nth` does this. It may be useful to have some kind of indexing here though; that could be done with dedicated functions on the `QuerySorted*Iter` structs.
On the example itself (other than the already presented issues above):

- `FilteredEntityRef` and `EntityRef` currently act the same in type position, so better use the shorter one.
- To sort by `EntityRef` you need to include it in the original `QueryData`, lest you alias with disjoint mutable queries.
- The sort that takes a `cmp` function is `sort_by`.
If you have archetypes `(A, B)` & `(A, B, C)` with tables that look like this:
```
// A, B
A(0), B
A(4), B
A(5), B

// A, B, C
A(1), B, C
A(2), B, C
A(3), B, C
```
And you want to sort by `A` you need to first iter `(A, B): 0..=0` then `(A, B, C): 0..=2` then `(A, B): 1..=2`. This is just a simple case with 2 archetypes and 1 system & 1 sort key. Queries are the only place you can cache this information. Caching at a query level also allows you to do things like:
- Sort by `B`
- Stable sort by `A`
If you want different sorts in different systems. You will get dense iteration across `A` & not completely dense but still some dense iteration across `B`.
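The run-based iteration described above can be made concrete with a minimal std-only sketch (the `sort_cache` layout of `(table, row range)` runs is an assumption taken from this discussion, not existing Bevy API): each run is a contiguous slice of one table, so iteration stays mostly dense.

```rust
use std::ops::RangeInclusive;

// Hypothetical illustration: a cached sort stored as runs of
// (table index, row range). Iterating walks contiguous slices
// instead of jumping to individual rows.
fn iter_sorted(
    tables: &[Vec<i32>],
    sort_cache: &[(usize, RangeInclusive<usize>)],
) -> Vec<i32> {
    let mut out = Vec::new();
    for (table, rows) in sort_cache {
        // Each run is a dense slice of one table.
        out.extend_from_slice(&tables[*table][rows.clone()]);
    }
    out
}

fn main() {
    // The two tables from the example, (A, B) and (A, B, C), showing only A.
    let tables = vec![vec![0, 4, 5], vec![1, 2, 3]];
    // Sorting by A yields: (A, B): 0..=0, then (A, B, C): 0..=2, then (A, B): 1..=2.
    let sort_cache = vec![(0, 0..=0), (1, 0..=2), (0, 1..=2)];
    assert_eq!(iter_sorted(&tables, &sort_cache), vec![0, 1, 2, 3, 4, 5]);
}
```

The point of the cache is visible in the run list: three range copies replace six random accesses, and the ratio improves as tables grow.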
4. Benchmarks are irrelevant as the worst case is simply what we already have now which is completely random access.
5. `Iterator::nth` still iterates (<https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#305>); it is not an `O(1)` operation, as this would be.
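The difference can be sketched with std only (the `sort_cache` layout is an assumption from this thread, not Bevy API): with runs of `(table, row range)`, an indexed lookup only skips whole runs, rather than stepping over every element the way `Iterator::nth` does.

```rust
use std::ops::RangeInclusive;

// Hypothetical indexed access over a cached sort: skip whole runs in O(1)
// each, then index directly into the final run's table slice.
fn get_at(
    tables: &[Vec<i32>],
    sort_cache: &[(usize, RangeInclusive<usize>)],
    mut index: usize,
) -> Option<i32> {
    for (table, rows) in sort_cache {
        let len = rows.end() - rows.start() + 1;
        if index < len {
            return Some(tables[*table][rows.start() + index]);
        }
        index -= len; // skip the entire run without touching its elements
    }
    None // index past the end of the sorted view
}

fn main() {
    let tables = vec![vec![0, 4, 5], vec![1, 2, 3]];
    let sort_cache = vec![(0, 0..=0), (1, 0..=2), (0, 1..=2)];
    assert_eq!(get_at(&tables, &sort_cache, 4), Some(4));
    assert_eq!(get_at(&tables, &sort_cache, 6), None);
}
```

The cost is proportional to the number of runs (or `O(log runs)` with a prefix-sum table), not the number of entities.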
> FilteredEntityRef and EntityRef currently act the same in type position, so better use the shorter one.
I don't think you actually understand the difference between these. `EntityRef` gets read access to *all* of an entity's components, meaning you'll block parallelism if you allow this safely. `FilteredEntityRef` just acts like `EntityRef` but only has access to a subset of components; in this case it'd be whatever appears in the `D: QueryData` part of the query.
> To sort by EntityRef you need to include it in the original QueryData, lest you alias with disjoint mutable queries.
That is another reason why it has `FilteredEntityRef`. You can transmute this from any `D: QueryData`.
I've updated the seek example to be more flexible.
It seems I misunderstood you on this. How would this command spawning look? You say that this would use thread-local commands; does this mean we would be using `Commands` implicitly? Or are we having the users pass it?
I still think the explicit sort systems would be useful for users, though separately, since it doesn't seem to cover your needs here. In that case, we'd be mutating `QueryState`, right? Does this fit with current ECS plans?
In terms of execution, I would still not want it by default, because this would be hidden behavior and potentially introduce unexpected costs for the user on first iteration. How about a `Cached` enum as an additional parameter, instead of chained method calls? The user decides explicitly whether the sort is cached, with no implicit cost other than a single, well-predicted branch. The enum would have two variants, one each for cached and uncached behavior.
I'd still like to see two cases: The new version that uses adjacent runs of data vs the current version, on a dataset with no adjacent items after sort, and a dataset that is already sorted except for one item, i.e. the worst and best case. AFAIK, there are no benchmarks on this yet, so I would not like to be guessing how big of a difference it makes.
I wasn't clear on this, that's what I meant!
On `FilteredEntityRef`: if that's how it works, I didn't actually understand it. This ability needs more docs; neither `transmute_lens` nor the struct itself says it can be transmuted from any `QueryData`.
On main there's already `World::commands`. The problem is that this currently takes `&mut World`, whereas queries get `&World`, so the addition of `World::parallel_commands` (which would take `&World`), or just making the command queue thread-local, would allow queries to enqueue commands.
There's nothing incompatible about this with future ECS features. Either on `QueryState` or `Query` is fine. It will not be hidden behavior, because it will be explicitly stated in the docs & there will be a way to opt out of it. With regards to an additional param, I prefer well designed defaults with ways to opt out instead of noisy configuration.
This is not guessing. I can confidently assert the difference this would make. You do not need this to be implemented to observe what is well established hardware behavior. It is the reason bevy and other archetypal ECSs tend to be orders of magnitude faster at iteration than things like sparse set ECSs. If you really want you can play around with flecs which already has this implementation.
Taking a broader view, are there any other things that need caching at the `Query` level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?
I think we disagree here. Of course this will depend on implementation, but as per my current understanding:
My issue with this is that `sort` itself would queue the caching, and `no_reorder` would unqueue that caching within the same method chain. Not wanting to cache sorts will introduce that configuration noise you speak of into an `iter` chain that one would like to `for` loop over.
How about we have some call that takes `Query` mutably and mutates some `queue_sort_cache` setting, causing subsequent sorts to follow that behavior? (Sorts would use an already present cache regardless of this setting.)
If you sort the same query twice in a different way in the same system, the "no-reorder noise" can be placed before the
iteration loop, and if a user still wants a sort to be cached afterwards, they can queue the command manually.
This would also mean we could introduce some global setting that enables or disables this behavior entirely. We would still need a default, but this would be cleaner IMO.
I know that it is faster and can be by a magnitude, and I guess this wouldn't be a blocker, but I would like to be able to show by how much!
3./4./5. Most improvements to `QuerySortedIter` should also be applied to `QueryManyIter` and its future offshoot types. `QuerySortedIter` is basically a special case of the more general `QueryManyIter`.
> Taking a broader view, are there any other things that need caching on the Query level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?
With queries as entities queries can have row polymorphism like everything else that's at an ECS level. This means caching is no longer limited to either:
Query entities can have different components for whatever caches they need. Aside from sort caches they will need other cache structures as we add new features. Two things that come to mind:
I would not worry about queries as entities as they're one of the prerequisite/accompanying features for relations and the sort cache can be refactored when they land.
Another question I have about the final API example: currently, a user calls `iter`, then `sort`; here this is reversed. When `sort` also lands for `iter_many`, how would this work?
The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of a 1000 entities, then they do not wish to pay for a full query sort in advance.
Additionally, since the actual table is only sorted after the first run, the `query.sort` call would have to return a sorted `Vec` of query results to iterate over. `Entity` is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.
> Currently, a user calls iter, then sort, here this is reversed. When sort also lands for iter_many, how would this work?
This will either remove the existing API or exist alongside it.
> The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of a 1000 entities, then they do not wish to pay for a full query sort in advance.
This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.
> Additionally, since the actual table is only sorted after the first run, the query.sort call would have to return a sorted Vec of query results to iterate over. Entity is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.
There should be no single use vectors. `QueryState`/`Query` (wherever the cache ends up) gets 2 new fields:

- `sort_cache: Vec<(TableId, RangeInclusive<usize>)>`
- `sort_cursor: (usize, usize)`

If `.seek` is not used, the `sort_cursor` is reset to `(0, 0)`.
> The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of a 1000 entities, then they do not wish to pay for a full query sort in advance.

> This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.
E.g. if we want to sort the children of an entity: the `Children` or some other component stores a list of entities, which we retrieve by `iter_many`, then sort. I do not find this esoteric at all.
This would only be relevant if this is intended to replace the existing API.
That's not a subset of entities in a query, that's mapping an entity list. Which is at best niche, because:

- `Children` already has sort functions which you can use instead to get persistent ordering that you don't re-sort every run.
- `Children` & `bevy_hierarchy` will be replaced with relations, which will not have entity lists. The only way to get sorting, period, will then be at a query level.

Essentially, the first run of the system will not have dense iteration; it will enqueue the command to reorder tables & the second run will have dense iteration.
When should the command be executed? If it's executed right away and reorders the Table
, then it's very likely that the components will be mutated before the second sort is needed - requiring another table reordering just in time for the sorted query.
The logical conclusion is that the reordering of the table must happen right before the system that requires the sort runs (or after the last mutable access, but it's impossible to know when that would be).
After thinking about it, I think there are 3 options that allow for dense, sorted iteration:
1) Designate certain archetypes as "hot", meaning their tables would have to sort themselves after each mutation (only one sort per archetype)
2) Let the user call a command to sort archetypes manually (A bit of a footgun)
3) Reorder the table right at the `.sort()` call, and then iterate densely (this needs a special type of `Query`, because no one must access the table at the time of the sort)
> When should the command be executed?
In the next sync point. So basically the first run would be without dense iteration, the second run will need to rebuild the sort cache again and every run after that will be dense. (Until you have a mutation which will start this whole process again)
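The lifecycle described in this answer can be sketched as a toy state machine (my paraphrase of the comment, not Bevy code): run 1 iterates non-densely and enqueues the reorder for the next sync point, run 2 rebuilds the sort cache over the reordered tables, and later runs stay dense until a mutation restarts the process.

```rust
// Hypothetical per-archetype sort state across system runs.
#[derive(Debug, PartialEq, Clone, Copy)]
enum SortState {
    Unsorted,      // no cache; first run iterates non-densely
    ReorderQueued, // reorder command enqueued for the next sync point
    Dense,         // cache rebuilt; iteration is dense from here on
}

fn next_run(state: SortState, mutated: bool) -> SortState {
    if mutated {
        // A mutation invalidates everything and restarts the process.
        return SortState::Unsorted;
    }
    match state {
        SortState::Unsorted => SortState::ReorderQueued,
        SortState::ReorderQueued | SortState::Dense => SortState::Dense,
    }
}

fn main() {
    let mut s = SortState::Unsorted;
    s = next_run(s, false); // run 1: enqueue reorder
    assert_eq!(s, SortState::ReorderQueued);
    s = next_run(s, false); // run 2: rebuild cache, dense afterwards
    assert_eq!(s, SortState::Dense);
    s = next_run(s, true); // a mutation starts the whole process again
    assert_eq!(s, SortState::Unsorted);
}
```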
What problem does this solve or what need does it fill?
#13417 adds sorting at an iterator level. These sorts:
What solution would you like?
- `low..=high` ranges for cache-friendly iteration.

Final API should look something like this:
What alternative(s) have you considered?
Additional context
Missing prerequisite features:
- `World::parallel_commands` / make the world command queue thread local by default (needed to allow queries to do 2.)
- … (3.)

Not required but helpful missing feature: