iiYese opened 6 months ago
This is a follow-up to #1470. The other thing that this opens up is efficiently querying for enum variants (using dynamic queries only, since enum variants aren't types). If we can store tables in a sorted form, requesting all `Status::Dead` entities becomes genuinely fast, rather than just sugar for a `.filter` operation.
First of all, thanks for the issue!
I'd like to go through the parts of your suggested solution:
1. Can be done, see below.

2. This is not doable by default. It would require mutable access to `Tables`, i.e. `&mut Tables`, which is only obtainable by engine code with `&mut World` (see the `bevy::ecs::storage` docs); even with interior mutability, this would still not work. To mutate tables by default, you would need mutable access to tables whenever you can obtain a query. Since systems can have aliasing and disjoint queries, both within themselves and across parallel systems, this is not possible. Sort calls don't appear in function signatures, so the scheduler can't work around them either.
To mutate a table/world you need an exclusive system or a `QueryData` impl that has complete, unique access to a table, which leads me to make this replacement suggestion: a set of exclusive systems that expose the same API as the sorts implemented in #13417, without the `QueryIter` parameter, plus commands that can run those systems (sketched after this list).
These would no longer use transmute functionality internally, just sort tables directly. Some catches with this are that archetypes can share tables, and we'd need to sort sparse sets as well. This may lead to unexpected behavior on the user side. It'd also be less "table sorts" and more "archetype sorts"
With my suggestion, a user could manually order archetype sorts to happen before or after their systems, or spawn commands in their systems to do that.
3. For each of these archetypes we'd store some kind of "last_sorted_by" key. Together with change detection this would allow exiting early. I don't see the need to have queries themselves cache the sorts with this.
4. Definitely perf improvements to be had here! However, this will be heavily data-set dependent, so good benchmarks would be needed.
5. Functionality-wise, `Iterator::nth` does this. It may be useful to have some kind of indexing here though; that could be done with dedicated functions on the `QuerySorted*Iter` structs.
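To illustrate the replacement suggestion from point 2, here is a minimal sketch of what such an exclusive system could look like; `sort_archetypes_by_height` and the `last_sorted_by` bookkeeping are hypothetical, not existing Bevy APIs:

```rust
use bevy::prelude::*;
use std::cmp::Ordering;

#[derive(Component)]
struct Height(f32);

// Exclusive systems receive `&mut World`, which is the one place table
// reordering is sound: no other query can alias the tables while this runs.
fn sort_by_height(world: &mut World) {
    sort_archetypes_by_height(world, |a, b| {
        a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal)
    });
}

// Stub for the hypothetical engine-side entry point. A real implementation
// would walk the matched archetypes, skip those whose "last_sorted_by" key
// already matches this sort (combined with change detection), otherwise
// reorder the backing table rows and sparse sets, and record the new key.
fn sort_archetypes_by_height(
    _world: &mut World,
    _cmp: impl FnMut(&Height, &Height) -> Ordering,
) {
    unimplemented!("engine-side archetype sort, as discussed above")
}
```

Such a system could then be ordered before the systems that rely on the sorted layout, or triggered through a command, as suggested above.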
On the example itself (other than the already presented issues above):
- `FilteredEntityRef` and `EntityRef` currently act the same in type position, so better use the shorter one.
- To sort by `EntityRef` you need to include it in the original `QueryData`, lest you alias with disjoint mutable queries.
- The sort that takes a `cmp` function is `sort_by`.
3. If you have `(A, B)` & `(A, B, C)` with tables that look like this:

```
// A, B
A(0), B
A(4), B
A(5), B

// A, B, C
A(1), B, C
A(2), B, C
A(3), B, C
```
And you want to sort by `A`, you need to first iter `(A, B): 0..=0`, then `(A, B, C): 0..=2`, then `(A, B): 1..=2`. This is just a simple case with 2 archetypes and 1 system & 1 sort key. Queries are the only place you can cache this information (a concrete cache layout is sketched after this list). Caching at a query level also allows you to do things like:
- Sort by `B`
- Stable sort by `A`
If you want different sorts in different systems. You will get dense iteration across `A` & not completely dense but still some dense iteration across `B`.
4. Benchmarks are irrelevant, as the worst case is simply what we already have now, which is completely random access.
5. `Iterator::nth` still iterates (<https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#305>); it is not an `O(1)` operation as this would be.
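To make point 3 concrete, the cached plan for the example above could be stored roughly like this (the table names are illustrative):

```rust
use std::ops::RangeInclusive;

fn main() {
    // Iteration plan for "sort by A" across the two tables above. Walking
    // the spans in order yields A(0)..=A(5); each span is a dense run of rows.
    let sort_cache: Vec<(&str, RangeInclusive<usize>)> = vec![
        ("table_a_b", 0..=0),   // A(0)
        ("table_a_b_c", 0..=2), // A(1), A(2), A(3)
        ("table_a_b", 1..=2),   // A(4), A(5)
    ];
    for (table, rows) in &sort_cache {
        println!("{table}: {rows:?}");
    }
}
```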
> FilteredEntityRef and EntityRef currently act the same in type position, so better use the shorter one.
I don't think you actually understand the difference between these. `EntityRef` gets read access to *all* of an entity's components, meaning you'll block parallelism if you allow this safely. `FilteredEntityRef` just acts like `EntityRef` but only has access to a subset of components; in this case it'd be whatever appears in the `D: QueryData` part of the query.
> To sort by EntityRef you need to include it in the original QueryData, lest you alias with disjoint mutable queries.
That is another reason why it has `FilteredEntityRef`. You can transmute this from any `D: QueryData`.
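A sketch of that transmute, assuming (as stated above) that `transmute_lens` accepts `FilteredEntityRef`; the `Health` component is illustrative:

```rust
use bevy::ecs::world::FilteredEntityRef;
use bevy::prelude::*;

#[derive(Component)]
struct Health(f32);

fn read_via_filtered_refs(mut query: Query<(Entity, &mut Health)>) {
    // The lens reinterprets the query: each FilteredEntityRef only grants
    // access to components named in the original QueryData, so it cannot
    // alias with disjoint mutable queries the way a full EntityRef would.
    let mut lens = query.transmute_lens::<FilteredEntityRef>();
    for entity_ref in lens.query().iter() {
        let _health: Option<&Health> = entity_ref.get::<Health>();
    }
}
```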
I've updated the seek example to be more flexible.
It seems I misunderstood you on this. How would this command spawning look? You say that this would use thread-local commands; does this mean we would be using `Commands` implicitly? Or are we having the users pass it?
I still think the explicit sort systems would be useful for users, though separately, since they don't seem to cover your needs here.
In that case, we'd be mutating `QueryState`, right? Does this fit with current ECS plans?
In terms of execution, I would still not want it by default, because this would be hidden behavior and could introduce unexpected costs for the user on first iteration. How about a `Cached` enum as an additional parameter, instead of chained method calls? The user decides explicitly whether the sort is cached, with no implicit cost other than a single, well-predicted branch. The enum would have two variants, one each for cached and uncached behavior.
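In sketch form, with all names hypothetical:

```rust
/// Hypothetical explicit opt-in: the caller states up front whether the
/// sort should be cached, instead of chaining an opt-out method afterwards.
enum Cached {
    /// Queue the table reorder & keep the range cache for later runs.
    Yes,
    /// Sort this iteration only; leave tables and caches untouched.
    No,
}

// A call site would then read something like:
// query.sort_by::<&Height>(compare, Cached::Yes)
```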
I'd still like to see two cases: the new version that uses adjacent runs of data vs. the current version, on a dataset with no adjacent items after sorting, and on a dataset that is already sorted except for one item, i.e. the worst and the best case. AFAIK, there are no benchmarks on this yet, so I would not like to guess at how big of a difference it makes.
I wasn't clear on this, that's what I meant!
On `FilteredEntityRef`: if that's how it works, then I didn't actually understand it. This ability needs more docs; neither `transmute_lens` nor the struct itself says it can be transmuted from any `QueryData`.
On main there's already `World::commands`; the problem is that this currently takes `&mut World`, whereas queries get `&World`. So the addition of `World::parallel_commands`, which would take `&World`, or just making the command queue thread local, would allow queries to enqueue commands.
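Sketched below; `parallel_commands` is the proposed addition, not an existing method (signature illustrative):

```rust
use bevy::prelude::*;

/// Proposed (hypothetical) counterpart to `World::commands` that works
/// through `&World`, e.g. by pushing into a thread-local command queue that
/// is applied at the next sync point.
trait ParallelCommandsExt {
    fn parallel_commands(&self) -> Commands<'_, '_>;
}

// Read-only query code could then enqueue the table-reorder command for the
// sort cache without ever taking `&mut World`:
//
//     world.parallel_commands().queue(reorder_tables_for_sort);
```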
There's nothing incompatible about this with future ECS features. Either on `QueryState` or `Query` is fine. It will not be hidden behavior, because it will be explicitly stated in the docs & there will be a way to opt out of it. With regards to an additional param: I prefer well designed defaults with ways to opt out, instead of noisy configuration.
This is not guessing. I can confidently assert the difference this would make. You do not need this to be implemented to observe what is well-established hardware behavior. It is the reason bevy and other archetypal ECSs tend to be orders of magnitude faster at iteration than things like sparse-set ECSs. If you really want, you can play around with flecs, which already has this implemented.
Taking a broader view, are there any other things that need caching at the `Query` level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?
I think we disagree here. Of course this will depend on implementation, but as per my current understanding:
My issue with this is that `sort` itself would queue the caching, and `no_reorder` would unqueue that caching within the same method chain.
Not wanting to cache sorts will introduce that configuration noise you speak of into an `iter` chain that one would like to `for`-loop over.
How about we have some call that takes `Query` mutably and mutates some `queue_sort_cache` setting, causing subsequent sorts to follow that behavior? (Sorts would use an already-present cache regardless of this setting.)
If you sort the same query twice in different ways in the same system, the "no-reorder noise" can be placed before the iteration loop, and if a user still wants a sort to be cached afterwards, they can queue the command manually. This would also mean we could introduce some global setting that enables or disables this behavior entirely. We would still need a default, but this would be cleaner IMO.
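A sketch of that knob (all names hypothetical):

```rust
/// Hypothetical per-query setting, toggled once before the loop instead of
/// chaining `no_reorder` inside an `iter` chain you want to `for`-loop over.
struct SortSettings {
    /// When false, subsequent sorts neither queue table reorders nor write
    /// the range cache. An already-present cache is still used either way.
    queue_sort_cache: bool,
}

// query.set_queue_sort_cache(false); // uncached from here on
// for item in query.iter().sort::<&Height>() { /* ... */ }
```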
I know that it is faster and can be by a magnitude, and I guess this wouldn't be a blocker, but I would like to be able to show by how much!
3./4./5. Most improvements to `QuerySortedIter` should also be applied to `QueryManyIter` and its future offshoot types. `QuerySortedIter` is basically a special case of the more general `QueryManyIter`.
> Taking a broader view, are there any other things that need caching on the Query level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?
With queries as entities, queries can have row polymorphism like everything else at the ECS level. This means caching is no longer limited to either `QueryState` or `Query`.
Query entities can have different components for whatever caches they need. Aside from sort caches, they will need other cache structures as we add new features. Two things that come to mind:
I would not worry about queries as entities as they're one of the prerequisite/accompanying features for relations and the sort cache can be refactored when they land.
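For concreteness, under queries-as-entities a sort cache could simply be a component on the query entity (everything below is hypothetical):

```rust
use bevy::prelude::*;
use std::ops::RangeInclusive;

/// Hypothetical cache component on a query entity. Other caches added for
/// future features would just be further components on the same entity.
#[derive(Component)]
struct SortCache {
    /// Dense spans to walk, in sorted order: (table index, row range).
    spans: Vec<(usize, RangeInclusive<usize>)>,
    /// Key identifying which sort produced `spans`, for early-out checks.
    sorted_by: u64,
}
```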
Another question I have about the final API example: currently, a user calls `iter`, then `sort`; here this is reversed. When `sort` also lands for `iter_many`, how would this work?
The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance.
Additionally, since the actual table is only sorted after the first run, the `query.sort` call would have to return a sorted `Vec` of query results to iterate over. `Entity` is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.
> Currently, a user calls iter, then sort; here this is reversed. When sort also lands for iter_many, how would this work?
This will either remove the existing API or exist alongside it.
> The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance.
This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.
> Additionally, since the actual table is only sorted after the first run, the query.sort call would have to return a sorted Vec of query results to iterate over. Entity is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.
There should be no single-use vectors. `QueryState`/`Query` (wherever the cache ends up) gets 2 new fields:

```rust
sort_cache: Vec<(TableId, RangeInclusive<usize>)>,
sort_cursor: (usize, usize),
```

If `.seek` is not used, the `sort_cursor` is reset to `(0, 0)`.
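A sketch of how `.seek` could use those two fields: the jump costs O(number of cached spans) rather than O(number of items), unlike `Iterator::nth`. The table id is simplified to a `usize` here:

```rust
use std::ops::RangeInclusive;

/// Position the cursor at the `n`-th item of the cached sort without
/// touching the items in between. Returns (span index, row in that table).
fn seek(sort_cache: &[(usize, RangeInclusive<usize>)], mut n: usize) -> (usize, usize) {
    for (span, (_table, rows)) in sort_cache.iter().enumerate() {
        let len = rows.end() - rows.start() + 1;
        if n < len {
            return (span, rows.start() + n);
        }
        n -= len;
    }
    (sort_cache.len(), 0) // past the end: iteration yields nothing
}
```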
> > The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance.
>
> This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.
F.e. if we want to sort the children of an entity: the `Children` or some other component stores a list of entities, which we retrieve with `iter_many`, then sort. I do not find this esoteric at all.
This would only be relevant if this is intended to replace the existing API.
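For illustration, the use case reads roughly like this today; the commented line is the hypothetical `iter_many` sort that has not landed:

```rust
use bevy::prelude::*;

#[derive(Component)]
struct Height(f32);

fn tallest_child_first(parents: Query<&Children>, heights: Query<(Entity, &Height)>) {
    for children in &parents {
        // Today the subset has to be collected and sorted by hand:
        let mut subset: Vec<(Entity, f32)> = heights
            .iter_many(children.iter())
            .map(|(entity, height)| (entity, height.0))
            .collect();
        subset.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

        // With a #13417-style sort on `iter_many` (hypothetical):
        // heights.iter_many(children.iter()).sort_by::<&Height>(cmp)
    }
}
```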
That's not a subset of entities in a query, that's mapping an entity list. Which is at best niche because:

- `Children` already has sort functions which you can use instead to get a persistent ordering that you don't re-sort every run.
- `Children` & `bevy_hierarchy` will be replaced with relations, which will not have entity lists. The only way to get sorting at all then will be at a query level.

Essentially, the first run of the system will not have dense iteration; it will enqueue the command to reorder tables, & the second run will have dense iteration.
When should the command be executed? If it's executed right away and reorders the `Table`, then it's very likely that the components will be mutated before the second sort is needed, requiring another table reordering just in time for the sorted query. The logical conclusion is that the reordering of the table must happen right before the system that requires the sort runs (or after the last mutable access, but it's impossible to know when that would be).
After thinking about it, I think there are 3 options that allow for dense, sorted iteration:

1) Designate certain archetypes as "hot", meaning their tables would have to sort themselves after each mutation (only one sort per archetype).
2) Let the user call a command to sort archetypes manually (a bit of a footgun).
3) Reorder the table right at the `.sort()` call, and then iterate densely (this needs a special type of `Query`, because no one must access the table at the time of the sort).
> When should the command be executed?
At the next sync point. So basically the first run would be without dense iteration, the second run will rebuild the sort cache, and every run after that will be dense (until you have a mutation, which starts this whole process again).
Hi!

If the indexes described in https://github.com/bevyengine/bevy/discussions/1205 are implemented at some point, they could also resolve the sorted queries feature. To achieve that, indexes should be implemented as BTrees, like SQL database indexes. They could serve 2 different goals:
@mcatanzariti I fail to see how indices are anything but orthogonal here. Specifically, indices do not allow for dense, cache-friendly iteration; an indirection has to be resolved every time. The ergonomics also look far worse from that discussion.
@BenjaminBrienen Please give motivation when adding labels like Needs Design & Contentious to issues. This issue doesn't just describe a complete design of the feature for bevy; the feature is already implemented in another ECS, so I'm removing that label. I'm not seeing how this is contentious either.
Ok. I gave it that status because it didn't have a status and it seemed like there was a lot of uncertainty in what the work should look like. I gave it the other label because the long discussion seemed like it fit the description of "has nontrivial implications". Please assign it a better status if you like.
It's not really `S-Blocked`, because it can be implemented without archetype-level change detection & such, but to see its full usefulness the other items mentioned also need to be implemented; I wouldn't proceed with implementing this without those either. So no label suits. The discussion is asking about how things would be implemented/behave, because this is just a complicated feature. There haven't been any major disagreements yet.
> @BenjaminBrienen Please give motivation when adding labels like Needs Design & Contentious to issues. This issue doesn't just describe a complete design of the feature for bevy; the feature is already implemented in another ECS, so I'm removing that label. I'm not seeing how this is contentious either.
I actually disagree. While this feature has been implemented in another ECS, I think Bevy can and should make its own considerations and decisions when implementing such a feature; so that it makes sense in both Bevy and Rust contexts. I think "X-Contentious" is applicable because there is potential for confusing/unintuitive behavior for users, and it is not immediately clear how to remedy this. In my opinion, "S-Needs-Design" means that it is not yet clear what design best captures the considerations we want to make. While the currently proposed design is one that could work, I do not think it has been decided that it is the best that there could be.
> I think Bevy can and should make its own considerations and decisions when implementing such a feature; so that it makes sense in both Bevy and Rust contexts.
If there are things you wish to add, identify and describe them, like I already have with the `seek` portion of the API, which is not present in flecs. That is entirely new. There is nothing about the languages that would make the implementation meaningfully different, nor the ECSs: both bevy & flecs are archetypal. Bevy might store the caches differently right now because of how queries are implemented, but that is not a meaningful difference. The things we're sorting & caching are the same.
I think "X-Contentious" is applicable because there is potential for confusing/unintuitive behavior for users, and it is not immediately clear how to remedy this.
Please identify and describe the issue.
> In my opinion, "S-Needs-Design" means that it is not yet clear what design best captures the considerations we want to make. While the currently proposed design is one that could work, I do not think it has been decided that it is the best that there could be.
Please describe a design you think is better.
First, I would say that I think these labels invite discussion, and removing them I find akin to saying "we no longer need to think about these aspects". I'd prefer to foster discussion rather than shut it down!
I'll write up my technical points sometime soon.
There's nothing stopping you from discussing an already complete design. The things I don't invite are:

These things create endless discussion for the sake of discussion & waste people's energy with no productive influence on designs or implementations.
I already had some technical thoughts in mind when I initially stated my disagreement, but I did not identify them, exactly because I did not want to waste your and any readers' time and wanted to phrase it right before re-engaging! I see that I could have waited to re-add the labels until I had made those points.
For your second point, I do not quite understand what you are referring to with "imagined hypotheticals".
> I do not quite understand what you are referring to with "imagined hypotheticals"
An example of this would be "What if they want a custom sorting algorithm instead of the one we provide?". This is a purely abstract feature, with a person trying to accomplish something absent. These usually fail to examine whether people would ever want to do this in the first place, or whether it's something that would be useful to do at all. When you do examine them, they quickly fall apart. For example, different sorting algorithms usually pair well with different data structures; since the data structure of the ECS is fixed, I would find this suggestion dubious.

This kind of strategy tries to accommodate use cases by allowing for every use case imaginable & hoping the use cases that actually exist are covered.
> > I do not quite understand what you are referring to with "imagined hypotheticals"
>
> An example of this would be "What if they want a custom sorting algorithm instead of the one we provide?". This is a purely abstract feature, with a person trying to accomplish something absent. These usually fail to examine whether people would ever want to do this in the first place, or whether it's something that would be useful to do at all. When you do examine them, they quickly fall apart. For example, different sorting algorithms usually pair well with different data structures; since the data structure of the ECS is fixed, I would find this suggestion dubious.
>
> This kind of strategy tries to accommodate use cases by allowing for every use case imaginable & hoping the use cases that actually exist are covered.
I don't feel this example illustrates your point well, because different sorting algorithms (as in glidesort, pdqsort, radixsort, etc.) fit different types of data and patterns in the order of that data, more than the data structure the data comes from. As long as the user decides what data to store, and a slice can be formed from that data, passing a custom sort algorithm as a function on top of a `cmp` function could make sense.
That is separate from these questions: Can this be supported? Should this be supported? If so, should this be done alongside what we are currently trying to accomplish?
The above is debatable, and the value/usefulness of some ideas/the very discussion of them can be rather subjective. However, I think a person hearing that their suggestion "is a purely abstract feature [...] trying to accomplish something absent" would likely feel frustrated by that.
That being said, we are getting off-topic. While we may disagree, I don't think anyone is trying to create a bad end result!
> I don't feel this example illustrates your point well
I feel you've demonstrated my point even better. You are humoring an idea without users in mind solely for the sake of even more API coverage for what is an even more niche scenario.
> I don't think anyone is trying to create a bad end result!
That's not the issue here. These kinds of discussions try to create perfect results first time in one cycle which is not a sustainable way to develop.
> > I don't feel this example illustrates your point well
>
> I feel you've demonstrated my point even better. You are humoring an idea without users in mind solely for the sake of even more API coverage for what is an even more niche scenario.
We are talking past each other. By saying "it could make sense", I do not mean "we should do this!" I am saying that I respect that it could be useful, however to actually include it depends on whether it fits with what Bevy is trying to do. I do not know what someone has in mind when they make a suggestion, but they might be accounting for things I am not currently aware of.
In this example, my reasoning is: in Rust we sort slices with the std API; this is what the vast majority of people do. There are crates that offer different sorting algorithms for slices instead, and these crates see use. Can somebody express cached query sorting with a custom algorithm in user land? Not easily, if at all. Thus, there are some people that would be happy to be able to use an API for the above. Do we have to make those users happy? As of right now, I would say probably not, but I am not the one who decides.
> > I don't think anyone is trying to create a bad end result!
>
> That's not the issue here. These kinds of discussions try to create perfect results first time in one cycle, which is not a sustainable way to develop.
I'm not saying we have to create perfect results the first time; I am trying to say that deciding not to do/account for something is also part of design.
I do not wish to continue this tangent. It seems we have different philosophies and are not likely to agree. Let us return to technical conversation.
On to questions about what this design intends to do:
I. Is it intended to only consist of one `sort` function, which only takes `Fn(FilteredEntityRef, FilteredEntityRef) -> Ordering`?
II. Since you have mentioned "sorting algorithm we provide", would sorting a table involve a different form of sorting than sorting a slice, like how one usually would? (I am aware that a table is more than slice, I am asking whether the "sort" at the core of the "table sort" operation would be on a slice)
III. Does this approach have a way of mitigating conflicts between sorts? Especially since multiple archetypes might be affected by sorting one table. You've stated that the first run of the system would not have dense iteration, and queue a command so we can iterate densely after. If two systems sort queries with intersecting (or the same) archetypes, they might fall into a cycle of not achieving dense iteration, since both commands will each undo the sort from the last, wasting work.
IV. Similarly, does it account for systems wishing to perform a sort that has already been done? If an archetype has not been modified since the last time a system queued a command, it can early-out of re-sorting thanks to change detection. However, what about when a table has been modified, but in a way that preserves order (i.e. entity removal)? Further, can it take advantage of multiple systems wanting an equivalent sort?
V. How would this design think of query sorts in terms of "fallibility"? When given an "incorrect" `cmp` function/`Ord` implementation, the behavior of std library sorts is unspecified (a logic error, not undefined behavior): they may panic, silently return incorrect results, or not finish in a realistic amount of time.
VI. You have stated in #16203:

> The public user facing APIs for the cached variants will only allow fn pointers because closures can be stateful & alter how they sort. The cache would have no way of detecting if this state is changed, hence they're not allowed.
A closure being stateful does not guarantee non-determinism, and a function/non-capturing closure does not guarantee determinism. There is no way to guarantee determinism with function types/traits unless you know their implementation beforehand; how does the cache here deal with this? (This ties in with question V. above.)
VII. Are non-archetypal filters compatible with this? Can they interfere with caching? (By evaluating differently without any mutation happening?)
VIII. What happens when a query created mid-system (`transmute_lens`, `join`, etc.) is sorted?
On the seek portion:

IX. Is this separate from, potentially in conflict with, or trying to model what the `Seek` trait in std does? (This trait is used to provide a movable cursor within a stream of bytes.)
I. Yes.

II. Doesn't matter & is irrelevant. Any further optimization or customization should be done in a follow-up.

III. That's the whole point of opting out of table sorting & using stable sorts.

IV. This is well beyond design and into implementation details now. Change detection cannot report this; archetype-level change detection is needed. Either way, the `low..high` vec needs to be updated.

V. That is user error.

VI. That is the best we can do. Do not over-engineer. Anything else is user error.

VII. No.

VIII. Transmutes are not new queries & neither are joins. Joins that iterate over already sorted queries iterate a subset of the matches, meaning they're still sorted.

IX. There is no conflict. If anything it'll be an analogy for users; they're both doing a similar operation. "Seek"ing isn't unique to Rust std.

Please use numbers, we're not Romans.
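On VI: the hazard with capturing closures can be made concrete. A cache keyed on the comparator alone cannot see captured state change between calls, which fn pointers rule out by construction (they cannot capture):

```rust
use std::cmp::Ordering;

fn main() {
    // A stateful comparator that flips its ordering on every call.
    let mut calls = 0u32;
    let mut cmp = |a: &u32, b: &u32| -> Ordering {
        calls += 1;
        if calls % 2 == 0 { a.cmp(b) } else { b.cmp(a) }
    };
    // Same inputs, different answers; a sort cache has no way to detect this.
    assert_ne!(cmp(&1, &2), cmp(&1, &2));
}
```

As VI notes, a fn pointer can still consult globals or atomics, so this restriction narrows the hazard rather than eliminating it.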
What problem does this solve or what need does it fill?
#13417 adds sorting at an iterator level. These sorts:
What solution would you like?
Cache `low..=high` ranges for cache friendly iteration.

Final API should look something like this:
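A sketch assembled from the pieces discussed in this thread (`sort` before `iter`, a fn-pointer comparator over `FilteredEntityRef`s, `no_reorder` to opt out, `seek` to jump); component and helper names are illustrative:

```rust
use bevy::ecs::world::FilteredEntityRef;
use bevy::prelude::*;
use std::cmp::Ordering;

#[derive(Component)]
struct Height(f32);

// Cached variants take fn pointers rather than capturing closures (see the
// statefulness discussion above). The comparator gets FilteredEntityRefs
// limited to the query's own access.
fn by_height(a: FilteredEntityRef, b: FilteredEntityRef) -> Ordering {
    let (a, b) = (a.get::<Height>().unwrap(), b.get::<Height>().unwrap());
    a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal)
}

// NOTE: `sort`, `no_reorder`, and `seek` below are the proposed API from
// this thread, not current Bevy.
fn shortest_first(query: Query<(Entity, &Height)>) {
    // Run 1: enqueue the table-reorder command (iteration not yet dense).
    // Run 2: rebuild the (table, low..=high) span cache.
    // Run 3+: dense iteration until a mutation invalidates the cache.
    for (_entity, _height) in query.sort(by_height).iter() {
        // ...
    }

    // One-off ordering, leaving tables & cache untouched:
    // query.sort(by_height).no_reorder().iter()

    // Jump to the 40th sorted item without iterating up to it:
    // query.sort(by_height).iter().seek(40)
}
```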
What alternative(s) have you considered?
Additional context
Missing prerequisite features:
- `World::parallel_commands`/make the world command queue thread local by default (needed to allow queries to do 2.).

Not required but helpful missing feature:

- Archetype level change detection