bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

High performance sorted queries #13464

Open iiYese opened 1 month ago

iiYese commented 1 month ago

What problem does this solve or what need does it fill?

#13417 adds sorting at an iterator level. These sorts:

What solution would you like?

  1. Command to reorder tables.
  2. Rearrange tables for sorted queries by default.
  3. Make queries cache sorts & use change detection to early out of resorting.
  4. Condense sorts to low..=high ranges for cache friendly iteration.
  5. Optionally cache iteration position to allow suspending and resuming iteration between system runs. This is useful for UI scroll areas and rhythm games.
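Point 2 boils down to reordering a table's rows: compute one permutation from the sort key and apply it to every column so rows stay aligned. A plain-Rust sketch with hypothetical names (illustrative SoA reordering; Bevy's actual `Table` storage differs):

```rust
/// Sort two parallel "columns" of a table by the key column, keeping
/// rows aligned. (Illustrative only, not Bevy's Table API.)
fn sort_table_by_key(key: &mut Vec<i32>, other: &mut Vec<&'static str>) {
    // Permutation that sorts the key column.
    let mut perm: Vec<usize> = (0..key.len()).collect();
    perm.sort_by_key(|&i| key[i]);
    // Apply the same permutation to every column.
    *key = perm.iter().map(|&i| key[i]).collect();
    *other = perm.iter().map(|&i| other[i]).collect();
}
```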

Final API should look something like this:

```rust
fn cmp(left: FilteredEntityRef, right: FilteredEntityRef) -> std::cmp::Ordering {
    left.get::<A>().unwrap().cmp(right.get::<A>().unwrap())
}

fn sys(mut query: Query<(&A, &B)>) {
    for (a, b) in query.sort(cmp)
        // Optional call that will prevent tables from being reordered.
        // Useful if you have multiple systems requesting slightly different sorts or
        // for completely opting out of reordering & doing the ordering manually.
        // The user could do multiple stable sorts to get dense iteration across multiple systems.
        .no_reorder()
        .iter()
    {
        // ..
    }
}
```

```rust
#[derive(Component)]
struct Offset(pub f64);

fn sort_by_offset(/* .. */) {
    // ..
}

fn timeline(mut query: Query<(&Offset, &A, &B)>, time: Res<Time>) {
    for (offset, a, b) in query.sort(sort_by_offset)
        // Resume iteration from a previous run & update it for a future run.
        // Assuming the following:
        // - some cached pos: usize
        // - a needle: T
        // - a const window size N: usize
        //
        // seeking will:
        // - Use linear search to look for the needle among the nearest neighbors pos-N..=pos+N.
        // - Fall back to binary search if the range doesn't contain the needle (we've jumped).
        .seek(time.elapsed_seconds_f64(), |entity, needle| { /* .. */ })
        .iter()
        .take(5)
    {
        // ..
    }
}
```
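The seek strategy described in the comments can be sketched in plain Rust. This is a hypothetical helper over an ascending, non-empty key column, not the proposed Bevy API:

```rust
/// Find the index whose key "contains" the needle: first a linear scan
/// of the 2N+1 nearest neighbors around the cached position, then a
/// full binary search if the needle left that window (we've jumped).
/// Assumes `sorted` is non-empty and ascending.
fn seek<const N: usize>(sorted: &[f64], cached_pos: usize, needle: f64) -> usize {
    let lo = cached_pos.saturating_sub(N);
    let hi = (cached_pos + N).min(sorted.len() - 1);
    // Linear search among nearest neighbors: cheap when the cursor
    // only advances a little between runs.
    for i in lo..=hi {
        let next = sorted.get(i + 1).copied().unwrap_or(f64::INFINITY);
        if sorted[i] <= needle && needle < next {
            return i;
        }
    }
    // Fall back to binary search over the whole column.
    sorted.partition_point(|&x| x <= needle).saturating_sub(1)
}
```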

What alternative(s) have you considered?

Additional context

Missing prerequisite features:

Not required but helpful missing feature:

alice-i-cecile commented 1 month ago

This is a follow-up to #1470. The other thing that this opens up is efficiently querying (using dynamic queries only, since enum variants aren't types) for enum variants. If we can store tables in a sorted form, requesting all Status::Dead entities becomes genuinely fast, rather than just sugar for a .filter operation.

Victoronz commented 1 month ago

First of all, thanks for the issue!

I'd like to go through the parts of your suggested solution:

  1. Can be done, see below

  2. This is not doable by default. It would require mutable access to tables, i.e. `&mut Tables`, which is only obtainable by engine code with `&mut World` (see the `bevy::ecs::storage` docs); even with interior mutability, this would still not work. To mutate tables by default, you would need mutable access to tables whenever you can obtain a query. Since systems can have aliasing and disjoint queries, both within themselves and across parallel systems, this is not possible. Sort calls don't appear in function signatures, so the scheduler can't work around them either.

    To mutate a table/world you need an exclusive system or a QueryData impl that has complete, unique access to a table, which leads me to make this replacement suggestion: A set of exclusive systems that expose the same API as the sorts implemented in #13417, without the QueryIter parameter and commands that can run those systems.

    These would no longer use transmute functionality internally, just sort tables directly. Some catches with this are that archetypes can share tables, and we'd need to sort sparse sets as well. This may lead to unexpected behavior on the user side. It'd also be less "table sorts" and more "archetype sorts".

    With my suggestion, a user could manually order archetype sorts to happen in front of or after their systems, or spawn commands in their systems to do that.

  3. For each of these archetypes we'd store some kind of "last_sorted_by key". Together with change detection this would allow exiting early. I don't see the need to have queries themselves cache the sorts with this.

  4. Definitely perf improvements to be had here! However this will be heavily data set dependent, so good benchmarks would be needed.

  5. Functionality-wise, Iterator::nth does this. It may be useful to have some kind of indexing here though; this could be done with dedicated functions on the QuerySorted*Iter structs.
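The change-detection early-out from point 3 could be sketched like this (an assumed design for illustration; `SortCache` and its fields are hypothetical, not existing Bevy API):

```rust
/// Per-archetype "last sorted by" cache: re-sorting is skipped when the
/// same key was used last time and nothing changed since that sort.
/// (Assumed design for illustration.)
struct SortCache {
    last_sorted_by: Option<u64>, // id/hash of the comparator last used
    last_sort_tick: u32,         // change tick when the sort ran
}

impl SortCache {
    fn needs_resort(&self, sort_key: u64, latest_change_tick: u32) -> bool {
        // Resort if the key differs or any component changed since.
        self.last_sorted_by != Some(sort_key) || latest_change_tick > self.last_sort_tick
    }
}
```

A sort would first check `needs_resort`; on a hit it reuses the cached order.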

On the example itself (other than the already presented issues above):

iiYese commented 1 month ago
  2. Yes it is. Flecs already does this. You would not need mutable access to tables; this is why a command is used. See the first point in the additional notes. Essentially the first run of the system will not have dense iteration; it will enqueue the command to reorder tables & the second run will have dense iteration. I don't like the alternate suggestion; it's too clumsy.
  3. You cannot store caches at an archetype level. Archetypes are not enough. If you have archetypes (A, B) & (A, B, C) with tables that look like this:
    
```
// A, B
A(0), B
A(4), B
A(5), B

// A, B, C
A(1), B, C
A(2), B, C
A(3), B, C
```

And you want to sort by `A` you need to first iter `(A, B): 0..=0` then `(A, B, C): 0..=2` then `(A, B): 1..=2`. This is just a simple case with 2 archetypes and 1 system & 1 sort key. Queries are the only place you can cache this information. Caching at a query level also allows you to do things like:
- Sort by `B`
- Stable sort by `A`

If you want different sorts in different systems. You will get dense iteration across `A` & not completely dense but still some dense iteration across `B`.
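The dense iteration order described here can be derived by condensing the sort's (table, row) visit order into contiguous runs. A hypothetical plain-Rust sketch (table 0 = `(A, B)`, table 1 = `(A, B, C)`; names are illustrative, not Bevy API):

```rust
use std::ops::RangeInclusive;

/// Condense a sort's (table, row) visit order into contiguous
/// `(table, low..=high)` runs, so iteration is dense within each run.
/// (Hypothetical helper for illustration.)
fn condense(locations: &[(usize, usize)]) -> Vec<(usize, RangeInclusive<usize>)> {
    let mut runs: Vec<(usize, RangeInclusive<usize>)> = Vec::new();
    for &(table, row) in locations {
        // Extend the current run when this row directly follows it in
        // the same table; otherwise start a new run.
        let extend =
            matches!(runs.last(), Some((t, range)) if *t == table && *range.end() + 1 == row);
        if extend {
            let (_, range) = runs.last_mut().unwrap();
            *range = *range.start()..=row;
        } else {
            runs.push((table, row..=row));
        }
    }
    runs
}
```

For the example above, `condense(&[(0, 0), (1, 0), (1, 1), (1, 2), (0, 1), (0, 2)])` yields `[(0, 0..=0), (1, 0..=2), (0, 1..=2)]`, i.e. exactly the `0..=0`, `0..=2`, `1..=2` ranges.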

4. Benchmarks are irrelevant, as the worst case is simply what we already have now, which is completely random access.

5. `Iterator::nth` [still iterates](https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#305); it is not an `O(1)` operation, as this would be.
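To make the cost concrete, a counting iterator shows that the default `Iterator::nth` walks every element rather than jumping:

```rust
// Iterator::nth is not random access: the default implementation
// advances the iterator one element at a time, so nth(k) is O(k).
struct Counting {
    next: usize,
    steps: usize, // how many times next() was called
}

impl Iterator for Counting {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        self.steps += 1;
        let n = self.next;
        self.next += 1;
        Some(n)
    }
}

fn main() {
    let mut it = Counting { next: 0, steps: 0 };
    assert_eq!(it.nth(999), Some(999));
    // nth(999) had to call next() 1000 times.
    assert_eq!(it.steps, 1000);
}
```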

> FilteredEntityRef and EntityRef currently act the same in type position, so better use the shorter one.

I don't think you actually understand the difference between these. `EntityRef` gets read access to *all* of an entity's components, meaning you'll block parallelism if you allow this safely. `FilteredEntityRef` acts like `EntityRef` but only has access to a subset of components; in this case it'd be whatever appears in the `D: QueryData` part of the query.

> To sort by EntityRef you need to include it in the original QueryData, lest you alias with disjoint mutable queries.

That is another reason why it has `FilteredEntityRef`. You can transmute this from any `D: QueryData`.
iiYese commented 1 month ago

I've updated the seek example to be more flexible.

Victoronz commented 1 month ago
  1. It seems I misunderstood you on this. What would this command spawning look like? You say that this would use thread-local commands; does this mean we would be using Commands implicitly? Or are we having the users pass it? I still think the explicit sort systems would be useful for users, though separately, since they don't seem to cover your needs here.

  2. In that case, we'd be mutating QueryState, right? Does this fit with current ECS plans? In terms of execution, I would still not want it by default, because this would be hidden behavior and potentially introduce unexpected costs for the user on first iteration. How about a Cached enum as an additional parameter, instead of chained method calls? The user decides explicitly whether the sort is cached, no need for implicit cost other than a single, well-predicted branch. The enum would have two variants, one for cached and uncached behavior each.

  3. I'd still like to see two cases: The new version that uses adjacent runs of data vs the current version, on a dataset with no adjacent items after sort, and a dataset that is already sorted except for one item, i.e. the worst and best case. AFAIK, there are no benchmarks on this yet, so I would not like to be guessing how big of a difference it makes.

  4. I wasn't clear on this, that's what I meant!
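The `Cached` parameter suggested in point 2 might look like this sketch (a proposed shape from this discussion, not an existing Bevy type; `should_queue_reorder` is an illustrative helper):

```rust
/// Whether a sort should queue a table reorder so later runs iterate
/// densely. (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cached {
    /// Queue a command to reorder tables; later runs iterate densely.
    Cache,
    /// Sort only this call's results; leave table order untouched.
    Uncached,
}

fn should_queue_reorder(cached: Cached) -> bool {
    matches!(cached, Cached::Cache)
}
```

A call would then read `query.sort(cmp, Cached::Uncached)` instead of chaining `.no_reorder()`.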

On FilteredEntityRef, if that's how it works, I didn't actually understand it: this ability needs more docs; neither transmute_lens nor the struct itself says it can be transmuted from any QueryData.

iiYese commented 1 month ago
  1. On main there's already `World::commands`. The problem is that it currently takes `&mut World`, whereas queries get `&World`. Adding a `World::parallel_commands` that takes `&World`, or just making the command queue thread-local, would allow queries to enqueue commands.

  2. There's nothing incompatible about this with future ECS features. Either on QueryState or Query is fine. It will not be hidden behavior, because it will be explicitly stated in the docs & there will be a way to opt out of it. With regard to an additional param, I prefer well-designed defaults with ways to opt out instead of noisy configuration.

  3. This is not guessing. I can confidently assert the difference this would make. You do not need this to be implemented to observe what is well-established hardware behavior. It is the reason bevy and other archetypal ECSs tend to be orders of magnitude faster at iteration than things like sparse set ECSs. If you really want, you can play around with flecs, which already has this implementation.

Victoronz commented 1 month ago
  1. Taking a broader view, are there any other things that need caching on the Query level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?

    I think we disagree here. Of course this will depend on implementation, but as per my current understanding: my issue with this is that sort itself would queue the caching, and no_reorder would unqueue that caching within the same method chain. Not wanting to cache sorts will introduce that configuration noise you speak of into an iter chain that one would like to for loop over. How about we have some call that takes Query mutably and mutates some queue_sort_cache setting, causing subsequent sorts to follow that behavior? (Sorts would use an already present cache regardless of this setting.) If you sort the same query twice in a different way in the same system, the "no-reorder noise" can be placed before the iteration loop, and if a user still wants a sort to be cached afterwards, they can queue the command manually.

    This would also mean we could introduce some global setting that enables or disables this behavior entirely. We would still need a default, but this would be cleaner IMO.

  2. I know that it is faster and can be by a magnitude, and I guess this wouldn't be a blocker, but I would like to be able to show by how much!

Victoronz commented 1 month ago

3./4./5. Most improvements to QuerySortedIter should also be applied to QueryManyIter and its future offshoot types. QuerySortedIter is basically a special case of the more general QueryManyIter.

iiYese commented 1 month ago

> Taking a broader view, are there any other things that need caching on the Query level? To me this sounds like a problem that isn't unique to sorting queries. Would it make sense to make use of some shared infrastructure here?

With queries-as-entities, queries can have row polymorphism like everything else that's at an ECS level. This means caching is no longer limited to either:

Query entities can have different components for whatever caches they need. Aside from sort caches they will need other cache structures as we add new features. Two things that come to mind:

I would not worry about queries as entities as they're one of the prerequisite/accompanying features for relations and the sort cache can be refactored when they land.

Victoronz commented 1 month ago

Another question I have about the final API example: currently, a user calls iter, then sort; here this is reversed. When sort also lands for iter_many, how would this work? The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance. Additionally, since the actual table is only sorted after the first run, the query.sort call would have to return a sorted Vec of query results to iterate over. Entity is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.

iiYese commented 1 month ago

> Currently, a user calls iter, then sort, here this is reversed. When sort also lands for iter_many, how would this work?

This will either remove the existing API or exist alongside it.

> The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance.

This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.

> Additionally, since the actual table is only sorted after the first run, the query.sort call would have to return a sorted Vec of query results to iterate over. Entity is not a simple index into this, and its matching item would instead have to be found in a more expensive manner.

There should be no single use vectors. QueryState/Query wherever the cache ends up gets 2 new fields:

If .seek is not used the sort_cursor is reset to (0, 0).

Victoronz commented 1 month ago

> The user can no longer specify a subset of a query to sort. I.e. if a user wishes to sort 3 out of 1000 entities, then they do not wish to pay for a full query sort in advance.

> This sounds like an esoteric misfeature. How would a user know exactly which subset of items they would want sorted in the first place? There are zero guarantees about iteration order.

E.g. if we want to sort the children of an entity: the Children or some other component stores a list of entities, which we retrieve by iter_many, then sort. I do not find this esoteric at all. This would only be relevant if this is intended to replace the existing API.

iiYese commented 1 month ago

That's not a subset of entities in a query; that's mapping an entity list. Which is at best niche because

Adamkob12 commented 1 month ago

> Essentially the first run of the system will not have dense iteration, it will enqueue the command to reorder tables & the second run will have dense iteration.

When should the command be executed? If it's executed right away and reorders the Table, then it's very likely that the components will be mutated before the second sort is needed, requiring another table reordering just in time for the sorted query.

The logical conclusion is that the reordering of the table must happen right before the system that requires the sort runs (or after the last mutable access, but it's impossible to know when that would be).

After thinking about it, I think there are 3 options that allow for dense, sorted iteration:

  1. Designate certain archetypes as "hot", meaning their tables would have to sort themselves after each mutation (only one sort per archetype).
  2. Let the user call a command to sort archetypes manually (a bit of a footgun).
  3. Reorder the table right at the .sort() call, and then iterate densely (this needs a special type of Query, because no one must access the table at the time of the sort).

iiYese commented 1 month ago

> When should the command be executed?

In the next sync point. So basically the first run would be without dense iteration, the second run will need to rebuild the sort cache, and every run after that will be dense (until there's a mutation, which will start this whole process again).