Closed alice-i-cecile closed 8 months ago
I like the idea of having a "staging" world that commands insert into. As discussed in https://github.com/bevyengine/bevy/issues/1446, it would be useful to spawn loaded assets (especially scenes) and modify them on the fly.
"It makes prototyping easier by letting you think about X less" does not sound like compelling motivation to me for this case. Maybe I'm picking up more from phrasing than there is.
I fairly sure we already had this conversation on why multiple worlds is not a good general solution for this. Some form of slowboat staging world could be a thing, but it doesn't have to be powered by commands; copying out parts of the world for things like rendering is a different matter entirely.
Fragmenting commands could work, but it is an entire new layer of constraints on the systems graph - automatically supporting that for all systems is bound to harm parallelization; it should be opt-in, and it should be immediately obvious which commands can be "instant", perhaps to the point of having separate dedicated queues. This also doesn't have to be dynamic: the system must declare what commands it may issue, exactly the same way it must declare what data it accesses or if it cares about execution order, so "scheduling commands" of a system can be done as part of scheduling the system itself.
It bears mentioning that no solution will be able to make all commands instant, only a subset.
Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).
We could add a new api that enables users to claim these resources uniquely.
Spawning/despawning in a specific Archetype requires unique write access to the following:
Archetype
value
&Archetypes
queries (requires new logic) or other operations that need this Archetype value (which can be guaranteed if all ArchetypeComponents are grabbed uniquely and that is globally treated as a unique Archetype borrow)Table
of archetype
Sparse Sets
in archetype
Entities
Query::get_component::<T>(entity)
and Query::get(entity)
. This wouldn't block Query iteration as it doesn't access the metadata collection. This would almost certainly require some kind of synchronization.Meeting the constraints above should enable things like:
// Claims an _exact_ archetype (using the logic above) and nothing else
// (and initializes the archetype if it doesn't already exist)
fn sys(mut writer: ArchetypeWriter<(A, B, C)>) {
writer.spawn((A(0), B(0), C(0)));
writer.despawn(some_entity); // fails if entity is not in archetype
}
// Claims _all_ archetypes that currently match the query (using the logic above)
// no new archetypes are allowed (to prevent the need to lock Archetypes uniquely and
// because we can't anticipate what component types are needed ahead of time)
fn sys(mut writer: QueryWriter<(A, B, C)>) {
writer.spawn((A(0), B(0), C(0)));
writer.spawn((A(1), B(1), C(1), D(1)); // note that D's storage was implicitly claimed by QueryWriter
writer.despawn(some_entity);
}
// Claims the before/after archetype (using the logic above) and nothing else
// (and initializes the archetypes if they doesn't already exist)
fn sys(mut remover: Remover<(A, B, C), B>) {
remover.remove(entity);
}
We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.
We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.
This should be vastly improved by a stageless design. Following up on it though, I would much rather see users able to add parallel systems as exclusive systems that induce a hard sync point, whose commands are immediately processed. This allows them to stick to the standard type signatures, and then convert them back into ordinary parallel systems quickly once they're done prototyping.
e.g. add_system(my_system)
becomes add_exclusive_system(my_system)
and vice versa, quickly letting you replace them without needing to completely rewrite things in terms of World
like you do now. Using the .system()
and .exclusive_system()
methods would also work for now, but ideally that boilerplate will be able to be removed Soon:tm:.
Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).
This idea feels like... partially exclusive systems to me? I like them a lot, but for the same reasons as my last comment, I feel like they should be defined upon system insertion, rather than forcing a modification of the parameter types in your system.
Overall, I think that the "run this parallel system as an exclusive system", inserted into arbitrary points in the schedule is the most promising approach to try out first because:
Any system can already be used as an exclusive system, you just need to insert it with .exclusive_system()
. Naturally, that comes with all the restrictions, main one being that it's inserted into the spots where exclusive systems go. It's not possible to insert such a coerced exclusive system into arbitrary points in the schedule for the same reasons as with regular exclusive systems.
Stageless will indeed let us make nicer APIs for this: specifying hard sync points without the additional boilerplate of stages is what stageless is all about, in the end. I'm working on it, but it'll need more thinking time; multiple issues seemingly need to be solved at the same time and I would like to avoid doing that with a ground-up rewrite.
None of that is relevant to the issue at hand, though. Commands are intended to enable parallel systems to perform generally non-parallelizable operations while remaining parallel; making some of the systems exclusive defeats the purpose. Calling systems issuing disjoint commands "partially exclusive" is confusing.
Here's a really clear use case for this functionality, with no great workarounds.
I've got another usecase for this functionality, indirectly. I've been discussing how to do rendering with @mtsr and a few others on discord recently, and we're looking at an architecture that classes systems into one of two kinds: simulation and extraction.
Simulation systems, well, simulate the world, while extraction systems copy data into renderer data structures. The details of this process are not important to this issue, but what is important is that there is what I've been calling a 'soft sync point' between simulation and extraction. That is, any system marked extraction needs to run after all systems labeled simulation that touch the same data have finished their work. This is different from the hard sync point introduced by a stage, because it's not the case that every extraction system must run after all simulation systems, and in fact for performance reasons we want an extraction system to run as soon as possible once the data it needs is available.
In slightly more mathematical language, a hard sync point between stage A
and stage B
expresses the constraint forall a, b : System. ((stage(a) = A and stage(b) = B) implies a <= b)
on the system ordering poset. But instead, we want the constraint forall a, b : System. ((label(a) = Simulation and label(b) = Extraction) implies not(a >= b))
. This is not equivalent, since it permits Simulation and Extraction systems to be unordered with respect to each other. Only if system dependencies prevent parallelization (i.e. a
writes to a resource b
reads) does the constraint come into play.
If the only way for systems to touch the World
was through queries, it would be trivial (I think) to implement this kind of constraint. However, the problem is commands. It is critical that if a simulation system runs commands that touch render data, those commands are processed by the time an extraction system reads that data. But right now, commands can only be processed during the hard sync point between two stages, which completely eliminates the possibility of running any extraction systems before all simulation systems have completed. So getting optimal performance out of this rendering model would require some kind of finer-grained command scheduling.
At the time I ran into this problem, I wasn't aware this issue existed, so I tried to come up with my own idea for command scheduling. Rather than abandoning commands entirely, or providing instant execution, my aim was to preserve most of the way commands work now, but to automatically schedule execution of commands per-system in a granular fashion, rather than at manually specified points. To be honest, I'm not convinced what I came up with is preferable to what @cart described above, and it's got a bit of a mad science flavor to it, but I was asked to add it to this issue on discord, so I'll try to sketch it out under the cut.
SAT requires worst-case time exponential in the number of variables, but we only need to do this once, and the number of variables involved would only be the sum of the number of components mentioned by each system's
SystemAccess
, so it's possible that effiency would not be a big problem. Once it has been determined what systems requires1
's commands to be flushed before running, we can insert a 'system' with the appropriate exclusive archetype access into the schedule, have it perform the flush, and make the dependent systems depend on it, rather thans1
directly. And thus, we have command execution with soft sync points.
If it's doable to achieve this without high constants in the implementation and the number of components isn't prohibitive, soft sync points could give significant performance benefits, since CPU occupancy should better than with one or more hard sync points. Combine this with pipelining and the wins are even bigger, for rendering, but even simulation of frame N+1 could start while frame N is still winding down.
It would be worthwhile exploring the actual cost of computing scheduling constraints. I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that? Maybe there could be a way to do (partial) precomputation at build time? But what about dynamic systems and components? Could these include their own (partially) precomputed constraints that can be completed on plugging in?
Edit: Actually, having this seems like it would make pipelining much easier to implement.
I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that?
It's worth pointing out that although an archetype conceptually represents a variable assignment in the number of component types throughout the entire application, the proposition is_archetype_modified(s1, archetype) && is_archetype_live(s2, archetype)
will only ever depend on types mentioned in one of the system's SystemAccess
fields. So there should not be any particular limit on the total number of component types. I'd have to benchmark it, but since the vast majority of systems operate on only a few component types, even a brute force O(2^n)
algorithm not be unacceptably slow.
In conversation with Cart, I proposed the following solution for instant component addition / removal, which seems promising:
Query
, which restricts you to altering m Modify
, which allows you to add / remove components of that type. This gives our schedule the information it needs, without constantly incurring a locking cost.The final API would look like:
fn toggle_system(query: Query<Modify<Toggled>, With<ToBeToggled>>){
for toggle in query.iter_mut(){
// Modify wrapper would work like Option<T> queries and return an Option
match maybe_toggled {
// No type is needed, since we have access to it from the `Modify` wrapper
Some(_) => maybe_toggled.remove(),
// We are only allowed to insert components with the `Toggled` type here
None => maybe_toggled.insert(Toggled)
}
}
}
The existing commands API would have to remain intact, since you may want to insert / remove components of arbitrary types.
You'd also want to extend this to both bundles (easy) and spawning / despawning (pretty hard).
More related work: https://github.com/bevyengine/rfcs/pull/57, https://github.com/bevyengine/rfcs/pull/30
Would this be mostly solved by #9822, which makes it easier to understand when commands are applied?
Yep, this plus manual apply_deferred will effectively solve the need here.
With auto-insert sync points in #9822 and the followup PRs, this seems to be largely transparent to most users now, where the only concern being the performance impact of a sync point. Is this still desirable to have as an ECS feature?
Yep, I'm content with the state of things now.
What problem does this solve or what need does it fill?
Having to wait until a hard sync point (currently the end of the Stage) for basic commands to process slows down prototyping badly. This is particularly frustrating with
Without this feature, you have to be very thoughtful about how and when you're using commands, even when prototyping, and where you place systems that rely on other things being mutated via commands. Failing to account for command delays properly can result in frame lags or buggy behavior, and moving systems between stages to accommodate the required hard syncs takes a lot of work, which must be refactored whenever your prototype changes significantly.
Possible Solutions
This is a very technically complex feature request, and as such there are several potential solutions that deserve more exploration. The gist of the challenge is that commands can, in general, touch arbitrary data, making them very hard to schedule. To get around that, regular hard sync points exist, which have exclusive access to the entire world and can perform arbitrary operations to process any commands that may have built up.
World
(and possibly do something similar to add it to a new one). We can use hard syncs to clean up the archetypes, and recover data locality / future performance. From @BoxyUwU.