"Instant" command processing

alice-i-cecile commented 3 years ago

What problem does this solve or what need does it fill?

Having to wait until a hard sync point (currently the end of the Stage) for basic commands to process slows down prototyping badly. This is particularly frustrating with

spawning entities
despawning entities
adding components
removing components

Without this feature, you have to be very thoughtful about how and when you're using commands, even when prototyping, and where you place systems that rely on other things being mutated via commands. Failing to account for command delays properly can result in frame lags or buggy behavior, and moving systems between stages to accommodate the required hard syncs takes a lot of work, which must be refactored whenever your prototype changes significantly.

Possible Solutions

This is a very technically complex feature request, and as such there are several potential solutions that deserve more exploration. The gist of the challenge is that commands can, in general, touch arbitrary data, making them very hard to schedule. To get around that, regular hard sync points exist, which have exclusive access to the entire world and can perform arbitrary operations to process any commands that may have built up.

Bit-flipping: see #356. Iterators have to be aware that they can skip components in tables due to change detection, so we can use this property to toggle a bit to "soft-remove" an entity from an archetype without mutable access to the entire World (and possibly do something similar to add it to a new one). We can use hard syncs to clean up the archetypes, and recover data locality / future performance. From @BoxyUwU.
Commands subworlds: Copy the data into a subworld when using commands, queries must access the main world and all subworlds. From Kae on Discord.
Dynamically scheduled commands: fragment commands based on the data that they need to access. Dynamically schedule them at a point between they are created and the first time that data would be used. Remove most hard syncs in the process. See this comment on #1375.

Davier commented 3 years ago

I like the idea of having a "staging" world that commands insert into. As discussed in https://github.com/bevyengine/bevy/issues/1446, it would be useful to spawn loaded assets (especially scenes) and modify them on the fly.

Ratysz commented 3 years ago

"It makes prototyping easier by letting you think about X less" does not sound like compelling motivation to me for this case. Maybe I'm picking up more from phrasing than there is.

I fairly sure we already had this conversation on why multiple worlds is not a good general solution for this. Some form of slowboat staging world could be a thing, but it doesn't have to be powered by commands; copying out parts of the world for things like rendering is a different matter entirely.

Fragmenting commands could work, but it is an entire new layer of constraints on the systems graph - automatically supporting that for all systems is bound to harm parallelization; it should be opt-in, and it should be immediately obvious which commands can be "instant", perhaps to the point of having separate dedicated queues. This also doesn't have to be dynamic: the system must declare what commands it may issue, exactly the same way it must declare what data it accesses or if it cares about execution order, so "scheduling commands" of a system can be done as part of scheduling the system itself.

It bears mentioning that no solution will be able to make all commands instant, only a subset.

cart commented 3 years ago

Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).

We could add a new api that enables users to claim these resources uniquely.

Spawning/despawning in a specific Archetype requires unique write access to the following:

Archetype value
- Cannot be run concurrently with &Archetypes queries (requires new logic) or other operations that need this Archetype value (which can be guaranteed if all ArchetypeComponents are grabbed uniquely and that is globally treated as a unique Archetype borrow)
Table of archetype
- Claiming Write access to each ArchetypeComponent in the archetype prevents concurrent reads/writes to components
Sparse Sets in archetype
- Claiming Write access to each sparse Component in the archetype prevents concurrent reads/writes (this could be translated to all current ArchetypeComponents for a given Component to avoid complicating the scheduler)
Entities
- This is the "hard" part. Grabbing this uniquely prevents all other spawn/despawn actions, in addition to metadata lookups for things like Query::get_component::<T>(entity) and Query::get(entity). This wouldn't block Query iteration as it doesn't access the metadata collection. This would almost certainly require some kind of synchronization.

Meeting the constraints above should enable things like:

// Claims an _exact_ archetype (using the logic above) and nothing else
// (and initializes the archetype if it doesn't already exist)
fn sys(mut writer: ArchetypeWriter<(A, B, C)>) { 
  writer.spawn((A(0), B(0), C(0)));
  writer.despawn(some_entity); // fails if entity is not in archetype
}

// Claims _all_ archetypes that currently match the query  (using the logic above)
// no new archetypes are allowed (to prevent the need to lock Archetypes uniquely and
// because we can't anticipate what component types are needed ahead of time)
fn sys(mut writer: QueryWriter<(A, B, C)>) {
  writer.spawn((A(0), B(0), C(0)));
  writer.spawn((A(1), B(1), C(1), D(1)); // note that D's storage was implicitly claimed by QueryWriter
  writer.despawn(some_entity);
}

// Claims the before/after archetype (using the logic above) and nothing else
// (and initializes the archetypes if they doesn't already exist)
fn sys(mut remover: Remover<(A, B, C),  B>) { 
  remover.remove(entity);
}

We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.

alice-i-cecile commented 3 years ago

We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.

This should be vastly improved by a stageless design. Following up on it though, I would much rather see users able to add parallel systems as exclusive systems that induce a hard sync point, whose commands are immediately processed. This allows them to stick to the standard type signatures, and then convert them back into ordinary parallel systems quickly once they're done prototyping.

e.g. add_system(my_system) becomes add_exclusive_system(my_system) and vice versa, quickly letting you replace them without needing to completely rewrite things in terms of World like you do now. Using the .system() and .exclusive_system() methods would also work for now, but ideally that boilerplate will be able to be removed Soon:tm:.

alice-i-cecile commented 3 years ago

Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).

This idea feels like... partially exclusive systems to me? I like them a lot, but for the same reasons as my last comment, I feel like they should be defined upon system insertion, rather than forcing a modification of the parameter types in your system.

alice-i-cecile commented 3 years ago

Overall, I think that the "run this parallel system as an exclusive system", inserted into arbitrary points in the schedule is the most promising approach to try out first because:

It dovetails very nicely with the stageless work we want to do already
As far as I can see, it introduces no new serious technical challenges
It enables people to write exclusive systems with more comfortable syntax, rather than learning to need how to grab the data they need from the world directly
It's very easy to convert systems to-and-from this form, allowing for simple prototyping
It won't impose any overhead on optimized games that only use carefully scheduled parallel systems

Ratysz commented 3 years ago

Any system can already be used as an exclusive system, you just need to insert it with .exclusive_system(). Naturally, that comes with all the restrictions, main one being that it's inserted into the spots where exclusive systems go. It's not possible to insert such a coerced exclusive system into arbitrary points in the schedule for the same reasons as with regular exclusive systems.

Stageless will indeed let us make nicer APIs for this: specifying hard sync points without the additional boilerplate of stages is what stageless is all about, in the end. I'm working on it, but it'll need more thinking time; multiple issues seemingly need to be solved at the same time and I would like to avoid doing that with a ground-up rewrite.

None of that is relevant to the issue at hand, though. Commands are intended to enable parallel systems to perform generally non-parallelizable operations while remaining parallel; making some of the systems exclusive defeats the purpose. Calling systems issuing disjoint commands "partially exclusive" is confusing.

alice-i-cecile commented 3 years ago

Here's a really clear use case for this functionality, with no great workarounds.

AlphaModder commented 3 years ago

I've got another usecase for this functionality, indirectly. I've been discussing how to do rendering with @mtsr and a few others on discord recently, and we're looking at an architecture that classes systems into one of two kinds: simulation and extraction.

Simulation systems, well, simulate the world, while extraction systems copy data into renderer data structures. The details of this process are not important to this issue, but what is important is that there is what I've been calling a 'soft sync point' between simulation and extraction. That is, any system marked extraction needs to run after all systems labeled simulation that touch the same data have finished their work. This is different from the hard sync point introduced by a stage, because it's not the case that every extraction system must run after all simulation systems, and in fact for performance reasons we want an extraction system to run as soon as possible once the data it needs is available.

In slightly more mathematical language, a hard sync point between stage A and stage B expresses the constraint forall a, b : System. ((stage(a) = A and stage(b) = B) implies a <= b) on the system ordering poset. But instead, we want the constraint forall a, b : System. ((label(a) = Simulation and label(b) = Extraction) implies not(a >= b)). This is not equivalent, since it permits Simulation and Extraction systems to be unordered with respect to each other. Only if system dependencies prevent parallelization (i.e. a writes to a resource b reads) does the constraint come into play.

If the only way for systems to touch the World was through queries, it would be trivial (I think) to implement this kind of constraint. However, the problem is commands. It is critical that if a simulation system runs commands that touch render data, those commands are processed by the time an extraction system reads that data. But right now, commands can only be processed during the hard sync point between two stages, which completely eliminates the possibility of running any extraction systems before all simulation systems have completed. So getting optimal performance out of this rendering model would require some kind of finer-grained command scheduling.

At the time I ran into this problem, I wasn't aware this issue existed, so I tried to come up with my own idea for command scheduling. Rather than abandoning commands entirely, or providing instant execution, my aim was to preserve most of the way commands work now, but to automatically schedule execution of commands per-system in a granular fashion, rather than at manually specified points. To be honest, I'm not convinced what I came up with is preferable to what @cart described above, and it's got a bit of a mad science flavor to it, but I was asked to add it to this issue on discord, so I'll try to sketch it out under the cut.

On-demand command scheduling explanation

The approach is based on two core ideas: - A system should not make changes to entities it did not either obtain from a query or create itself. - A system must declare in its signature what components it could possibly add or remove from entities it accesses. From an API perspective, here's how I would implement this. First, create a borrowed type representing the ID of an entity a system may make changes to: ```rust pub struct LiveEntity<'w>(Entity, PhantomData<&'w ()>); // I think this is the right variance... impl<'w> From> for Entity { fn from(live: LiveEntity<'w>) -> Entity { live.0 } } ``` Then, change `>::Item` from `Entity` to `LiveEntity<'w>`. Likewise, alter the signatures of `Commands::current_entity` and `Commands::set_current_entity` to use `LiveEntity` instead of `Entity`. For each possible operation a command can apply to an entity, we make the method on `Commands` require a `LiveEntity` and an instance of a `Copy` ZST marker that can be obtained as a `SystemParam`. This type should include the types of components, thus making the components a system can add or remove part of its signature. For example, for `insert_one`, it would look like this: ```rust pub struct FetchInsert(PhantomData); #[derive(Copy, Clone)] pub struct Insert<'a, T>(PhantomData<&'a T>); impl<'a, T> SystemParam for Insert<'a, T> { type Fetch = FetchInsert<'a, T>; } impl<'a, T> FetchSystemParam<'a> for FetchInsert { type Item = Insert<'a, T>; fn init(system_state: &mut SystemState, world: &World, resources: &mut Resources) { } unsafe fn get_param(system_state: &'a SystemState, world: &'a World, resources: &'a Resources) -> Option { Some(Insert(PhantomData)) } } impl Commands { fn insert_one(&mut self, entity: LiveEntity, insert: Insert, component: T) { ... } // relying on lifetime elision here, might not work } ``` There would be an analogous interface for `remove_one` and friends, and `spawn`ing an entity with a component would probably just require the same marker type as an insert. `despawn` would need its own marker `Despawn`, I think. There is no way to support the `Commands::add_command` API in this approach, since it takes exclusive access to the world as a given. With these APIs, I believe it should be possible to fully describe which archetypes a system could access with a struct like this: ```rust struct SystemAccess { any: DNF, none: DNF, add: Vec, remove: Vec, can_despawn: bool, } ``` Here, `DNF` refers to a boolean expression in disjunctive normal form, with variables represented by the type `T`. Conceptually, there is a simple function that checks if an archetype could contain entities live in a system: ```rust fn is_archetype_live(system: &SystemAccess, archetype: &Archetype) -> bool { archetype.matches(system.any) && !archetype.matches(system.none) } ``` And still conceptually, there is a more complicated function to determine if a system could generate commands touching a given archetype: ```rust fn is_archetype_modified(system: &SystemAccess, archetype: &Archetype) -> bool { if archetype == Archetype::Empty { return false; // special case, since the empty archetype never actually needs to reallocate storage } let is_live = is_archetype_live(system, archetype); let can_move_out = is_live && (system.can_despawn || system.add.is_empty() || archetype.matches(system.remove)); let can_move_in = match is_live { true => archetype.matches(system.add), false => archetype.matches(system.any - system.remove) // This subtraction gives the DNF generated by removing any component ids in remove from each clause in any. } can_move_out || can_move_in } ``` While I have chosen to express it in code, the curried function `is_archetype_modified(system, _)` is just a boolean expression with one variable for each component type. An archetype then corresponds to an assignment of these variables. Thus, for two systems `s1` and `s2` with a soft sync point between them, we can construct the expression `is_archetype_modified(s1, archetype) && is_archetype_live(s2, archetype)` and use our favorite SAT solver to determine whether to flush `s1`'s commands before running `s2`. SAT requires worst-case time exponential in the number of variables, but we only need to do this once, and the number of variables involved would only be the sum of the number of components mentioned by each system's `SystemAccess`, so it's possible that effiency would not be a big problem. Once it has been determined what systems require `s1`'s commands to be flushed before running, we can insert a 'system' with the appropriate exclusive archetype access into the schedule, have it perform the flush, and make the dependent systems depend on it, rather than `s1` directly. And thus, we have command execution with soft sync points. As a disclaimer, I am actually not that familiar with the specific implementation details of the Bevy ECS, just the implementation of ECSs in general, so it's possible I have missed something critical in this proposal. Feel free to point out any problems.

mtsr commented 3 years ago

SAT requires worst-case time exponential in the number of variables, but we only need to do this once, and the number of variables involved would only be the sum of the number of components mentioned by each system's SystemAccess, so it's possible that effiency would not be a big problem. Once it has been determined what systems require s1's commands to be flushed before running, we can insert a 'system' with the appropriate exclusive archetype access into the schedule, have it perform the flush, and make the dependent systems depend on it, rather than s1 directly. And thus, we have command execution with soft sync points.

If it's doable to achieve this without high constants in the implementation and the number of components isn't prohibitive, soft sync points could give significant performance benefits, since CPU occupancy should better than with one or more hard sync points. Combine this with pipelining and the wins are even bigger, for rendering, but even simulation of frame N+1 could start while frame N is still winding down.

It would be worthwhile exploring the actual cost of computing scheduling constraints. I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that? Maybe there could be a way to do (partial) precomputation at build time? But what about dynamic systems and components? Could these include their own (partially) precomputed constraints that can be completed on plugging in?

Edit: Actually, having this seems like it would make pipelining much easier to implement.

AlphaModder commented 3 years ago

I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that?

It's worth pointing out that although an archetype conceptually represents a variable assignment in the number of component types throughout the entire application, the proposition is_archetype_modified(s1, archetype) && is_archetype_live(s2, archetype) will only ever depend on types mentioned in one of the system's SystemAccess fields. So there should not be any particular limit on the total number of component types. I'd have to benchmark it, but since the vast majority of systems operate on only a few component types, even a brute force O(2^n) algorithm not be unacceptably slow.

alice-i-cecile commented 3 years ago

In conversation with Cart, I proposed the following solution for instant component addition / removal, which seems promising:

Make component addition / removal a method on Query, which restricts you to altering m
Add a heightened-locking component wrapper, Modify, which allows you to add / remove components of that type. This gives our schedule the information it needs, without constantly incurring a locking cost.

The final API would look like:

fn toggle_system(query: Query<Modify<Toggled>, With<ToBeToggled>>){
  for toggle in query.iter_mut(){
     // Modify wrapper would work like Option<T> queries and return an Option
     match maybe_toggled {
      // No type is needed, since we have access to it from the `Modify` wrapper
       Some(_) => maybe_toggled.remove(),
       // We are only allowed to insert components with the `Toggled` type here
       None => maybe_toggled.insert(Toggled)
     }
  }
}

The existing commands API would have to remain intact, since you may want to insert / remove components of arbitrary types.

You'd also want to extend this to both bundles (easy) and spawning / despawning (pretty hard).

alice-i-cecile commented 2 years ago

ItsDoot commented 11 months ago

Would this be mostly solved by #9822, which makes it easier to understand when commands are applied?

alice-i-cecile commented 11 months ago

Yep, this plus manual apply_deferred will effectively solve the need here.

james7132 commented 8 months ago

With auto-insert sync points in #9822 and the followup PRs, this seems to be largely transparent to most users now, where the only concern being the performance impact of a sync point. Is this still desirable to have as an ECS feature?

alice-i-cecile commented 8 months ago

Yep, I'm content with the state of things now.

bevyengine / bevy

"Instant" command processing #1613

What problem does this solve or what need does it fill?

Possible Solutions