bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0
36.14k stars 3.56k forks source link

Systems should be skipped if their resources cannot be fetched #15265

Closed alice-i-cecile closed 1 month ago

alice-i-cecile commented 1 month ago
          > The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.

I do think we should do this, and if we can get this behavior in place I'd prefer to encourage the use of the Single system param from #15264. This would go a long way to addressing the problem and ensuring non-panicking behavior by default without an ergonomics hit.

Originally posted by @iiYese and @alice-i-cecile in https://github.com/bevyengine/bevy/issues/14275#issuecomment-2356344841

alice-i-cecile commented 1 month ago

I really quite like this idea: I think it's generally a pretty robust solution. The design challenge here is:

a) how do we provide warnings to users that their system isn't running because they forgot to initialize a resource or whatever b) how do we easily silence that warning for cases where this behavior is expected

Rigidity commented 1 month ago

+1 for this, this has bitten me before

I guess a warning could tell you to explicitly use Option or Single if it's missing?

alice-i-cecile commented 1 month ago

Discord notes from a conversation on 2024-09-17.

SystemParam::get_param can mutate SystemParam::State, so it'd need to be split into try_get (immutable state) and apply_get (mutable, post access)

Semantically there's not much difference between SystemParam and WorldQuery, they both grab pointers from the world, cache state etc. IMO reaching convergence there so queries and systems are one and the same would be nice, but is so much code churn that I don't know if it's realistic at this point

@james-j-obrien

jnhyatt commented 1 month ago

What about logging a warning on first run (or not-run?) that says more or less "system did not run because XXX resource was not present in the world. To suppress this warning, either insert the resource or add .run_if(resource_exists::<XXX>) to the system"

alice-i-cecile commented 1 month ago

Yeah, I'm happy with that as an initial solution.

iiYese commented 1 month ago

I'd prefer for this to be an expected default that designs take into consideration cause it's more consistent with the behaviors of ECS. A diagnostic warning about missing resources is mostly going to be for transitioning from the old behavior.

Is there a mechanism to have custom lints? If there is we could have one for all instances of Res & ResMut in 0.16 that people can suppress with #![allow(..)] that just reads:

"Systems will not run when resources can't be fetched from 0.16 onward. Use Option<Res<_>> or Option<ResMut<_>> to change this".

MiniaczQ commented 1 month ago

Another implication of this is that System::run_unsafe -> Self::Out can sometimes not run, so we need to wrap the return value in option, probably rename the function to try_run_unsafe as well. Returning None will invalidate all chaining/piping/etc.

Unless we decide to do param validation before run_unsafe

iiYese commented 1 month ago

We could also add an associated const on the resource trait

pub trait Resource: Send + Sync + 'static {
    const NOISY: bool = false;
}

With a helper attribute for the derive that sets this to true. When checking if a system runs where there is a missing resource that has this set to true it will cause a panic instead of the system not running.

MiniaczQ commented 1 month ago

Should SystemParam::validate_param run before or inside System::run_unsafe? The latter requires us to return Option<System::Out> instead of System::Out.

I'd rather it be before instead of inside to avoid unnecessary wrapping in the cases where the validation can (potentially) be elided.

From @james7132 in discord

alice-i-cecile commented 1 month ago

I agree with @james7132 there: validation should be distinct from running the system, and be done by schedulers.

@iiYese for custom lints, we'd need https://github.com/theBevyFlock/bevy_cli. I think I like the associated constant idea better though, although we should have various log levels (including panic), rather than just a bool. Defaulting to error feels correct though.

alice-i-cecile commented 1 month ago

@MiniaczQ points out that run conditions are systems too. If a run condition cannot be evaluated, it should be treated as if it returned false.

alice-i-cecile commented 1 month ago

@hymm points out that we should perform the validation checks as soon as possible, before spawning a task for the scheduler.

For piped systems, al params must be checked at the start, and the whole piped collection of systems must be skipped if any are invalid. This is because piped systems are opaque to the schedule.

MiniaczQ commented 1 month ago

For warnings, should we add a system config flag that decides whether to emit a warning? Some systems will use this as run conditions, but in others this can be an accidental decision.

By default it would be true. But then disabling it is a massive pain.

alice-i-cecile commented 1 month ago

Yeah, I was thinking about that in terms of system config initially, but I wasn't happy with the disabling ergonomics. I don't mind it existing, but in most cases I think that adding this at the resource / component level will better match the user intent and result in fewer bugs and boilerplate.

MiniaczQ commented 1 month ago

Can we perhaps leverage module-filtering for this? People could then selectively filter modules where they resolved this and no longer receive warnings. This would play along with plugins too, since they're essentially modules.

Actually no, that doesn't make sense. There is a solution to be made with disabling this per plugin, but it's a bit more work.

iiYese commented 1 month ago

although we should have various log levels (including panic), rather than just a bool. Defaulting to error feels correct though.

I'm not sure about this cause for anything other than panic you'll get a wall of errors like WGPU which isn't much more helpful than a panic. Errors by default means you'll get a lot of opting out (which is indicative of a bad default). I think some people might be worried but would be surprised to find how sane this behavior actually is.

alice-i-cecile commented 1 month ago

Hmm. Panic might be a better default, yeah. I worry about adding more causes of "what the heck why isn't my system working": forgetting to add the system and forgetting to add the plugin are both super frustrating currently.

iiYese commented 1 month ago

Actually error would be fine if we add a bool to Schedule or something to track if first time diagnostics are emitted. It's a bool that would be changed at most once so it's easy for branch predictors to optimize out.

urben1680 commented 1 month ago

We could also add an associated const on the resource trait

pub trait Resource: Send + Sync + 'static {
    const NOISY: bool = false;
}

If this is considered, I'd prefer a const generic bool SKIP_SYSTEM_IF_MISSING on Res/ResMut. Though this will probably conflict with Option<Res> if the default is true as that makes no sense in that case.

alice-i-cecile commented 1 month ago

@iiYese we already have a warn_once family of macros for exactly this purpose.

alice-i-cecile commented 1 month ago

Adding an associated type or constant on Resource will break trait objects for it, so that needs to be considered carefully.

iiYese commented 1 month ago

Adding an associated type or constant on Resource will break trait objects for it, so that needs to be considered carefully.

Why would people have trait objects of an empty trait tho? :thinking:

alice-i-cecile commented 1 month ago

Inserting arbitrary components / resources is something I've seen people want to do. That said, we already do this with the Storage associated type, and those users probably actually want to use reflection. So yeah, I think adding associated constants or types is fine.

SpecificProtagonist commented 1 month ago

@iiYese we already have a warn_once family of macros for exactly this purpose.

That's once globally (per callsite), but we want to do it per system, and we want to have all warnings for the first run.

alice-i-cecile commented 1 month ago

Right, we'll need to add some variant of that that checks against a key (probably the system name in this case).

jnhyatt commented 1 month ago

Ah goodness duplicate comment, my browser had a stroke