bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0
35.14k stars 3.45k forks source link

`unwrap` considered harmful #12660

Open james7132 opened 5 months ago

james7132 commented 5 months ago

What problem does this solve or what need does it fill?

Unwrap should generally be avoided. We already encourage contributors to avoid the use of Option::unwrap, yet crashes from panics are still one of the most common categories of issues we're seeing now.

What solution would you like?

What alternative(s) have you considered?

Leaving it as is. Keep crashing from unwraps.

alice-i-cecile commented 5 months ago

See https://github.com/bevyengine/bevy/issues/10166 for a major culprit.

SludgePhD commented 5 months ago

Another API design issue related to this is Query::single / Query::get_single – here, the shorter method name that users are most likely to go for will .unwrap() implicitly. IMO Query::single is almost never the right choice and just making it have the semantics of get_single would be better.

alice-i-cecile commented 5 months ago

single/get_single used to work like that: users at the time found it tedious to constantly check if their player or camera had despawned!

It was also moved to be consistent with .resource / .get_resource, which while it still panics is much less prone to unexpected failure, as removing resources is much more rare than despawning entities.

That said, I am personally allergic to using .single in my own serious projects due to frustrations with panics in edge cases and during refactors, and return early on errors.

bushrat011899 commented 5 months ago

Perhaps instead of get_x and x, we could use x and x_unchecked instead? This makes the panicking API clearly distinguishable from the non-panicing, while also encouraging the idiomatic path.

Regardless, I agree with this change. unwrap is a code smell within a library.

james7132 commented 5 months ago

x_unchecked makes my mind jump immediately to unsafe, which we shouldn't be encouraging unless it's paramount that users squeeze every last ounce of performance out.

alice-i-cecile commented 5 months ago

I think that x_unchecked (by whatever name) isn't worth adding. It's more API to maintain, and not actually much nicer than just another unwrap call.

benfrankel commented 5 months ago

Another source of panics is when systems with Res or ResMut arguments are scheduled when the resource doesn't exist. In complex applications this might be triggered only in very specific scenarios, making it difficult to catch by playtesting.

The alternatives off the top of my head are a little awkward, though:

  1. If a system requires a resource that does not exist, skip that system and log a warning.
  2. Require systems to use Option<Res> instead of Res, or give Res the same behavior as Option<Res> somehow.
msvbg commented 5 months ago

I would be in favor of renaming single to something like get_single_unchecked, or some other similarly clunky name, to discourage its use. Or it could be removed entirely. I've seen a fair few crashes coming from single.

alice-i-cecile commented 5 months ago

1255 is a big source of dumb panics as I refactor as well.

james7132 commented 5 months ago

Removing the controversial label, as it seems like everyone's in agreement that this is something we should be doing.

yyogo commented 5 months ago

How would you like PRs submitted for this? I have a branch where I started removing unwrap()s in most of the smaller crates but it turned out to be quite huge. Maybe this issue can track which crates have been "cleaned"?

alice-i-cecile commented 5 months ago

@yyogo one crate at a time please: although for bevy_ecs it's likely best to do it one module at a time.

Brezak commented 5 months ago

If we're doing this one crate at a time a tracking issue to keep track which crates have been cleaned up would be useful.

vertesians commented 5 months ago

I would be in favor of renaming single to something like get_single_unchecked, or some other similarly clunky name, to discourage its use. Or it could be removed entirely. I've seen a fair few crashes coming from single.

In that situation, wouldn't it make more sense to deprecate single() entirely and tell people to use get_single().unwrap() instead?

musjj commented 5 months ago

Maybe relevant:

I also find myself reaching for .single() way too often, but only because the standard Rust error-handling patterns are not really usable with Bevy. I feel that if we can just do something like .single()?, a lot of these issues would just disappear.

mockersf commented 4 months ago

when replacing an unwrap with an expect, the message should:

if the message starts being too long, an error code can be used to link to a page with more details, see https://github.com/bevyengine/bevy/tree/main/errors

cart commented 4 months ago

I'm also on board for making Bevy less panicky. But I don't think we should seriously consider removing panicking APIs or renaming get_x to x until we have made error handling significantly nicer and more opinionated.

First, without Results + ? syntax, working with options/results is comparatively nasty ergonomically:

let Ok(window) = windows.get_single() else {
    return;
};
let Some(cursor) = window.cursor_position() else {
    return;
};
let Ok(board) = boards.get_single() else {
    return;
};
let Ok((camera, camera_transform)) = cameras.get_single() else {
    return;
};
let Some(ray) = camera.viewport_to_world_2d(camera_transform, cursor) else {
    return;
};
// Panicking variant
let window = windows.single();
let cursor = window.cursor_position().unwrap();
let board = boards.single();
let (camera, camera_transform)) = cameras.single();
let ray = camera.viewport_to_world_2d(camera_transform, cursor).unwrap();

Note that I've just copy/pasted some code from one of my games. Some of the unwraps don't make sense in this context.

Porting this to a Result system would fail, naively:

fn system(.....) -> Result<()> {
  let window = windows.get_single()?;
  // this returns Option and therefore fails
  let cursor = window.cursor_position()?;
  let board = boards.get_single()?;
  let (camera, camera_transform) = cameras.get_single()?;
  // this returns Option and therefore fails
  let ray = camera.viewport_to_world_2d(camera_transform, cursor)?;
  Ok(())
}

This is a Rust restriction (unfortunately). It is impossible to build a Result / error type that handles both Results and Options.

Bevy APIs (including core ECS apis) sometimes return Result and sometimes return Option (ex: Queries return Results, World apis and Assets return Options). In these cases you either need to ok_or (to convert all Options to Results) or result.ok() (to convert all Results to Options):

fn system(.....) -> Option<()> {
  let window = windows.get_single().ok()?;
  let cursor = window.cursor_position()?;
  let board = boards.get_single().ok()?;
  let (camera, camera_transform) = cameras.get_single().ok()?;
  let ray = camera.viewport_to_world_2d(camera_transform, cursor)?;
}
fn system(.....) -> Result<()> {
  let window = windows.get_single()?;
  let cursor = window.cursor_position().ok_or(MyErr)?;
  let board = boards.get_single()?;
  let (camera, camera_transform) = cameras.get_single().?;
  let ray = camera.viewport_to_world_2d(camera_transform, cursor).ok_or(MyErr)?;
  Ok(())
}

The system above actually isn't too bad. But imagine doing this when interleaving Queries / Assets / EntityMut, etc in the same system. If we're going to embrace error handling, I think we need to convert most internal APIs to use Results instead of Option to avoid the compatibility issues. Using them together in their current form is an exercise in frustration and ergonomically unsatisfying.

cart commented 4 months ago

Note that Result systems are already supported. However you need to manually convert them to "normal systems" using map:

app.add_systems(Update, a_system.map(error));

fn a_system(query: Query<&A>) -> Result<()> {
  let a = query.get_single()?;
  Ok(())
}

We'd want to make this automatic if we're going to encourage this pattern:

app.add_systems(Update, a_system);
cart commented 4 months ago

We'd also want our own Result error type similar to anyhow (in that it can support arbitrary error types via boxing).

cart commented 4 months ago

I'm also partial to this in my own code as I don't love typing/reading Ok(()) at the end of every system:

const OK: Result<()> = Ok(());

fn system() -> Result<()> {
  OK
}
bushrat011899 commented 4 months ago

I'm also partial to this in my own code as I don't love typing/reading Ok(()) at the end of every system:

const OK: Result<()> = Ok(());

fn system() -> Result<()> {
  OK
}

Perhaps a #[system] macro similar to Leptos' #[component] would be worth investing in here? We already have a gap in error messaging due to how all_tuples! styled trait implementations work, so there's already an incentive here.

bushrat011899 commented 4 months ago

Also worth considering is the same game might want differing behaviour for different Result systems. If the system that plays a sound effect fails, maybe I only want the error logged, if one of my networking systems fails, perhaps I want to take the user to the main menu, etc.

NiseVoid commented 4 months ago

But I don't think we should seriously consider removing panicking APIs or renaming get_x to x until we have made error handling significantly nicer and more opinionated.

I think removing them would definitely require easy "error handling" for systems, but I think renaming them could have a relatively minor impact as long as it's not get_x -> x and x -> some significantly longer name like x_unchecked that also suggests it's unsafe. The warnings you'd get on a migration like that would also suggest the panicking version was removed. I remember in Go many libraries had equivalent of a get_x and must_x, with the latter panicking if anything went wrong.

I think overall it's quite problematic to have panicking versions be the obvious function because they are at x, and nothing in their name reminds you that they will in fact panic. When I started using bevy the panicking versions seemed nice, but the longer I've worked on my game the more annoying these version are. I usually end up replacing all the panicking versions during a refactor of my game's states because the crash, recompile, repeat loop slows down development to a crawl. Making the new user experience simpler is great, but not if it's in ways that guarantee the user will need to do minor refactors all over their codebase later.

musjj commented 4 months ago

Note that Result systems are already supported. However you need to manually convert them to "normal systems" using map:

Yes, but this will completely obscure where the error occurred, making it borderline unusuable IMO. bevy_mod_sysfail is slightly better because it includes the name of the system in the log message. But all the backtrace (line number, etc.) is still gone, which is why I still prefer using .unwrap.

Apparently anyhow can capture backtraces, which seems suitable for user code. Maybe bevy can integrate with it?

tbillington commented 3 months ago

While it's more in "user code/systems", I wanted to contribute what I've been doing for ages now. Don't know how relevant to "engine" code it is, but since adopting this pattern I practically never reach for Result or Option much anymore. I have a util module with a selection of macros that will do the named operation or return if not possible.

The practical result of how these macros simplify/speed up writing gameplay logic is that I never use unwrap in my code.

The advantages are

Downsides

Example macros

macro_rules! get_mut {
    ($q:expr,$r:expr) => {
        match $q.get_mut($r) {
            Ok(m) => m,
            _ => return,
        }
    };
}

macro_rules! get_single {
    ($q:expr) => {
        match $q.get_single() {
            Ok(m) => m,
            _ => return,
        }
    };
}

// get!, get_single_mut! some!, ok! ...etc

For less critical/WIP code I'm usually fine with "nothing happening" if the expected values aren't available. I do not want to panic or unwrap for these, I would have just written a let else to return anyway.

When I am happy with the behaviour and want to commit to that code I replace the macros with a let else that error!s with debug information then returns if the value couldn't be gotten.

fn grid_debug_gizmo(
    mut gizmos: Gizmos,
    window: Query<&Window>,
    camera: Query<(&Camera, &GlobalTransform), With<crate::MainCamera>>,
) {
    let window = get_single!(window);
    let (camera, camera_t) = get_single!(camera);
    let cursor_ui_pos = some!(window.cursor_position());
    let cursor_world_pos = some!(camera.viewport_to_world_2d(camera_t, cursor_ui_pos));

    gizmos.rect_2d(
        cursor_world_pos.grid_aligned_centered(),
        0.0,
        Vec2::splat(GRID_SIZE + 4.0),
        Color::PINK,
    );
}

The nice feature of the macros returning is it plays well with iterating queries using for_each. If you were using for loops it would short circuit out of the entire system which is usually undesired.

fn handle_game_scene_switch(location: Query<&PawnLocation>, etc) {
    query.iter().for_each(|(e, p)| {
        let location = get!(location, p.get());
        setup_local_map_pawn_renderer(&mut cmd, e, &asset_server, location, etc);
    });
}
tychedelia commented 3 months ago

I find these despawn panics very annoying, but just as a counterpoint, they have helped me identify bad system ordering before, where I was violating my own invariants. Part of me feels like these should be warn rather than debug.

ramirezmike commented 2 months ago

I just ran into this which panics because the rotation is denormalized.

Transform::from_rotation(Quat::from_array([0.0, -0.0, 0.0, 0.0])).forward();

I'm still digging into how my transform's rotation ends up this way, but it was really surprising that Transform::forward can panic (because it calls local_z which can panic because of Quat multiplication #12981).

Should something as ubiquitous as Transform note in its docs that it can panic due to Quat's panics?

Also, separately, am I crazy? I'm running into Quat panics but I'm not explicitly enabling glam_assert.. it even happens in bevy playground.

https://learnbevy.com/playground?share=5763bdd5c6ccc345e4fa6d0e054e2cd484c211a7603c589fc01fb3a03910c7b5

alice-i-cecile commented 2 months ago

the thing I don't get is why to_scale_rotation_translation is spitting out a quat which is so denormalized

By @mweatherley on Discord. Strongly agree that we shouldn't be panicking there though, simply loudly erroring.

mweatherley commented 2 months ago

Yeah, I agree with Alice that these situations should loudly warn instead of panicking outright. In some of these cases (e.g. ramirezmike's example) it's important for the issue to be fixed upstream because the Transform is actually invalid (i.e. unrecoverable internal state) instead of just being somewhat denormalized.

cart commented 2 months ago

Just outlined what I think our plan should be here: https://github.com/bevyengine/bevy/issues/14275#issuecomment-2223708314