Zeenobit / moonshine_save

A save/load framework for Bevy game engine.
MIT License
81 stars, 9 forks

"Entity [...] does not exist" error on load #10

Closed ivakam closed 1 month ago

ivakam commented 10 months ago

We are using moonshine_save to save scenes from our level editor, but we keep running into an issue where a scene will sometimes suddenly become "corrupted" for no apparent reason, with moonshine panicking because an entity does not exist. We do have certain components that use MapEntities, but none of the mapped entity IDs we print match the non-existent one.

Furthermore, grepping through the entire ron file yields no matches for the "missing" entity. Something is causing moonshine to look for an entity whose ID we can find no trace of anywhere. Replicating the exact same scene from scratch will often resolve the issue, but this is obviously not something that's sustainable long-term. The issue has been replicated across multiple machines (all running various forms of Linux, however).

Zeenobit commented 10 months ago

When is this error popping up specifically? Is it happening inside moonshine_save::load::insert_into_loaded, or somewhere else?

Is your issue similar to this?

It'd also be helpful if you can isolate exactly which kind of entities are giving you this issue. Is it a different kind of entity every time, or is it a specific entity with a specific set of components that's causing this? You can check this by:

  1. Checking which entities at runtime are missing from the potentially corrupt save file immediately after save
  2. If all desired entities show up in the corrupt save file, then check which specific entity is failing to load and what components it has in the save file.

ivakam commented 10 months ago

Reading through that issue, it does indeed seem familiar. I can't say I've managed to pin it down to a single component or scene (it seems entirely random to me; the exact same scene sometimes gets corrupted if I just load and re-save it). There does, however, seem to be some correlation with more complex scenes as well as with editing existing scenes. I can't back this up with anything except "it feels like it" though.

I can, however, confirm that no entities seem to be missing from the file, and that the missing entity ID does not exist (and is not referenced) in any file that I can find.

The panic indeed occurs in insert_into_loaded (line 322 on main branch) when it attempts to get the entity from the World while iterating through the entity map (which seems concerning to me).

Zeenobit commented 10 months ago

> The panic indeed occurs in insert_into_loaded (line 322 on main branch) when it attempts to get the entity from the World while iterating through the entity map (which seems concerning to me).

Yup. This is what puzzled me about issue #6 as well. Since I switched over to using the EntityMapper instead of the old custom implementation, Bevy has full control of which entities get loaded:

pub fn load(
    In(result): In<Result<Saved, LoadError>>,
    world: &mut World,
) -> Result<Loaded, LoadError> {
    let Saved { scene } = result?;
    let mut entity_map = EntityMap::default();
    scene.write_to_world(world, &mut entity_map)?;
    Ok(Loaded { entity_map })
}

pub fn insert_into_loaded(
    bundle: impl Bundle + Clone,
) -> impl Fn(In<Result<Loaded, LoadError>>, &mut World) -> Result<Loaded, LoadError> {
    move |In(result), world| {
        if let Ok(loaded) = &result {
            for entity in loaded.entity_map.values() {
                // `World::entity_mut` panics if `entity` does not exist in the world.
                world.entity_mut(entity).insert(bundle.clone());
            }
        }
        result
    }
}

So as far as I can tell, this issue is only theoretically possible if either:

  a. Bevy doesn't spawn the entity and still populates the EntityMap, or
  b. Bevy is populating the EntityMap with an incorrect entity, or
  c. You're using a bad custom save pipeline that is potentially mutating the EntityMap between load and insert_into_loaded

Looking at Bevy code, A and B seem very unlikely. And I'm assuming you're not using a wonky save pipeline, which makes C false. So it's very strange.

If no entities are missing from the save file, then maybe try debugging this by dumping the contents of EntityMap into a log or something every time inside load.

Then, if the issue happens, compare the EntityMap dump with the corrupt save file. That might give us a better clue.
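
As a rough illustration of that dump-and-compare step, here is a minimal sketch that uses only the standard library so it runs without Bevy. Everything in it is an assumption made for illustration: plain u32 IDs stand in for Bevy's Entity, a BTreeMap stands in for EntityMap, a BTreeSet stands in for the entities currently alive in the World, and dangling_targets is a hypothetical helper that is not part of moonshine_save or Bevy.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for an entity ID (assumption: real code would use Bevy's `Entity`).
type EntityId = u32;

/// Hypothetical helper: returns every (saved, loaded) pair in the map whose
/// loaded entity is not among the entities alive in the world.
fn dangling_targets(
    entity_map: &BTreeMap<EntityId, EntityId>,
    alive: &BTreeSet<EntityId>,
) -> Vec<(EntityId, EntityId)> {
    entity_map
        .iter()
        .filter(|(_, loaded)| !alive.contains(*loaded))
        .map(|(saved, loaded)| (*saved, *loaded))
        .collect()
}

fn main() {
    // Made-up data: saved ID -> ID of the entity spawned on load.
    let entity_map = BTreeMap::from([(1, 10), (2, 11), (3, 12)]);
    // Made-up set of entities that actually exist after loading.
    let alive = BTreeSet::from([10, 11]);

    // Dump the whole map so it can be compared against the save file later.
    for (saved, loaded) in &entity_map {
        println!("saved {saved} -> loaded {loaded}");
    }
    // Flag the entries that would make an entity lookup fail.
    for (saved, loaded) in dangling_targets(&entity_map, &alive) {
        println!("dangling: saved {saved} -> loaded {loaded} is not alive");
    }
}
```

In a real project the same check would presumably run right after load, against the actual EntityMap and World, and log instead of print.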

ivakam commented 10 months ago

I wish I had a solid reproduction example to give you, but since I'm actively working on my project I usually don't keep one at hand. Even if I did, the scenes are on the order of 200k lines long, so just looking at them (as compared to some dumped output) would probably not be very helpful. :sweat:

And you are correct that we're not doing anything special with the saving pipeline. We're just using the provided default for saving-on-event.

Zeenobit commented 10 months ago

Without a repro case or any clue as to which entity/component, it's hard to troubleshoot this.

Some other ideas that come to my mind:

  1. Can you do a diff between a corrupt and a similar non-corrupt save file? Would anything stand out?
    • You mentioned earlier that re-saving the world solves the issue. You could do a before/after diff of the corrupt save file to see what "gets fixed".
  2. Are any systems potentially running parallel to your save pipeline that might be despawning the entity somehow?
    • load() is an exclusive system, so I don't see how that's possible. But I'd double check your systems and the save pipeline to be sure.
  3. Which schedule set are you registering your save pipeline in and how is it triggered?
    • Could it ever trigger more than once per update?
    • Does it overlap with any other systems?
  4. Are there any systems running in SaveSet::PostSave?

If the issue is happening as randomly as you describe it, it could be due to a race condition between two systems.

ivakam commented 10 months ago

If I encounter the issue again, I'll try to answer the questions you post here and report back!

Zeenobit commented 1 month ago

I finally managed to reproduce this issue locally.

The root cause of this panic is any referenced entity which isn't itself saved (i.e. isn't marked with Save).

See 88d56cb56de9a7e15ef7ac07be65b6f3c26e3291 for a fix + a test case which reproduces the issue.
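
To show the failure condition in isolation, here is a minimal standard-library sketch. It is not the fix from the commit above and makes no claim about how that fix works. The model is an assumption made for illustration: each saved entity is a u32 ID mapped to the IDs its components refer to, and unsaved_references is a hypothetical helper that lists references pointing at entities which were not saved.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for an entity ID (assumption: real code would use Bevy's `Entity`).
type EntityId = u32;

/// Hypothetical helper: given saved entities and the IDs each one refers to,
/// returns every (owner, target) pair where `target` is not itself saved.
fn unsaved_references(
    saved: &BTreeMap<EntityId, Vec<EntityId>>,
) -> BTreeSet<(EntityId, EntityId)> {
    let mut out = BTreeSet::new();
    for (owner, refs) in saved {
        for target in refs {
            // A reference is a problem only if its target has no entry of its own.
            if !saved.contains_key(target) {
                out.insert((*owner, *target));
            }
        }
    }
    out
}

fn main() {
    // Made-up data: entity 1 refers to 2 (saved); entity 2 refers to 7 (not saved).
    let saved = BTreeMap::from([(1, vec![2]), (2, vec![7])]);
    for (owner, target) in unsaved_references(&saved) {
        println!("entity {owner} refers to unsaved entity {target}");
    }
}
```

A check along these lines could run before saving to surface such references early, assuming the project can enumerate which entities each component refers to.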