Issue with handling unload crashes.

Neopallium commented 5 years ago

Previous logic:

Plugin is at version 2.
New version 3 is detected.
Unload of version 2 crashes. (State is saved from version 2)
Version 3 is loaded with state from version 2.
Plugin is still in failed state (ctx.failure == CR_SEGFAULT)
Next update does a rollback (version 3 -> 2).
Plugin is back to version 2 (with bad unload code).

New logic:

Plugin is at version 2.
New version 3 is detected.
Unload of version 2 crashes. (State is NOT saved from version 2)
New version 3 is NOT loaded. Return quickly back to host.
Plugin version is still 2 in failed state. (ctx.failure == CR_SEGFAULT)
Next update does a rollback (version 2 -> 1).
Plugin is back to version 1. With state restored from backup.

It would be nicer to allow version 3 to load with state restored from backup, instead of rolling back to version 1. But I wouldn't to allow the host to be notified of the crash.

Should we save some flag to do a load of the new version instead of rolling back after a crash from unload?

Neopallium commented 5 years ago

Also included commits to keep the plugin version number and file number the same (easier to read the trace logs).

Neopallium commented 5 years ago

I also have the "don't re-use versions" changes ready. I didn't want to put too much in one PR. no_reuse_versions branch

fungos commented 5 years ago

This looks a lot how cr worked originally but improved. I had it changed the crash on unload because it seemed to be easier, but if is is corrupting the saved state then it is a real problem. Do this helps with the cr-rs issue?

Neopallium commented 5 years ago

No. I just noticed it while fixing the tests for the don't re-use versions change. I haven't seen it corrupt the saved state, I just thought it shouldn't save the state if it crashes, just like it doesn't save the state during a rollback. The new logic treats CR_UNLOAD crashes the same as CR_STEP crashes, does a rollback to the version before the crashing version without saving state. I would prefer to have the new version load from the backup state, but the host should have a chance to deal with the CR_UNLOAD crash.

I am thinking about adding some optional callback points that the host can intercept with something like #define CR_HOST_EVENT_CALLBACK(ctx, action). This would give the host more control over load/reload/rollback/unload/crash handling, while still keeping the current API for hosts that don't need it.

The main issue with cr-rs seems to be caused by TLS destructors partially keeping the old version mapped, then dlopen seems to partially load the new version (since it gets a new address for cr_main that isn't inside the old version's mapping space).

Right now the only option I see is to allow the old versions to stay mapped and not re-use version numbers.

P.S. I found out how to do the static bool CR_STATE bInitialized = false; trick in Rust:

#[link_section = ".state"]
static mut COUNTER: i32 = 1;

#[no_mangle]
pub fn cr_main(_ctx: *mut c_void, _cr_op: c_int) -> c_int {
  unsafe {
     COUNTER += 1; // update state counter.
     COUNTER // return value
  }
}

I should be able to create a macro to generate safe setter/getters for the state variable, so the unsafe block isn't needed to access the variable.

fungos commented 5 years ago

I think this change is good, I'll merge as I do not believe this will break compatibility as it was like this before my last big update. Thank you for contributing!

fungos / cr

Issue with handling unload crashes. #37