dhultgren / rimworld-keep-bed-ownership

MIT License
3 stars 1 forks source link

Job failure loop & suggested fix #3

Open CodeOptimist opened 1 year ago

CodeOptimist commented 1 year ago

Hi friend! The following save file, after a second or three of unpausing, is in a job failure loop ("10 jobs in one tick") of:

 • RimWorld.Pawn_Ownership -> Boolean ClaimBedIfNonMedical(RimWorld.Building_Bed)
 • RimWorld.RestUtility -> RimWorld.Building_Bed FindBedFor(Verse.Pawn, Verse.Pawn, Boolean, Boolean, System.Nullable`1[RimWorld.GuestStatus])

With various unpatched vanilla methods occurring before/after/between these.

MilkyToastees.zip

I don't know why it's looping. These are the only patched methods from (any) mods involved in the loop. (I logged every mod involved in the loop with my diagnostic tools and removed them one-by-one while it reproduced.)

If it's helpful, the other mods that had patched methods execute during the loop are the following: Common Sense, Doors Expanded, Dubs Bad Hygiene, Hospitality, JecsTools, Pick Up And Haul, Prisoners Don't Have Keys, Replace Stuff, Royalty Tweaks, RunAndGun, Share The Load, Simple sidearms, Smarter Construction, What the hack?!, Work Tab.

These are not related though because the issue occurs without them. But it might be helpful to leave them out. The console will be spammy, the weather will be corrupted, but the job loop error should still reproduce. Sorry I can't be more helpful.

CodeOptimist commented 1 year ago

Hi friend. Had another user with the "10 jobs in one tick" error from this. I actually have answers now though.

https://github.com/dhultgren/rimworld-keep-bed-ownership/blob/748e653a8e63da0d8090f590ccc449421a15fd4c/Source/1.4/Patch/PatchKeepBedOwnership.cs#L161C23-L161C23

The reality is, as innocuous as it appears, we can't get away with tweaking things during TryGiveJob without extra checks. 😞 I had to fix the same bug myself in While You're Up, and it had been there a long time: https://github.com/CodeOptimist/rimworld-while-youre-up/commit/8a589430673eb66fec2ab42563a316b814314b03

I was shocked to discover that even though plenty of vanilla code just returns null to cancel a job, when I was doing it, it would cause loops. I wrote about it here: https://discord.com/channels/241677926855081984/708435078614614116/1116185920337612871

Basically any time we mess with something in TryGiveJob we might cause a failure, and that can repeat infinitely, even just from vanilla code (as in this case and mine). Honestly seems like the best prevention is to check if it's the same pawn on the same tick, see my commit diff above ^, it's a pretty easy fix. (Just save the pawn & tick count and check if they're unchanged.)

Hope this helps. I was going to submit a PR but I'm afraid I'm going to forget if I don't write about it right now.