CorsixTH / CorsixTH

Open source clone of Theme Hospital
Other
4.03k stars 366 forks source link

Race condition can rarely lead to invalid seek_toilets action #1561

Open mugmuggy opened 5 years ago

mugmuggy commented 5 years ago

Describe the issue

Through a thought process mainly (state was manpiuated to get me to the conclusion) I believe I found a rare but looks possible race condition related to this https://github.com/CorsixTH/CorsixTH/issues/555#issuecomment-527120811. I've hit the resulting error a few times across two different saves, but sometimes I get it, many times I don't,ie I found the cause of #555 whilst trying to identify this process.

The state has to be similar. Except this will be for normal rooms. Also I believe its more likely to occur when running at full speed, as tickDay, is called before the ticks. Ie if now + ticks = tomorrow, tickDay. Then the loop afterwards is for each tick that needs to execute.

A patient is queued, at least 3rd patient queued ie more likely to have actions: use_object (bench), queue, walk

Patient:tickDay executes, and the use toilet logic fires (setNextAction), action queue is use_object (being interrupted), queue, seek_toilets. Patient going_to_toilet now is yes., queue action is set todo_interrupt, use_object will undergo an interrupt, ie patient animation to stand starts.

Within the next one or two game ticks, the queue for the queue action, will have queue:pop() called, which will initiate the callbacks for queue:onChangeQueuePosition. A bench has now become free that is closer to the room, and onChangeQueuePosition finds the queue task, and then inserts walk and use_object (for the closer bench) actions.

Patients action queue becomes, walk, use_object, queue (todo_interrupt), seek_toilets,

Either during this walk, or whilst on the bench, the patient tickDay is called again and this time Patient:pee() executes, resetting the going_to_toilet flag to no.

As the patient is seated is still at least the 2nd position in the queue, they are eligible in Room:trytoFindNearbyPatients. As the going_to_toilet flag is now "no", tryMovePatient can succeed in routing the patient., This attempts to find the walk after the queue action (which is now no longer there) and replace it with the new enter action to the new room. And also in doing so, sets todo_interrupt on the seek_toilets action.

As no new action appears after seek_toilets, it results in an empty action queue dialog and as the patient is then given a SeekReceptionAction and can continue on, as they'll be redirected to the room (unless the room is the toilet where they will go into the loop of #555).

mugmuggy commented 5 years ago

Being mindful I'm also playing with around with a wip testing stuff - I believe an opportunity also exists for an assert in humandoid_actions\queue.lua partially as a result of the change in #1453.

This one appears to be when the patient is also in the queue, and in a single tickDay, gets a vomit() and pee() call, and also then gets a queue:onGetSoda. This will result in an action queue of: walk, pee, walk, vomit, walk, idle, queue, walk (to room). The change in #1453 will change the must_happen to the original must_happen which if the walk hadn't started yet would not be set, walk after the pee and vomit, which then calling onGetSoda will interrupt_head to assert(action.must_happen).

So far the only instance I've back tracked on what was occuring, was for a patient that would have had patient.going_to_toilet = "yes" as they were queued for the toilet, which would have vetoed the drinks machine possibility if the code after pee didn't execute. This still leaves other opportunities for interrupt_head to run and fail in that same scenario.