OldUnreal / UnrealTournamentPatches

Other
976 stars 29 forks source link

[469b] Freeze/crash with some of the KOTH maps from the ChaosUT Beta4mappack #422

Closed TankBeef closed 2 years ago

TankBeef commented 3 years ago

I am experiencing recurring freeze/crashes with at least two maps from the ChaosUT Beta4mappack: koth_barfly and koth_temple. There could be more from that pack. Reproduce with a match with bots, at least ten minutes, last freeze was after nine. Log attached. As always, Win10, 469b, OpenGL. Beta4mappack crash.log

SeriousBarbie commented 3 years ago

I noticed this in Beta4mappack crash.log: line 134: _Log: OpenGL Error: GL_INVALIDENUM (please report this bug) several occurrences: Warning: ALAudio: PlaySound alSourcef setting error: Invalid Value

SeriousBuggie commented 3 years ago

Warning: ALAudio: PlaySound alSourcef setting error: Invalid Value

I meet this line too often without any crash. For example: #282

Log file cropped, so can not be say what is reason is exactly. Possible need open log window and record video, for get last lines of log. If there be something at all.

TankBeef commented 3 years ago

I noticed this in Beta4mappack crash.log: line 134: _Log: OpenGL Error: GL_INVALIDENUM (please report this bug) several occurrences: Warning: ALAudio: PlaySound alSourcef setting error: Invalid Value

Yes, I get floods of both messages a lot, but no crash most of the time. In fact, I reported the GL_INVALID_ENUM before, here https://github.com/OldUnreal/UnrealTournamentPatches/issues/333 Also getting floods of this one too: OpenGL Error: GL_INVALID_OPERATION (please report this bug)

EDIT: Ok, the crash also occurs with D3D9. I managed to get this shot with the log window included. D3D Crash KOTH It mentions the Demonica bot, but I usually have no issues with it in other maps or game modes. Did not attach the log, it does not mention the koth_temple map anywhere. Also, the log window shows this flood of the GL_INVALID_OPERATION mentioned earlier with OpenGL. I don't get this in 436. Reproduce typing "showlog" in console. UT99 OpenGL Error

stijn-volckaert commented 3 years ago

Could you upload the maps and their dependencies or post a download link please?

TankBeef commented 3 years ago

Sure. This is the pack: https://unrealarchive.org/mappacks/unreal-tournament/chaosut/C/chaosut-beta4mappack_b61bf56c.html Please try koth_temple first. The crash is very common with this one.

stijn-volckaert commented 3 years ago

Confirmed. The bots get stuck in an infinite loop in state code. I'll let you know when the bug is fixed.

TankBeef commented 3 years ago

Great. Thanks a lot.

stijn-volckaert commented 3 years ago

We're still working on a fix, but FWIW, I can share some more details about the bug. When you execute GotoState() or GotoLabel() in UnrealScript, the VM will calculate the state and/or location you want to jump to and save that location in a native data structure called the "state frame". The VM delays the actual jump to the new state/label until it has finished executing the current (nested) expression.

Unfortunately, the expression can modify the calculated location before we jump to the new state. This is exactly what happened in this case. I occasionally saw bots go through the following sequence of state transitions (within one Level Tick):

Attacking => Fallback => TacticalMove => RangedAttack => Charging => FallingState

The last transition happens when we execute the following piece of code in Botpack.Bot.Charging:

5044 │         if ( !FindBestPathToward(Enemy, true) )
5045 │         {
5046 │             Sleep(0.0);
5047 │             GotoState('TacticalMove', 'NoCharge');
5048 │         }

Within FindBestPathToward, the engine invokes the SpecialHandling function for the target path node, which is often a Botpack.JumpSpot on KOTH-Temple. Botpack.JumpSpot.PendingJump.SpecialHandling calls BigJump() on the Bot. This causes a state transition to state FallingState.

Before we get to the new state, the VM finishes executing the expression at line 5044 in Botpack.Bot.Charging. If FindBestPathToward returns true, the expression causes a jump to the expression at line 5049. Otherwise, we jump to line 5406.

In both cases, we overwrite the target location we calculated for the new state. When we finally jump to the new state, we do not end up and its Begin label, but instead jump into the middle of an UnrealScript bytecode instruction, which ultimately causes the VM to execute invalid code. This can cause crashes or freezes.

As far as I can tell, this bug affects Unreal Engine 1 and 2 (and possibly even 3). We confirmed that the bug is present in Unreal 224-227j and Unreal Tournament 400-451b as well.

SeriousBuggie commented 3 years ago

So you mean sequence of code

GotoState('NewState', 'NewLabel');
GotoState('NewState2');

can produce to jump into NewState2 but with offset of code as for NewLabel of NewState?

stijn-volckaert commented 3 years ago

So you mean sequence of code

GotoState('NewState', 'NewLabel');
GotoState('NewState2');

can produce to jump into NewState2 but with offset of code as for NewLabel of NewState?

This particular sequence is fine because: 1) The two expressions are independent top-level expressions 2) The default value for the label argument in GotoState is 'Begin', so the second expression is equivalent to GotoState('NewState2', 'Begin')

Here's a test case Dots wrote last night. This one is completely broken:

auto state TestA 
{
    function bool DoChange()  
    {
        GoToState('TestB');
        return false;
    }
Begin:
    if( DoChange() )
    {
        Log("Hello");
    }
}

state TestB
{
Begin:
    Log("1");
}

The (partial) disassembly looks as follows:

auto state TestA
{
    function bool DoChange()
    {
                //
        // 0x0000: EX_Native (iNative=113,Function=Core.Object.GotoState,OperatorType=0)
        // 0x0001:   EX_NameConst (NameConst=TestB)
        // 0x0007: EX_Return
        // 0x0008:   EX_False
        // 0x0009: EX_Return
        // 0x000A:   EX_Nothing
        //
    }

    //
    // 0x0000: Begin
    //
    // 0x0000: EX_JumpIfNot (JumpOffset=0x0012)
    // 0x0003:   EX_VirtualFunction (FunctionName=DoChange,Candidate=brokenstate.BSActor.TestA.DoChange)
    // 0x0009: EX_Native (iNative=231,Function=Core.Object.Log,OperatorType=0)
    // 0x000A:   EX_StringConst (StringConst="Hello")
    // 0x0012: EX_Stop
    // 0x0013: EX_LabelTable
    //
}

state TestB
{
    //
    // 0x0000: Begin
    //
    // 0x0000: EX_Native (iNative=231,Function=Core.Object.Log,OperatorType=0)
    // 0x0001:   EX_StringConst (StringConst="1")
    // 0x0005: EX_Stop
    // 0x0006: EX_Nothing
    // 0x0007: EX_LabelTable
    //
}

What happens here is: 1) We execute GotoState('TestB'). The engine calculates the offset of label Begin within state TestB and stores that offset as the pending jump target. In this case, the calculated offset is 0x00. The engine also remembers that our pending target state is TestB. 2) We finish executing the nested expression that caused the state change. In this case, the outer bytecode of our expression is the EX_JumpIfNot at offset 0x00 within TestA.Begin. To finish the execution, we need to finish the EX_VirtualFunction at offset 0x03 (because this is the inner expression). That means we also need to finish executing the DoChange function (because the function needs to return before we can evaluate the result of EX_VirtualFunction). In other words, we now execute DoChange's EX_Return. This makes EX_VirtualFunction produce a value of false. EX_JumpIfNot now sees the false value and overwrites our previously calculated jump offset of 0x00 with an invalid jump offset of 0x12. 3) Now that we've finished executing the current nested expression, AActor::ProcessState transfers control to the new state (which is still TestB ) and jumps to the calculated jump offset within that state. In this example, jump offset 0x12 is beyond the end of the state code, so the game will most likely crash. In the Botpack.Bot.Charging example, the jump offset is still within the state code for Botpack.Bot.FallingState, but it points to the middle of a bytecode instruction, so we begin executing invalid instructions until the game crashes or freezes.

SeriousBuggie commented 3 years ago

Oh. Now I see. Thanks for explanation. It is bad design. Jump for pending state must not use share storage for label offset with something else.

What about use separate variable for jump in state codes? Independent from other changes.

In this case we set it and use when control transfer back to engine (by state code, or by end execution of non-state code). Any if can not break it.

In general, problems come because if not really if here. it is more like as this weird hack:

if( !DoChange() )
    GotoState(PendingState, 'skip');
Log("Hello");
skip:
stijn-volckaert commented 3 years ago

I've implemented a fix that seems to work well. I separated the pending jump target from the program counter so the jump target cannot get overwritten by expressions that need to be evaluated before we jump into the new state. Unreal 227 will also get a fix.

an-eternity commented 3 years ago

It seems, there is something else... Strange crash by some undefined reason still happens (even with v469c from July 20) on some maps (not related to Bots though). For example, MH-MHF-ADV_Assault[RMD].unr, maybe it is maps' problems... Logs do not contain any useful information except this:

Exit: Executing UObject::StaticShutdownAfterError
Exit: Exiting.
Log: Log file closed, Sat Aug 14 23:53:09 2021

Exit: Executing UObject::StaticShutdownAfterError
Exit: Exiting.
Log: Log file closed, Sun Aug 15 00:23:32 2021

Logs usually have some ScriptWarnings a few minutes prior to crash, it could be somehow related:

ScriptWarning: Grenade MH-MHF-ADV_Assault[RMD].Grenade17 (Function UnrealShare.Grenade.PostBeginPlay:017A) Accessed None 'Instigator'
ScriptWarning: Grenade MH-MHF-ADV_Assault[RMD].Grenade17 (Function UnrealShare.Grenade.PostBeginPlay:0182) Accessed None 'Zone'
ScriptWarning: Razor2 MH-MHF-ADV_Assault[RMD].Razor20 (Function Botpack.Razor2.Flying.ProcessTouch:0081) Accessed None 'Instigator'
TankBeef commented 3 years ago

In that case, i suggest starting a new issue. The two crashes I reported so far only occur when playing with bots. This seems to be another type of crash, I think.

SeriousBuggie commented 3 years ago

For example, MH-MHF-ADV_Assault[RMD].unr, maybe it is maps' problems...

This author usually make fast-built dirty maps with a big count of errors. I do not think this is related to current issue.

an-eternity commented 3 years ago

Logs from the build for Windows are more informative, point out to some problem with Paths:

(With Bots)

Critical: ULevel::Trace
Critical: FSortedPathList::expandAnchor
Critical: APawn::findPathToward
Critical: (MH-MHF-ADV_Assault[RMD].tnalibot0 Botpack.Bot.FindBestPathToward)
Critical: AActor::ProcessState
Critical: Object tnalibot MH-MHF-ADV_Assault[RMD].tnalibot0, Old State State Botpack.Bot.Charging, New State State Botpack.Bot.Charging
Critical: AActor::Tick
Critical: TickAllActors
Critical: ULevel::Tick
Critical: (NetMode=1)
Critical: TickLevel
Critical: UGameEngine::Tick
Critical: UpdateWorld
Critical: UServerCommandlet::Main
Exit: Executing UObject::StaticShutdownAfterError
Exit: Exiting.
Log: Log file closed, 08/15/21 01:00:36

(Without Bots)

Critical: APawn::TraverseFrom
Critical: APawn::findRandomDest
Critical: (MH-MHF-ADV_Assault[RMD].KrallElite0 UnrealShare.ScriptedPawn.Roaming.PickDestination)
Critical: AActor::ProcessState
Critical: Object KrallElite MH-MHF-ADV_Assault[RMD].KrallElite0, Old State State UnrealShare.ScriptedPawn.Roaming, New State State UnrealShare.ScriptedPawn.Roaming
Critical: AActor::Tick
Critical: TickAllActors
Critical: ULevel::Tick
Critical: (NetMode=1)
Critical: TickLevel
Critical: UGameEngine::Tick
Critical: UpdateWorld
Critical: UServerCommandlet::Main
Exit: Executing UObject::StaticShutdownAfterError
Exit: Exiting.
Log: Log file closed, 08/15/21 16:07:40

There is fixed version of this map that works fine, without this crash: MH-ADV_Assault[Remidas]+SBFix. Anyway, it is possible some issue still remains in the native part that gets triggered by the bugs in that map, until proven otherwise.

SeriousBarbie commented 3 years ago

some problem with Paths

In that fix (MH-ADV_Assault[Remidas]+SBFix) I removed all PathNodes and created a new rudimentary pathing (see docs). If you encounter any bugs, please let me know.

stijn-volckaert commented 2 years ago

Fixed in 469c-RC2