alliedmodders / sourcemod

SourceMod - Source Engine Scripting and Administration
http://www.sourcemod.net/
973 stars 422 forks source link

SDKTools GameRules native crash #1902

Closed Kenzzer closed 3 months ago

Kenzzer commented 1 year ago

Help us help you

Environment

Description

Through unknown conditions, the native GameRules_SetProp and more specifically, the extension sdktools's GetGameRulesProxyEnt will cause a segmentation fault.

Problematic Code (or Steps to Reproduce)

Call GameRules_SetProp repeatedly during any map. This might or might not trigger the crash.

Logs

https://crash.limetech.org/d42cffz6sta4 But this seems much more widespread than the crash signature I've, if we filter stats https://crash.limetech.org/stats/sdktools.ext.2.tf2.so/GetGameRulesProxyEnt Those servers seem to crash in the same location as I, and they're most certainly not running the same plugin setup as I.

FortyTwoFortyTwo commented 1 year ago

I also have this common crash, along with this more rarer crash log related to __cxa_pure_virtual https://crash.limetech.org/q5e2z2aeydgz

I believe the crash is more specifically from a one entity in FindEntityByNetClass, but couldn't figure out the exact cause of it.

Kenzzer commented 1 year ago

After further investigation I've found the following :

FindEntityByNetClass is inlined in GetGameRulesProxyEnt. Confirming the suspicions of the above comment. I've also decided to reverse the assembly against sourcemod source code

.text:0003D9FA                 mov     eax, ds:gamehelpers
.text:0003D9FF                 mov     ecx, [eax]
.text:0003DA01                 mov     [esp+4], edi
.text:0003DA05                 mov     [esp], eax
.text:0003DA08                 call    dword ptr [ecx+2Ch]
.text:0003DA0B                 test    eax, eax
.text:0003DA0D                 jz      short loc_3D9F0
.text:0003DA0F                 mov     ebx, eax
.text:0003DA11                 test    byte ptr [eax], 2
.text:0003DA14                 jnz     short loc_3D9F0
.text:0003DA16                 mov     esi, [ebx+8]
.text:0003DA19                 test    esi, esi
.text:0003DA1B                 jz      short loc_3D9F0
.text:0003DA1D                 mov     eax, [esi]
.text:0003DA1F                 mov     [esp], esi
.text:0003DA22                 call    dword ptr [eax] <==== crash
.text:0003DA24                 test    eax, eax
.text:0003DA26                 jz      short loc_3D9F0
.text:0003DA28                 mov     eax, [esi]
.text:0003DA2A                 mov     [esp], esi
.text:0003DA2D                 call    dword ptr [eax+4]
.text:0003DA30                 mov     eax, [eax]
.text:0003DA32                 mov     ecx, [ebp+var_10]
.text:0003DA35                 mov     [esp+4], ecx
.text:0003DA39                 mov     [esp], eax
.text:0003DA3C                 call    strcmp

What I reversed, assuming I didn't make any mistakes :

edict_t* current = gamehelpers->EdictOfIndex(i); // 0x0003DA0FA -- 0x0003DA08
if (current == nullptr) // 0x0003DA0B
    // loop // 0x0003DA0D
if (current->IsFree()) // 0x0003DA11
    // loop // 0x0003DA14
IServerNetworkable* network = current->GetNetworkable(); // 0x0003DA16
if (network == nullptr) // 0x0003DA19
    // loop // 0x0003DA1B
// The crash (vtable offset is 0)
network->(pure virtual function); // 0x0003DA22 - Supposedly GetEntityHandle

Which seems to coroborate the alternative crash dump @FortyTwoFortyTwo linked.

Interestingly enough, sourcemod code base hardly makes any use of that vcall. Only two results : https://github.com/alliedmodders/sourcemod/search?q=GetEntityHandle

Furthermore FindEntityByNetClass is available elsewhere in the codebase, more specifically tf2 extension. And the Entity Handle check is missing there, home come ? https://github.com/alliedmodders/sourcemod/blob/1fbe5e1daaee9ba44164078fe7f59d862786e612/extensions/tf2/extension.cpp#L432-L462 I decided to git blame the addition of GetEntityHandle and PR #1089 seems to be blamed for this.

Analysing their bug report, I disagree with the fix that was provided. According to the author the function

ServerClass* CServerNetworkProperty::GetServerClass()
{
    if ( !m_pServerClass )
        m_pServerClass = m_pOuter->GetServerClass();
    return m_pServerClass;
}

is to be blamed because m_pOuter "can be null", but they didn't provide any crash dump to support their claim. Furthermore, m_pOuter can never be nullptr according to what I'll provide next.

CServerNetworkProperty::m_pOuter is assigned when CServerNetworkProperty::Init is called. https://github.com/ValveSoftware/source-sdk-2013/blob/0d8dceea4310fde5706b3ce1c70609d72a38efdf/sp/src/game/server/ServerNetworkProperty.cpp#L60

void CServerNetworkProperty::Init( CBaseEntity *pEntity )
{
    m_pPev = NULL;
    m_pOuter = pEntity;

And the Init function is called under CBaseEntity ctor https://github.com/ValveSoftware/source-sdk-2013/blob/0d8dceea4310fde5706b3ce1c70609d72a38efdf/sp/src/game/server/baseentity.cpp#L364

CBaseEntity::CBaseEntity( bool bServerOnly )
{
    [...]
    NetworkProp()->Init( this );

Finally, Sourcemod call to IServerNetworkable* network = current->GetNetworkable(); can only ever return a non nullptr, when the edict_t::m_pNetworkable has been assigned, an assignment which only happens MUCH AFTER CBaseEntity constructor call. Under CBaseEntity::PostConstructor https://github.com/ValveSoftware/source-sdk-2013/blob/0d8dceea4310fde5706b3ce1c70609d72a38efdf/sp/src/game/server/baseentity.cpp#L503

Therefore it is impossible, we ever reach the call IServerNetworkable::GetServerClass that has an invalid m_pOuter. And valve themselves don't null check property, because it is most likely intended to never be null. In fact, if we check their network code, they never ever consider the possibility of m_pOuter being null. I therefore suggest a reversal of PR #1089

Extra note

This doesn't address however the main issue we uncovered here. IServerNetworkable::GetEntityHandle vtable entry is not initialised. Which is probably what the author of the other PR encountered but instead with IServerNetworkable::GetServerClass, and most likely wrongly deduced that the src code of that function was to blamed for the nullptr error but the real issue was most likely that the vtable wasn't initialised.

This all points to me, this isn't an error on Sourcemod's part. And there's a wrong IServerNetworkable assignment to an edict by the game or a plugin. And it most likely is an entity that overrides edict index attribution, like a player or something alike, which in turns prevent CBaseEntity::PostConstructor from assigning its valid servernetworkable to the entity. Or maybe it's an entity in the process of being deleted, and its memory has already been freed ?

KyleSanderson commented 1 year ago

I've certainly crashed from a null m_pOuter. With that being said, the situation always seemed to be collision related.