Create Built In Module to address garage data corruption during saving

Brett208 commented 4 years ago

Write a built in module to patch Outpost 2 and prevent a union field from being set to null intermittently during save process.

See forum post for details.

DanRStevens commented 4 years ago

Link to forum thread containing the discussion: Evacuation Under Fire - Ply Campaign Scenario

DanRStevens commented 4 years ago

I believe I have the conditions that lead to the double save corruption problem. It happens when the unit array base address is a multiple of 120. (The size of a unit record is 120 bytes).

unitArrayBaseAddress % 120 == 0

Digging into the memory allocation routines a bit, I found internal calls to HeapAlloc. There are no memory alignment guarantees given in the documentation, however, some unofficial online sources suggest the returned memory block will be 8 bytes aligned. Given that 120 is a multiple of 8, the frequency with which unitArrayBaseAddress % 120 == 0 is true should be 120 / 8 == 1 / 15. This matches with experimental results.

For a bug reproducer level, we can offset the unitArrayBaseAddress field so it becomes a multiple of 120. As a cheap hack, we can do this by adding reinterpret_cast<std::uintptr_t>(-unitArrayBaseAddress) % 120:

// Untested
*reinterpret_cast<std::uintptr_t*>(unitArrayPointerAddress) += reinterpret_cast<std::uintptr_t>(-unitArrayBaseAddress) % 120;

More properly, we should reallocate the array to include space for an extra unit record, just to make sure the full array fits in the shifted position. Realistically, we can probably get away with not doing that since we won't ever be using the full unit array in a bug reproducer, and should never access past the first 8 bytes of the last record (which due to alignment and offsetting properties, should still be within the original array).

Adding the above to DllMain, and creating a garage with vehicle cargo in InitProc should allow us to reliably reproduce the bug by saving twice and then viewing the garage contents.

DanRStevens commented 4 years ago

To add details to how the above was determined, here's a bit about the transformations that are done.

To index into the unit array units[unitIndex], the formula is:

&unit = &units[] + unitIndex * sizeof(Unit);

To go the reverse direction, the formula is:

unitIndex = (&unit - &units[]) / sizeof(Unit);

Marking up the above formulas with type information, they become:

(uintptr_t)&unit = (uintptr_t)&units[] + (size_t)unitIndex * (size_t)sizeof(Unit);

(size_t)unitIndex = ((uintptr_t)&unit - (uintptr_t)&units[]) / (size_t)sizeof(Unit);

In practice, things get a little more interesting with the types.

In the first direction, converting a unitIndex to a &unit, the game breaks down the multiplication by the Unit size into several factors: 120 = 8 * 3 * 5 It then performs the multiplication in 3 steps, using a shift by 3 bits (multiply by 8), followed by two LEA (Load Effective Address) instructions to multiply by 3 and by 5. The LEA is a combined shift + add instruction, that is capable of shifting by 1, 2, or 3 bits, hence can multiply by 2, 4, or 8, followed by an add. It's optimized for indexing into arrays of primitive types. With a bit of creative use the LEA instruction can multiply by 3 = 2 + 1 (LEA EDX, [EAX * 2 + EAX]), and by 5 = 4 + 1 (LEA EAX, [EDX * 4 + EDX]). The reason for the breakdown is that the MUL instruction tended to be a bit slow on older architectures, or at least had a high latency. The LEA instruction was much more limited and simpler, and so would run faster.

Now, the shift SHL, and the load effective address LEA are both unsigned operations. That matches with the formulas above quite nicely.

In the reverse direction, the division was done with the IDIV instruction (signed divide, or integer divide). This is in contrast to the DIV instruction (unsigned divide). For non-integral results, both instructions round towards zero. The difference is in the interpretation of the input bits, and hence which direction 0 will be. For the DIV instruction, an always positive result is always rounded down (by truncating). For the IDIV instruction, a negative value must be rounded up (towards 0, also by truncating).

Adjusting for signedness, the address to index formula was actually implemented as:

(size_t)unitIndex = ((intptr_t)&unit - (intptr_t)&units[]) / (size_t)sizeof(Unit);

Now, the garage code used a sentinel value of -2, which was not accounted for in that formula, and so was used directly as if it was a &unit value. Further, due to Windows platform constraints, the &units[] will be in the lower 2GB of the address space, which guarantees the &unit - &units[] value will be negative. This means the IDIV instruction will round upwards when rounding towards 0 by truncation.

ceiling(((intptr_t)&unit - (intptr_t)&units[]) / (size_t)sizeof(Unit))
= ceiling((-2 - &units[]) / 120)
= ((-2 - &units[]) + (-(-2 - &units[]) % 120)) / 120  // Add remainder to obtain multiple of 120 (round up)
= ((-2 - &units[]) + ((2 + &units[]) % 120)) / 120

Using the above, we can calculate the full round trip, from pointer to index, and then back to pointer, the formula is:

ceiling(((intptr_t)&unit - (intptr_t)&units[]) / (size_t)sizeof(Unit)) * 120 + &units[]
= (((-2 - &units[]) + ((2 + &units[]) % 120)) / 120) * 120 + &units[]
= ((-2 - &units[]) + ((2 + &units[]) % 120)) + &units[]
= -2 + ((2 + &units[]) % 120)

Now, the bug occurred when the round trip conversion stored the value 0. Solving the above for 0, we have:

-2 + ((2 + &units[]) % 120) == 0
((2 + &units[]) % 120) == 2
&units[] % 120 == 0

The bug then works as follows: During the first Save/Load conversion cycle, the Unit.next pointer value is converted from (uintptr_t)-2 to corrupted (size_t)index and then back to (uintptr_t)0. (Corruption of the unchecked sentinel pointer value). During the second Save/Load cycle, the unit is initially still considered alive, since the next value does not equal the sentinel value -1. It proceeds with saving the unit. During the conversion, the pointer value (uintptr_t)0 is special cased and converted to the (size_t)-1 index value. Then during the loading phase of the cycle, the (size_t)-1 index value is seen and assumes the unit is now dead, and so skips over conversion and restore of the unit, leaving it in a state that indicates it is now a dead unit.

Brett208 commented 4 years ago

Sorry to be so dense, but where does your suggestion fit into our code:

// Untested
*reinterpret_cast<std::uintptr_t*>(unitArrayPointerAddress) += reinterpret_cast<std::uintptr_t>(-unitArrayBaseAddress) % 120;

void PatchUnitArrayAddress()
{
    std::uintptr_t unitArray = 0x04170030;
    constexpr std::size_t unitArraySize = 0x3c078;

    LPVOID result = VirtualAlloc(reinterpret_cast<LPVOID>(unitArray), unitArraySize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    constexpr std::uintptr_t unitArrayPointer = 0x54F848;
    *reinterpret_cast<std::uintptr_t*>(unitArrayPointer) = unitArray;
}

DanRStevens commented 4 years ago

It should replace the last line.

We won't need the VirtualAlloc call. At least not at this time.

We should probably get the bug reproducer uploaded to GitHub. I'm unable to build missions from Linux, so it would be nice to get the project building on a CI server. Might even be a good test project to experiment with CI artifacts and GitHub releases.

Minor point, but for the bug reproducer project, it may make sense to create two separate commits. The first commit could add the InitProc code to create the garage and stored unit. The second commit could update DllMain with the memory hack. I could see it being useful to jump between the two versions easily.

Brett208 commented 4 years ago

So, how do we know what the unitArrayBaseAddress is then?

void PatchUnitArrayAddress()
{
    constexpr std::uintptr_t unitArrayPointerAddress = 0x54F848;
        *reinterpret_cast<std::uintptr_t*>(unitArrayPointerAddress) += reinterpret_cast<std::uintptr_t>(-unitArrayBaseAddress) % 120;
}

DanRStevens commented 4 years ago

Oh, sorry, I was being sloppy. The names are a bit hard to get right. The unitArrayBaseAddress value is the value stored at *unitArrayPointerAddress (with pointer casting). I guess the code should look more like:

void PatchUnitArrayAddress()
{
    constexpr std::uintptr_t unitArrayPointerAddress = 0x54F848;
    auto unitArrayPointer = reinterpret_cast<std::uintptr_t*>(unitArrayPointerAddress);
        *unitArrayPointer += -(*unitArrayPointer) % 120;
}

Don't worry if the details aren't quite right yet. We can push that part off into a branch if needed. Having the core bug reproducer with just the garage and loaded unit would be helpful for now.

OutpostUniverse / op2ext

Create Built In Module to address garage data corruption during saving #204