diasurgical / devilution

Diablo devolved - magic behind the 1996 computer game
Other
8.79k stars 920 forks source link

Figure out the exact set of compiler flags used to produce Diablo.exe (1.09b) #111

Open mewmew opened 6 years ago

mewmew commented 6 years ago

This issue tracks the effort to figure out the complier flags used to produce Diablo.exe (version 1.09b). It may be considered a subtask of #11, as given information about compiler flags, we can ensure that the same input source code produce the same output object code.

From what I can tell, it seems like /O1 is used rather than /O2. This is based on the padding between functions.

/O2 produces (given that https://github.com/diasurgical/devilution/pull/110 has been merged):

  401000:   e9 0b 00 00 00          jmp    0x401010
  401005:   90                      nop
  401006:   90                      nop
  401007:   90                      nop
  401008:   90                      nop
  401009:   90                      nop
  40100a:   90                      nop
  40100b:   90                      nop
  40100c:   90                      nop
  40100d:   90                      nop
  40100e:   90                      nop
  40100f:   90                      nop
  401010:   a1 28 b1 40 00          mov    eax,ds:0x40b128
  401015:   a3 84 1a 41 00          mov    ds:0x411a84,eax
  40101a:   c3                      ret    
  40101b:   90                      nop
  40101c:   90                      nop
  40101d:   90                      nop
  40101e:   90                      nop
  40101f:   90                      nop
  401020:   8b 44 24 04             mov    eax,DWORD PTR [esp+0x4]
  401024:   85 c0                   test   eax,eax
  401026:   74 0f                   je     0x401037
  401028:   6a 00                   push   0x0
  40102a:   6a ff                   push   0xffffffff
  40102c:   68 40 d0 40 00          push   0x40d040
  401031:   50                      push   eax
  401032:   e8 69 0c 00 00          call   0x401ca0
  401037:   c3                      ret    

While /O1 produces:

  401000:   e9 00 00 00 00          jmp    0x401005
  401005:   a1 28 c1 40 00          mov    eax,ds:0x40c128
  40100a:   a3 84 2a 41 00          mov    ds:0x412a84,eax
  40100f:   c3                      ret    
  401010:   83 7c 24 04 00          cmp    DWORD PTR [esp+0x4],0x0
  401015:   74 12                   je     0x401029
  401017:   6a 00                   push   0x0
  401019:   6a ff                   push   0xffffffff
  40101b:   68 40 e0 40 00          push   0x40e040
  401020:   ff 74 24 10             push   DWORD PTR [esp+0x10]
  401024:   e8 8f 09 00 00          call   0x4019b8
  401029:   c3                      ret    

This is to be compared to the original version of Diablo.exe (1.09b):

  401000:   e9 00 00 00 00          jmp    0x401005
  401005:   a1 00 94 47 00          mov    eax,ds:0x479400
  40100a:   a3 30 79 4b 00          mov    ds:0x4b7930,eax
  40100f:   c3                      ret    
  401010:   83 7c 24 04 00          cmp    DWORD PTR [esp+0x4],0x0
  401015:   74 12                   je     0x401029
  401017:   6a 00                   push   0x0
  401019:   6a ff                   push   0xffffffff
  40101b:   68 b0 30 48 00          push   0x4830b0
  401020:   ff 74 24 10             push   DWORD PTR [esp+0x10]
  401024:   e8 8d 87 06 00          call   0x4697b6
  401029:   c3                      ret    
heinermann commented 6 years ago

It's worth noting that individual compilation units can have different optimization settings (just to keep in mind).

Another comparison with Diablo.exe 1.09b:

I actually want to highlight in particular the line and byte ptr [ecx], 0. This is actually an assignment, but it was optimized into an and, which is only produced under O1.

Under O1:

Under O2:

ghost commented 6 years ago

The project now uses /O1 for both debug and retail builds. I figured that's what was used, since the output size is much closer (760 KB). The main reason I never changed it was in the early days optimizations broke the code entirely. It took a few months of fixing up before it was stable enough to change it.

heinermann commented 6 years ago

You may also be able to find out other compiler flags by looking at the binary header (MZ/PE) and seeing what differs. Imports too probably.

ghost commented 6 years ago

Good call, I noticed the original binary has about 2x as many import descriptors. Which is weird, since it there's only a few differences in the KERNEL32 imports. We'll have to look at that sometime. Also, the release build seems to scramble the code a little more, where as the debug build is closer to the original. Setting aside debug sections of course.

It could also be a good idea to "trace" the developers steps. We know the game was originally written in VC 4.20, but then upgraded to 5.10. So perhaps generating the project using 4.20's default settings could get us closer.

mewmew commented 6 years ago

So perhaps generating the project using 4.20's default settings could get us closer

Sounds reasonable. Report back the compiler flags used :) I still don't have a Windows box to play with.

seritools commented 6 years ago

Warning: Big writeup ahead! :)

As @mewmew wrote up over here, more often than not, the compiler we currently use (5.0 SP3) generated slightly different instructions than the original 1.09b binary.

The first thing I checked is the linker versions. For that I grabbed every version ever, basically, and confirmed that the linker used must be from 5.0 SP3:

SP0: 5.00.7022
SP1: 5.02.7132
SP2: 5.02.7132
SP3: 5.10.7303

Diablo 1.09b PE Header: 5.10

While browsing for compiler option info, I stumbled over the #pragma optimize directive, allowing to toggle optimization settings on a per-function basis. Playing around with disabling size-based code and other optimizations, the generated instructions were even further away from the original, so I knew that was probably not what was happening.

For some reason, the compiler used by Blizzard was just more "clever" than the one we use...

After going off-track by checking the #pragma optimize directive, I went back to linker stuff. Just to make sure, I checked VS6's link.exe (starting from SP6, for some reason).

The output was:

C:\Program Files\Microsoft Visual Studio\VC98\Bin>link /?
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

usage: LINK [options] [files] [@commandfile]

   options:

      /ALIGN:#
      /BASE:{address|@filename,key}
[...]
      /LINK50COMPAT

Looking at that options list it instantly clicked. See that /LINK50COMPAT? Googling it revealed that link.exe as well as lib.exe both have that flag in version 6. It seems that with VC++ 6, they changed the structure of lib files, so the lib files generated by lib.exe 5.x and lower are not compatible with link.exe 6.x and vice versa. Microsoft implemented that switch so that older library files can be used with VC++ 6, and code compiled by VC++ 6 can generate lib files compatible with the linker of VC++ 5.

So I just fired up VS6SP6, upgraded the project file, compiled, checked in IDA, and voilà: image (left is Diablo.exe 1.09b, right is devilution compiled/linked with VC++ 6 SP6)

The instructions are identical (at least for the function from PR #135)!

So, what I guess happened:

The 1.09b .exe has a creation date of 2001-05-12. This means that the newest VC++6 version they could have used it Service Pack 5 (released April 2001). Service Pack 4 (released January 2001) would be a good contender as well.

Anyways, that was my journey today. :)

What's left to do:

In retrospective, all of this makes sense seeing that Storm.dll and the .SNP files are even linked with v6 of the linker! 🙂

StephenCWills commented 6 years ago

That's way cool! Excellent work.

fearedbliss commented 6 years ago

@seritools Wow that's a great read and find, the asm is exactly the same (excluding different memory locations)!

ghost commented 6 years ago

Beautiful, now we just have to figure out how they automated everything :wink: It seems tedious to have used two different toolchains, so maybe there's another option we're missing or they modded their VC6 installation.

Awhile back I documented the versions they used for D2:

Version          Released   Patch
MSVC++ 6.0 SP 3  5/21/1999  1.00-1.03
MSVC++ 6.0 SP 4  6/15/2000  1.04-1.05
MSVC++ 6.0 SP 5  2/26/2001  1.06-1.10
MSVC++ 6.0 SP 6  3/29/2004  N/A
VS .NET 2003     ?/??/2003  1.11+

The linker for both SP 4 and 5 generates code exactly the same, I didn't test SP 6 since it was made long after those patches. We can assume that the D2 devs used their current setup to compile D1 at the time, so 1.09b would've been compiled with SP 5 and 1.08 SP 3.

mewmew commented 6 years ago

@seritools Wow Dennis, that's great! Hats off. Very happy that you managed to pin this down.

ghost commented 6 years ago

Do we get similar sizes with the optimization? I am curious what a bindiff would result in. It might be useless to know, but I think that it would tell us how close we are.

seritools commented 6 years ago

https://github.com/seritools/devilution/commit/6f5f573bbd520db8dcda36db8bb52525b9bcd7f7 (please excuse my crappy makefile skills)

Managed to link the exe with 5.10 while compiling with 6 :) However in this constellation, PDBs cannot be generated, since the linker exits mentioning that the pdb format is not supported. A release build is possible though, and reports the correct linker version.

For working at getting the code accurate I'd recommend just linking with v6 for now (as long as we don't see differences too big to tolerate). Generating release builds with full PDB info makes it a lot easier imho.

What do you think?

seritools commented 6 years ago

A few more observations:

@galaxyhaxz has noted that the VC6 linker starts the program code at 0x1000, while the VC5 linker starts at 0x400, which is a difference of 3072 bytes, so the resulting binary could be down to 741KB already!

I'm going to check SP2, SP1, SP0 next and try linking with VC5 instead to see if I can get that simple function to generate correctly and maybe generate a perfectly-sized binary :O

EDIT:

Checked the exact byte counts:

1.09: 757760 bytes

SP0: 761856 bytes (diff. 4096 bytes, at 3072 of which are because I currently link with VC6) SP1: 761856 bytes SP3: 761856 bytes

seritools commented 6 years ago

Alright, SP2, SP4, SP5 are excluded now, since all three of those generate binaries that are too big.

For some reason, with SP3, the filesize went down again. I reinstalled SP3 just to make sure, but it stayed the same.

So for now, VC6 SP0, SP1 or SP3 in conjunction with the linker from VC5 SP3 is the closest we can get.

Copied over the SP3 installation to my buildtools folder, then built it while linking with VC5SP3... well, I accidentally beat Blizzard 😆 image

So, what I gathered:

ghost commented 6 years ago

Awesome work!

I think I might try this, because I am not thinking the pdb info to be too useful at this point considering we literally can cross reference the binary. 1024 bytes are still a lot though, but it could be worse.

I am curious if we can get closer with pdb info .

Also @seritools for that binary that is really close, can you upload it ? I would like to do a bindiff on it.

seritools commented 6 years ago

PDB info is nice while comparing functions, since you can just jump to the function in question in the new binary, without remembering adresses, and getting all the type info that we already have for the original binary as well. (Generating with pdb info results in a much bigger binary btw)

Since the binaries are rounded up to the next 1024bytes, maybe the original binary was just 739k + a few bytes? Every instruction we missed could be the one dropping us down to 439KB. :)

Like there was/is (already reported to @galaxyhaxz) a missing function NewCursor in devilution, which did nothing but call to SetCursor in the original binary. A couple of bytes "saved". Also every differing optimization contributes to that.

I'm gonna be at my pc tomorrow and will upload the binary then.

seritools commented 6 years ago

And here is the exe :) devilution.zip

ghost commented 6 years ago

Comparing the size with @seritools binary: size

                            .text    .rdata    .data    .rsrc        Total
Size change in Devilution:  +0x2400  -0x7600   +0x4E00  Identical    -0x400

So the Devilution binary actually has 0x2400 (9kb) more in the code section than the vanilla. This is surely due to many decompiled functions having excessive variables. Interesting though is that most data is stored in .data instead of .rdata. This is likely because many of these variables were declared as static. Devilution has a total of 0x2800 bytes less in the data section. I noticed the compiled binary has a much smaller CRT section than compiled plainly with VC5, which could attribute to that. It could also be that the compiler is discarding certain global variables or data that isn't used.

seritools commented 6 years ago

Yeah, I only got the size way down when using the VC6 libs, the file was quite a bit bigger with the VC5 ones. (Tbh it was surpriting to me that the VC6 libs were working at all, since it supposedly has a different lib format, but maybe that's just not the case for the standard libs)

nomdenom commented 6 years ago

.rdata is bigger in the original binary because the decompiled code is missing const annotations on the various data tables (arrays). const gets placed in a separate section.

ghost commented 6 years ago

See issue #13. Many functions and data are declared as static but are missing that. It seems you're right that const was probably used as well.

mewmew commented 6 years ago

At least information about static is present in the debug info from the SYM files; e.g. https://github.com/sanctuary/psx/blob/master/easy-as-pie/globals/global_0.cpp#L4522

seritools commented 6 years ago

https://github.com/diasurgical/devil-nightly/issues/15

We're pretty much done 🎉

In the end, Blizzard made sure to keep their software up to date it seems 😄

seritools commented 6 years ago

We can probably be 100% sure now :) Found this by change while browsing reddit:

http://bytepointer.com/articles/the_microsoft_rich_header.htm

image

AJenbo commented 6 years ago

Woha that is exiting :D

But does that also say that the linker was VC6 SP3? (probably wasn't updated in the later versions)

mewmew commented 6 years ago

We can probably be 100% sure now :) Found this by change while browsing reddit:

That is such a great find! Wow, what a buried treasure!!

nomdenom commented 6 years ago

Funny, I actually knew about the Rich header but had forgotten about it (my first written reference to it is from 2007)... well, good ol' brute-force search and a bit of luck still got us there :muscle:

ghost commented 6 years ago

@AJenbo Linker was the same from SP3-SP6. However SP4/SP5 can generate different code if the Processor Pack is installed.

@seritools Once again, a great discovery! Oh, it's so great to be working with code from 2001. Could you imagine if Diablo 1 was released now with all those micro-optimizations and stripping the Rich header+file names? :P

After comparing the Rich header with 3 different tools, below is a list documenting each value in the header. Interesting is that "Utc12_2_C" is a C compiler and not C++. So it's possible all the files were treated as C even with .CPP extensions. Also, there doesn't appear to be any asm/MASM references, which I guess implies render.cpp was indeed a .CPP file with inlined calls.

Id  Build  Count  Name       Description
 0      0    155  Unknown    [---] Number of imported functions (old)
 1      0    229  Import0    [---] Number of imported functions
 6   1668      1  Cvtres500  [RES] VS97 (5.0) SP3 cvtres 5.00.1668
 2   7303     29  Linker510  [IMP] VS97 (5.0) SP3 link 5.10.7303
 3   7303      1  Cvtomf510        VS97 (5.0) SP3 cvtomf 5.10.7303
 4   8447      2  Linker600  [LNK] VC++ 6.0 SP3,SP4,SP5,SP6 link 6.00.8447
48   9044     72  Utc12_2_C  [---] VC++ 6.0 SP5 Processor Pack
19   9049     12  Linker512        Microsoft LINK 5.12.9049
seritools commented 6 years ago

So it's possible all the files were treated as C even with .CPP extensions.

That was one of my very first assumptions about the project haha

squidcc commented 6 years ago

That was one of my very first assumptions about the project haha

So apparently you want evidence for something that you've known all along?

mewmew commented 6 years ago

@squidcc Please, I would very much appreciate if you stopped indirectly accusing people of making faults. We all do our best to contribute on a hobby project, and the atmosphere should be fun and collaborative. I certainly appreciate the technical depth and experience you bring to the project. At the same time, I would like to ask you if you could change the tone of your correspondence. Some of the discussions happen in the Discord chat, as we mostly use issues to track issues and progress. Many misunderstanding and misinterpretations can be cleared out more effectively in the chat, and you are very welcome to join. Send a ping to either me, @seritools or @AJenbo if you'd like an invite link for the Discord chat.

Kindly, Robin

AJenbo commented 6 years ago

Small update, @mewmew and I spend some time this weekend testing things out and by adding the /TC flag to the compiler we where able to get the RICH header to reflect Utc12_2_C instead of Utc12_2_CPP.

As /Tc did not appear to change thing we're fairly sure that the whole project needs to be written in C instead of C++.

I have started gradually rewriting the parts the aren't C compliant, there are a few thing that I don't know exactly how to deal with, but I hope to get it down to just a few issues. First PR is here https://github.com/diasurgical/devilution/pull/465 (converting player.cpp), I expect subsequent PR's to be much smaller since a lot of the general stuff was handled in this one.

AJenbo commented 5 years ago

As a quick update @mewmew has figured out most of the pieces needed to produce a correct rich header. Also /Tc does work and we now have 54% of the projects files compiling as C :)

For details on what issues are blocking the remaning code from being compiled as C, can be seen here: https://github.com/diasurgical/devilution/issues/528