Open mewmew opened 6 years ago
It's worth noting that individual compilation units can have different optimization settings (just to keep in mind).
Another comparison with Diablo.exe 1.09b:
I actually want to highlight in particular the line and byte ptr [ecx], 0
. This is actually an assignment, but it was optimized into an and
, which is only produced under O1.
Under O1:
*ptr = 0
becomes and byte ptr [ecx], 0
, even though assignment was used*ptr &= 0
becomes and byte ptr [ecx], 0
Under O2:
*ptr = 0
becomes mov byte ptr [eax], 0
*ptr &= 0
becomes mov byte ptr [eax], 0
, even though AND was usedThe project now uses /O1 for both debug and retail builds. I figured that's what was used, since the output size is much closer (760 KB). The main reason I never changed it was in the early days optimizations broke the code entirely. It took a few months of fixing up before it was stable enough to change it.
You may also be able to find out other compiler flags by looking at the binary header (MZ/PE) and seeing what differs. Imports too probably.
Good call, I noticed the original binary has about 2x as many import descriptors. Which is weird, since it there's only a few differences in the KERNEL32 imports. We'll have to look at that sometime. Also, the release build seems to scramble the code a little more, where as the debug build is closer to the original. Setting aside debug sections of course.
It could also be a good idea to "trace" the developers steps. We know the game was originally written in VC 4.20, but then upgraded to 5.10. So perhaps generating the project using 4.20's default settings could get us closer.
So perhaps generating the project using 4.20's default settings could get us closer
Sounds reasonable. Report back the compiler flags used :) I still don't have a Windows box to play with.
Warning: Big writeup ahead! :)
As @mewmew wrote up over here, more often than not, the compiler we currently use (5.0 SP3) generated slightly different instructions than the original 1.09b binary.
The first thing I checked is the linker versions. For that I grabbed every version ever, basically, and confirmed that the linker used must be from 5.0 SP3:
SP0: 5.00.7022
SP1: 5.02.7132
SP2: 5.02.7132
SP3: 5.10.7303
Diablo 1.09b PE Header: 5.10
While browsing for compiler option info, I stumbled over the #pragma optimize
directive, allowing to toggle optimization settings on a per-function basis. Playing around with disabling size-based code and other optimizations, the generated instructions were even further away from the original, so I knew that was probably not what was happening.
For some reason, the compiler used by Blizzard was just more "clever" than the one we use...
After going off-track by checking the #pragma optimize
directive, I went back to linker stuff. Just to make sure, I checked VS6's link.exe
(starting from SP6, for some reason).
The output was:
C:\Program Files\Microsoft Visual Studio\VC98\Bin>link /?
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
usage: LINK [options] [files] [@commandfile]
options:
/ALIGN:#
/BASE:{address|@filename,key}
[...]
/LINK50COMPAT
Looking at that options list it instantly clicked. See that /LINK50COMPAT
? Googling it revealed that link.exe
as well as lib.exe
both have that flag in version 6. It seems that with VC++ 6, they changed the structure of lib files, so the lib files generated by lib.exe
5.x and lower are not compatible with link.exe
6.x and vice versa. Microsoft implemented that switch so that older library files can be used with VC++ 6, and code compiled by VC++ 6 can generate lib files compatible with the linker of VC++ 5.
So I just fired up VS6SP6, upgraded the project file, compiled, checked in IDA, and voilà : (left is Diablo.exe 1.09b, right is devilution compiled/linked with VC++ 6 SP6)
The instructions are identical (at least for the function from PR #135)!
So, what I guess happened:
.lib
files from the .obj
files via VC++ 6's lib.exe
with the /LINK50COMPAT
, generating library files compatible with VC++5's linker/O1
:DThe 1.09b .exe has a creation date of 2001-05-12. This means that the newest VC++6 version they could have used it Service Pack 5 (released April 2001). Service Pack 4 (released January 2001) would be a good contender as well.
Anyways, that was my journey today. :)
What's left to do:
link.exe
v5.10, and call that linker to generate the output file.In retrospective, all of this makes sense seeing that Storm.dll
and the .SNP
files are even linked with v6 of the linker! 🙂
That's way cool! Excellent work.
@seritools Wow that's a great read and find, the asm is exactly the same (excluding different memory locations)!
Beautiful, now we just have to figure out how they automated everything :wink: It seems tedious to have used two different toolchains, so maybe there's another option we're missing or they modded their VC6 installation.
Awhile back I documented the versions they used for D2:
Version Released Patch
MSVC++ 6.0 SP 3 5/21/1999 1.00-1.03
MSVC++ 6.0 SP 4 6/15/2000 1.04-1.05
MSVC++ 6.0 SP 5 2/26/2001 1.06-1.10
MSVC++ 6.0 SP 6 3/29/2004 N/A
VS .NET 2003 ?/??/2003 1.11+
The linker for both SP 4 and 5 generates code exactly the same, I didn't test SP 6 since it was made long after those patches. We can assume that the D2 devs used their current setup to compile D1 at the time, so 1.09b
would've been compiled with SP 5
and 1.08 SP 3
.
@seritools Wow Dennis, that's great! Hats off. Very happy that you managed to pin this down.
Do we get similar sizes with the optimization? I am curious what a bindiff would result in. It might be useless to know, but I think that it would tell us how close we are.
https://github.com/seritools/devilution/commit/6f5f573bbd520db8dcda36db8bb52525b9bcd7f7 (please excuse my crappy makefile skills)
Managed to link the exe with 5.10 while compiling with 6 :) However in this constellation, PDBs cannot be generated, since the linker exits mentioning that the pdb format is not supported. A release build is possible though, and reports the correct linker version.
For working at getting the code accurate I'd recommend just linking with v6 for now (as long as we don't see differences too big to tolerate). Generating release builds with full PDB info makes it a lot easier imho.
What do you think?
A few more observations:
CheckSpell
(.text:0045759A jz short loc_4575A0
) is generated as a jnz, with a slightly different jump pattern, and a few other slight differences in the function beginning@galaxyhaxz has noted that the VC6 linker starts the program code at 0x1000, while the VC5 linker starts at 0x400, which is a difference of 3072 bytes, so the resulting binary could be down to 741KB already!
I'm going to check SP2, SP1, SP0 next and try linking with VC5 instead to see if I can get that simple function to generate correctly and maybe generate a perfectly-sized binary :O
EDIT:
Checked the exact byte counts:
1.09: 757760 bytes
SP0: 761856 bytes (diff. 4096 bytes, at 3072 of which are because I currently link with VC6) SP1: 761856 bytes SP3: 761856 bytes
Alright, SP2, SP4, SP5 are excluded now, since all three of those generate binaries that are too big.
For some reason, with SP3, the filesize went down again. I reinstalled SP3 just to make sure, but it stayed the same.
So for now, VC6 SP0, SP1 or SP3 in conjunction with the linker from VC5 SP3 is the closest we can get.
Copied over the SP3 installation to my buildtools folder, then built it while linking with VC5SP3... well, I accidentally beat Blizzard 😆
So, what I gathered:
Awesome work!
I think I might try this, because I am not thinking the pdb info to be too useful at this point considering we literally can cross reference the binary. 1024 bytes are still a lot though, but it could be worse.
I am curious if we can get closer with pdb info .
Also @seritools for that binary that is really close, can you upload it ? I would like to do a bindiff on it.
PDB info is nice while comparing functions, since you can just jump to the function in question in the new binary, without remembering adresses, and getting all the type info that we already have for the original binary as well. (Generating with pdb info results in a much bigger binary btw)
Since the binaries are rounded up to the next 1024bytes, maybe the original binary was just 739k + a few bytes? Every instruction we missed could be the one dropping us down to 439KB. :)
Like there was/is (already reported to @galaxyhaxz) a missing function NewCursor in devilution, which did nothing but call to SetCursor in the original binary. A couple of bytes "saved". Also every differing optimization contributes to that.
I'm gonna be at my pc tomorrow and will upload the binary then.
And here is the exe :) devilution.zip
Comparing the size with @seritools binary:
.text .rdata .data .rsrc Total
Size change in Devilution: +0x2400 -0x7600 +0x4E00 Identical -0x400
So the Devilution binary actually has 0x2400 (9kb) more in the code section than the vanilla. This is surely due to many decompiled functions having excessive variables. Interesting though is that most data is stored in .data
instead of .rdata
. This is likely because many of these variables were declared as static. Devilution has a total of 0x2800
bytes less in the data section. I noticed the compiled binary has a much smaller CRT section than compiled plainly with VC5, which could attribute to that. It could also be that the compiler is discarding certain global variables or data that isn't used.
Yeah, I only got the size way down when using the VC6 libs, the file was quite a bit bigger with the VC5 ones. (Tbh it was surpriting to me that the VC6 libs were working at all, since it supposedly has a different lib format, but maybe that's just not the case for the standard libs)
.rdata
is bigger in the original binary because the decompiled code is missing const
annotations on the various data tables (arrays). const
gets placed in a separate section.
See issue #13. Many functions and data are declared as static
but are missing that. It seems you're right that const
was probably used as well.
At least information about static
is present in the debug info from the SYM files; e.g. https://github.com/sanctuary/psx/blob/master/easy-as-pie/globals/global_0.cpp#L4522
https://github.com/diasurgical/devil-nightly/issues/15
We're pretty much done 🎉
In the end, Blizzard made sure to keep their software up to date it seems 😄
We can probably be 100% sure now :) Found this by change while browsing reddit:
http://bytepointer.com/articles/the_microsoft_rich_header.htm
Woha that is exiting :D
But does that also say that the linker was VC6 SP3? (probably wasn't updated in the later versions)
We can probably be 100% sure now :) Found this by change while browsing reddit:
That is such a great find! Wow, what a buried treasure!!
Funny, I actually knew about the Rich
header but had forgotten about it (my first written reference to it is from 2007)... well, good ol' brute-force search and a bit of luck still got us there :muscle:
@AJenbo Linker was the same from SP3-SP6. However SP4/SP5 can generate different code if the Processor Pack is installed.
@seritools Once again, a great discovery! Oh, it's so great to be working with code from 2001. Could you imagine if Diablo 1 was released now with all those micro-optimizations and stripping the Rich header+file names? :P
After comparing the Rich header with 3 different tools, below is a list documenting each value in the header. Interesting is that "Utc12_2_C" is a C compiler and not C++. So it's possible all the files were treated as C even with .CPP extensions. Also, there doesn't appear to be any asm/MASM references, which I guess implies render.cpp
was indeed a .CPP file with inlined calls.
Id Build Count Name Description
0 0 155 Unknown [---] Number of imported functions (old)
1 0 229 Import0 [---] Number of imported functions
6 1668 1 Cvtres500 [RES] VS97 (5.0) SP3 cvtres 5.00.1668
2 7303 29 Linker510 [IMP] VS97 (5.0) SP3 link 5.10.7303
3 7303 1 Cvtomf510 VS97 (5.0) SP3 cvtomf 5.10.7303
4 8447 2 Linker600 [LNK] VC++ 6.0 SP3,SP4,SP5,SP6 link 6.00.8447
48 9044 72 Utc12_2_C [---] VC++ 6.0 SP5 Processor Pack
19 9049 12 Linker512 Microsoft LINK 5.12.9049
So it's possible all the files were treated as C even with .CPP extensions.
That was one of my very first assumptions about the project haha
That was one of my very first assumptions about the project haha
So apparently you want evidence for something that you've known all along?
@squidcc Please, I would very much appreciate if you stopped indirectly accusing people of making faults. We all do our best to contribute on a hobby project, and the atmosphere should be fun and collaborative. I certainly appreciate the technical depth and experience you bring to the project. At the same time, I would like to ask you if you could change the tone of your correspondence. Some of the discussions happen in the Discord chat, as we mostly use issues to track issues and progress. Many misunderstanding and misinterpretations can be cleared out more effectively in the chat, and you are very welcome to join. Send a ping to either me, @seritools or @AJenbo if you'd like an invite link for the Discord chat.
Kindly, Robin
Small update, @mewmew and I spend some time this weekend testing things out and by adding the /TC flag to the compiler we where able to get the RICH header to reflect Utc12_2_C instead of Utc12_2_CPP.
As /Tc did not appear to change thing we're fairly sure that the whole project needs to be written in C instead of C++.
I have started gradually rewriting the parts the aren't C compliant, there are a few thing that I don't know exactly how to deal with, but I hope to get it down to just a few issues. First PR is here https://github.com/diasurgical/devilution/pull/465 (converting player.cpp), I expect subsequent PR's to be much smaller since a lot of the general stuff was handled in this one.
As a quick update @mewmew has figured out most of the pieces needed to produce a correct rich header. Also /Tc does work and we now have 54% of the projects files compiling as C :)
For details on what issues are blocking the remaning code from being compiled as C, can be seen here: https://github.com/diasurgical/devilution/issues/528
This issue tracks the effort to figure out the complier flags used to produce Diablo.exe (version 1.09b). It may be considered a subtask of #11, as given information about compiler flags, we can ensure that the same input source code produce the same output object code.
From what I can tell, it seems like
/O1
is used rather than/O2
. This is based on the padding between functions./O2
produces (given that https://github.com/diasurgical/devilution/pull/110 has been merged):While
/O1
produces:This is to be compared to the original version of Diablo.exe (1.09b):