diasurgical / devilution

Diablo devolved - magic behind the 1996 computer game

Produce byte-identical executable using the same compiler #11

Closed mewmew closed 5 years ago

mewmew commented 6 years ago

This issue tracks a very ambitious goal of Devilution, the production of a byte-identical executable to the 1.09b original. To achieve this goal, the exact same compiler has to be used as was used to produce the original executable.

For diablo.exe version 1.09b, this corresponds to Visual C++ 5.10, and for the debug release diablo.exe version 1.00 (1996-12-21) this corresponds to Visual C++ 4.20 compiled in debug mode, based on PEiD output.

PEiD 109b

PEiD 100 dbg

Edit: reference discussion: https://github.com/galaxyhaxz/devilution/pull/10#issuecomment-396211436

ghost commented 6 years ago

Oh that's great! I totally forgot about PEiD. So now we know for sure it was 5.10. Thankfully VC 5/6 are very similar and still run on modern systems. I went ahead and made a chart for all versions so we have an easy reference. I'll download 5.10 this coming week and configure the project accordingly. SDK Versions

fearedbliss commented 6 years ago

I don't know much about this but from efforts in the Linux community (Debian and maybe others) to produce byte identical executables, you will need to remove any type of code that may cause the compiler to generate output that is variable. Things like timestamps used in the compiler and other things will need to be taken into account. Producing byte-identical output is primarily used (from what I understand) for verification purposes: to make sure that the output produced by the original author can be faithfully reproduced by others. Therefore, if a downstream source produces output that is not byte-identical to the author's, then either the author has changed or is compromised, or the downstream is.

For this project, given that Blizzard isn't developing it anymore, I don't think this goal should be pursued. You will be limiting your toolchain, and since your resources are already limited, the project will be hindered. You should consider just continuing your improvements and stabilization efforts.

On June 11, 2018 7:27:16 AM EDT, Robin Eklind notifications@github.com wrote:

This issue tracks a very ambitious goal of Devilution, the production of a byte-identical executable to the 1.09b original. To achieve this goal, the exact same compiler has to be used as was used to produce the original executable.

For diablo.exe version 1.09b, this corresponds to Visual C++ 5.10, and for the debug release diablo.exe version 1.00 (1996-12-21) this corresponds to Visual C++ 4.20 compiled in debug mode, based on PEiD output.


fearedbliss commented 6 years ago

Also, what I mean by byte identical is that at the end of every build, the checksum (hash) of the executable is identical to the original. Since Blizzard didn't have this byte-identical goal, it may well be that you will never get the same hash as the one for the original game, but you may be able to get the hashes to be the same across your own multiple compilations of the same commit (reproducible builds, from Debian's efforts, is something you may want to look into for this).
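As a rough illustration of the hash check described above, here is a minimal sketch in Python. The helper names (`sha256_of_bytes`, `sha256_of_file`, `same_build`) are made up for this example and are not part of Devilution; SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str) -> str:
    """Return the SHA-256 digest of a file's full contents."""
    return sha256_of_bytes(Path(path).read_bytes())


def same_build(a: bytes, b: bytes) -> bool:
    """True when two build outputs hash identically."""
    return sha256_of_bytes(a) == sha256_of_bytes(b)
```

Running `sha256_of_file` on two builds of the same commit (the file paths are whatever your build produces) and comparing the results shows whether the build is reproducible, independently of whether it matches the original executable.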


mewmew commented 6 years ago

I don't know much about this but from efforts in the Linux community (Debian and maybe others) to produce byte identical executables, you will need to remove any type of code that may cause the compiler to generate output that is variable. Things like timestamps used in the compiler and other things will need to be taken into account

Indeed. Timestamps, and other parts that vary from build to build, would have to be removed. The good thing is that those are easily identified.
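One way to neutralise the most obvious timestamp before comparing binaries is sketched below. It assumes a standard PE layout (the PE header offset stored at 0x3C, with the COFF TimeDateStamp 8 bytes after the "PE\0\0" signature). The function name `zero_pe_timestamp` is hypothetical, and other timestamp fields that may exist elsewhere in the file (for example in the debug or export directories) are not handled here.

```python
import struct


def zero_pe_timestamp(image: bytes) -> bytes:
    """Return a copy of a PE image with the COFF TimeDateStamp set to 0."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: 4-byte little-endian offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    patched = bytearray(image)
    # COFF header after the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    struct.pack_into("<I", patched, pe_offset + 8, 0)
    return bytes(patched)
```

Applying this to both the original and the rebuilt executable before hashing removes one known source of build-to-build variation from the comparison.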

So one approach would be to simply extract the contents of the .text section and dump that, then compare between the binaries. Of course, initially the output will be very different, and thus hashes can only be used as a final measurement. But before then we can use something like Hamming distance to get a score for how many edits have been made.
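The approach above could be sketched roughly as follows. This assumes a standard PE section table (40-byte entries following the optional header). The helper names are invented for the example. The distance is a byte-wise, Hamming-style count, with any length difference added on, since strict Hamming distance is only defined for equal-length inputs.

```python
import struct


def extract_section(image: bytes, name: bytes = b".text") -> bytes:
    """Return the raw bytes of the named section from a PE image."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    # The section table starts after the 20-byte COFF header
    # and the optional header.
    table = coff + 20 + opt_size
    for i in range(num_sections):
        entry = table + 40 * i
        sec_name = image[entry:entry + 8].rstrip(b"\0")
        # SizeOfRawData at +16, PointerToRawData at +20.
        raw_size, raw_ptr = struct.unpack_from("<II", image, entry + 16)
        if sec_name == name:
            return image[raw_ptr:raw_ptr + raw_size]
    raise KeyError(name)


def byte_distance(a: bytes, b: bytes) -> int:
    """Count differing byte positions, plus any difference in length."""
    differing = sum(x != y for x, y in zip(a, b))
    return differing + abs(len(a) - len(b))
```

A score of 0 from `byte_distance` on the two `.text` dumps would mean the code sections match exactly; a falling score over time gives a crude progress measure before hashes become useful.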

We can also keep track of which offsets correspond to which function, and in this way check off one function at a time.

Note that relative offsets to addresses will vary if the output binary rearranges where code and data are stored, and will also depend on the size of these parts.
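A per-function check-off could look something like the sketch below. The function table (name, offset, size) is a hypothetical input that would have to come from a map file or a disassembler, and the comparison assumes each function sits at the same offset in both dumps, which only holds once the layout of the two binaries already matches.

```python
from typing import Dict, List, Tuple


def check_functions(
    original: bytes,
    rebuilt: bytes,
    functions: List[Tuple[str, int, int]],
) -> Dict[str, bool]:
    """Map each function name to True when its bytes match in both dumps."""
    results = {}
    for name, offset, size in functions:
        a = original[offset:offset + size]
        b = rebuilt[offset:offset + size]
        # A range that runs past the end of either dump counts as a mismatch.
        results[name] = len(a) == size and a == b
    return results
```

The resulting dictionary can be printed as a checklist, so that each function is ticked off once its bytes are identical.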

It most definitely won't be an easy challenge. But, at least to me, that's what makes it fun!

Also, this is not goal number one. That would be to fix crashes, improve stability, fix builds across the main platforms (Windows, Linux, Mac), etc. Rather, this can be thought of as an aspirational goal that Devilution may one day achieve, or perhaps more likely not. But I for one would definitely want to be part of making it happen!

Cheers, /u

ghost commented 6 years ago

I personally don't think a byte-for-byte copy is possible.

You would literally need to have the exact code; any extra variables, optimizations, or anything else would completely throw it off.

Mewmew, I respect your energy )

janisozaur commented 6 years ago

There are already projects like https://github.com/pret/pokeruby or https://github.com/MimicYou/pokeredbeta that build hash-perfect recreations of their originals.

mewmew commented 6 years ago

There are already projects like https://github.com/pret/pokeruby or https://github.com/MimicYou/pokeredbeta that build hash-perfect recreations of their originals.

Wow, that is really cool! Thanks for pointing out these projects.

gp-alex commented 6 years ago

Hey, Diablo is already decompiled and refactored, the project is called The Hell and source code was hosted on Assembla at least in 2013

mewmew commented 6 years ago

Hey, Diablo is already decompiled and refactored, the project is called The Hell and source code was hosted on Assembla at least in 2013

@gp-alex That's great! Do you know where this source code is hosted?

ghost commented 6 years ago

You are correct, Hellfire was decompiled as early as 2006 IIRC, and The Hell released their sources a few years ago at the Khandurus network. However, it seems to have completely disappeared from the internet, same with The Dark/Khandurus.

The Hell 2 creator still has a copy. He reached out to me a few days ago and said he might pop in and make a contribution or two.

ghost commented 6 years ago

Just make sure this stays as original as possible. I have seen The Hell mod and I thought it was grindingly unbalanced and a weird distortion of what Diablo is.

mewmew commented 6 years ago

Just make sure this stays as original as possible. I have seen The Hell mod and I thought it was grindingly unbalanced and a weird distortion of what Diablo is.

No worries, this is tracked by #11.

AJenbo commented 6 years ago

What compiler was used to generate the exe in the 0.2 release? A patch between it and Diablo.exe seems to indicate a 60% correlation.

ghost commented 6 years ago

I used VC++ 5.10 for all release builds. The GNU makefiles currently don't properly add the Icon and resource files.

mewmew commented 6 years ago

The GNU makefiles currently don't properly add the Icon and resource files.

Added an issue to track this #48.

seritools commented 6 years ago

See this comment for a major update!

ChaosMarc commented 5 years ago

I think this issue can be closed. Its topic should be exactly the same as #111, which is newer.

PS: If I should stop looking for these housekeeping tasks, please tell me ;)

mewmew commented 5 years ago

PS: If I should stop looking for these housekeeping tasks, please tell me ;)

It's great you are looking for housekeeping tasks :) Please continue.

As for this issue, it is similar but slightly different from #111. For instance, there are other issues to consider even when we have exactly the same compiler, such as time stamps being included in the binary. This issue mentions some of those aspects. However, I'd be fine with closing this issue as the primary work now is to get bin perfect assembly, which is tracked by #111 and the bin exact milestone. Later on when we want to figure out specific issues, such as how to handle time stamps, we can open new dedicated issues for those.

Closing for now. We can re-open at a later time, should we feel like it.