Washi1337 / AsmResolver

A library for creating, reading and editing PE files and .NET modules.
https://docs.washi.dev/asmresolver/
MIT License
826 stars, 125 forks

Reading entry assembly in AppHost / SingleFileHost #519

Open caesay opened 5 months ago

caesay commented 5 months ago

Problem Description

Epic library!

When given a random EXE file, it's currently cumbersome to locate the dotnet entry assembly.

Proposal

Ideally, there would be an API to read the placeholder / relative entry dll path, so given an EXE we could locate the entry dotnet assembly.

Furthermore, if there was an API which could "detect" the entry assembly given an arbitrary EXE, that would also be neat - but some documentation describing how to do this yourself would also be good as an alternative.

Alternatives

No response

Additional Context

No response

Washi1337 commented 5 months ago

I am not sure this is possible without applying some heuristics.

The biggest issue is that the offset at which the application binary path is stored in bundle files is not well-defined. Microsoft's reference implementation of the bundler also finds the right place simply by searching for a known placeholder in a template file. Unfortunately, this placeholder is (as its name implies) replaced with the final entry assembly path at compile time, effectively destroying all information we could use to infer it automatically.
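The placeholder search described above can be sketched in a few lines of Python. This is only an illustration, not AsmResolver's or Microsoft's code. The placeholder value is an assumption (the hex-encoded SHA-256 of the string "foobar", stored as UTF-8), and `find_placeholder_offset` is a hypothetical helper name.

```python
import hashlib
from typing import Optional

# Assumption: the apphost template embeds the hex SHA-256 of "foobar"
# (64 ASCII characters) as the placeholder for the app binary path.
APP_BINARY_PATH_PLACEHOLDER = hashlib.sha256(b"foobar").hexdigest().encode("utf-8")


def find_placeholder_offset(host_image: bytes) -> Optional[int]:
    """Return the file offset of the placeholder, or None if it is absent.

    Hypothetical helper: a plain byte search over the whole file.
    """
    offset = host_image.find(APP_BINARY_PATH_PLACEHOLDER)
    return offset if offset >= 0 else None
```

On an unmodified template this returns the offset to patch. On a finished apphost it returns `None`, because the placeholder bytes have already been overwritten, which is exactly the problem raised in this comment.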

Generally speaking, I am hesitant to add heuristics to AsmResolver (especially when they involve disassembling and interpreting code or similar) unless they are very frequently used and reliable. However, I am open to suggestions.

caesay commented 5 months ago

I'm aware of the difficulty, but I suspected you knew something I didn't - because the WriteUsingTemplate function can replace the path in an already-written apphost where the placeholder text is no longer present, using an offset from the signature.

If not that, I suppose we could build a dictionary of apphost file hashes and offsets. It would be trivial to scrape all of the app hosts from NuGet (e.g. https://www.nuget.org/packages/Microsoft.NETCore.App.Host.win-x86) and record the placeholder offset for each unique file.

Washi1337 commented 5 months ago

The WriteUsingTemplate method combined with BundlerParameters.FromExistingBundle uses the original main file path itself (padded with zeroes up to the original length of the placeholder) as a heuristic instead of the standard placeholder, and strips all EOF/overlay data to replace it with the new bundle manifest (see BundleManifest.cs:400-409). This is fast and reliable for standard apphost/singlefilehost files, but definitely not perfect (hence the warning in the docs).
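A rough Python sketch of the "search for the already-known path" heuristic described above follows. It is not the code in BundleManifest.cs. The 64-byte placeholder length, the UTF-8 encoding, and both function names are assumptions made for illustration, and the overlay stripping step is left out entirely.

```python
from typing import Optional

# Assumption: the original placeholder was 64 bytes long, so a patched
# path occupies 64 bytes, padded with zero bytes at the end.
PLACEHOLDER_LENGTH = 64


def find_patched_path_offset(host_image: bytes, known_path: str,
                             placeholder_length: int = PLACEHOLDER_LENGTH) -> Optional[int]:
    """Locate a known, zero-padded main file path inside a patched host."""
    encoded = known_path.encode("utf-8")
    if len(encoded) > placeholder_length:
        return None  # this sketch does not handle paths longer than the placeholder
    needle = encoded.ljust(placeholder_length, b"\x00")
    offset = host_image.find(needle)
    return offset if offset >= 0 else None


def replace_patched_path(host_image: bytes, known_path: str, new_path: str,
                         placeholder_length: int = PLACEHOLDER_LENGTH) -> bytes:
    """Return a copy of the host with the known path swapped for new_path."""
    offset = find_patched_path_offset(host_image, known_path, placeholder_length)
    if offset is None:
        raise ValueError("known path not found in host image")
    new_encoded = new_path.encode("utf-8")
    if len(new_encoded) > placeholder_length:
        raise ValueError("new path does not fit in the placeholder slot")
    padded = new_encoded.ljust(placeholder_length, b"\x00")
    return host_image[:offset] + padded + host_image[offset + placeholder_length:]
```

Note that this only works when the caller already knows the current path, which is why it does not answer the original question of discovering the entry assembly from an arbitrary EXE.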

Maintaining a dictionary of well-known template file-offset pairs is also not trivial, because what are the keys of that dictionary going to be? I don't think raw hashes of the files will work, because existing files will have had their placeholders replaced and, in the case of Windows binaries, will have their own Win32 resources.
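To make the objection concrete, here is a small Python sketch of a hash-keyed offset table. Everything in it is hypothetical: the function names, the use of SHA-256 as the key, and the idea that the templates are already available as byte strings.

```python
import hashlib
from typing import Dict, Optional


def build_offset_table(templates: Dict[str, bytes], placeholder: bytes) -> Dict[str, int]:
    """Map the SHA-256 of each unmodified template to its placeholder offset."""
    table: Dict[str, int] = {}
    for _name, image in templates.items():
        offset = image.find(placeholder)
        if offset >= 0:
            table[hashlib.sha256(image).hexdigest()] = offset
    return table


def lookup_offset(table: Dict[str, int], image: bytes) -> Optional[int]:
    """Look up an image by its raw file hash; None if the hash is unknown."""
    return table.get(hashlib.sha256(image).hexdigest())
```

A lookup succeeds for a byte-identical template but fails as soon as the placeholder has been replaced or resources have been added, because the raw hash of the file changes. A usable key would have to be computed over parts of the file that patching leaves untouched, and which parts those are is the open question.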

caesay commented 5 months ago

All good points. I don't have any other suggestions, other than: All the Win32 resources are written to the entry DLL, and then copied to the final exe. As long as that's how the apphost is built, the "OriginalFilename" resource will be the name of the entry DLL. It's certainly not foolproof but it's better than guessing based on the file name.
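The "OriginalFilename" idea can be tried without a full PE resource parser. The Python sketch below is a crude raw-byte scan, not a proper reader of the version resource: it assumes the key and its value are stored as UTF-16LE strings with the value following the key after its terminator and any zero padding, and `read_original_filename` is a hypothetical helper name.

```python
from typing import Optional


def read_original_filename(data: bytes) -> Optional[str]:
    """Heuristically extract the OriginalFilename version string from an EXE."""
    key = "OriginalFilename".encode("utf-16-le")
    pos = data.find(key)
    if pos < 0:
        return None
    pos += len(key)
    # Skip the key's null terminator and any alignment padding (zero code units).
    while pos + 2 <= len(data) and data[pos:pos + 2] == b"\x00\x00":
        pos += 2
    # Read UTF-16LE code units up to the value's null terminator.
    end = pos
    while end + 2 <= len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2
    value = data[pos:end].decode("utf-16-le", errors="replace")
    return value or None
```

As the comment says, this is not foolproof: it only helps when the version resource was copied from the entry DLL, and a scan like this takes the first match it finds rather than walking the resource directory.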