GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

[BINARY] clang cannot compile those assembly, neither from gcc nor msvc program #34

Open swang206 opened 3 years ago

swang206 commented 3 years ago

cons.asm:25375:13: error: unknown use of instruction mnemonic without a size suffix
            jmp $L_14001135e
            ^
cons.asm:25378:13: error: unknown use of instruction mnemonic without a size suffix
            mov R8,R15
            ^
cons.asm:25379:27: error: unexpected token in argument list
            lea RCX,QWORD PTR [RSP+72]
                          ^
cons.asm:25380:13: error: unknown use of instruction mnemonic without a size suffix
            mov RDX,RDI
            ^
cons.asm:25383:13: error: unknown use of instruction mnemonic without a size suffix
            cmp EAX,-1
            ^
cons.asm:25384:16: error: invalid operand for instruction
            je $L_14001139b
               ^~~~

kwarrick commented 3 years ago

Hello,

Please provide as much of the following information as possible:

  • Please attach the binary to this issue.
  • What is the version of ddisasm?
  • How can we reproduce? Please paste the command line used to invoke ddisasm and clang.

swang206 commented 3 years ago

master.

It does not work for clang

kwarrick commented 3 years ago

You are not able to disassemble clang itself, or you are not able to reassemble a binary with clang?

swang206 commented 3 years ago

reassemble a binary with clang

kwarrick commented 3 years ago

PE or ELF, 32-bit or 64-bit?

swang206 commented 3 years ago

Both PE and ELF 64 bit.

I did not try 32-bit.

kwarrick commented 3 years ago

Currently, for Windows binaries only the MASM assembly syntax is supported by ddisasm/gtirb-pprinter. You will have to use a MASM-compatible assembler such as ML64 or UASM.

For example,

$ cd ddisasm/examples/ex1
$ cl ex.c
$ ddisasm --asm out.asm ex.exe
$ ml64 out.asm /link /subsystem:console /entry:__EntryPoint /machine:x64

swang206 commented 3 years ago

But why does a Linux executable not work with clang?

aeflores commented 3 years ago

It is hard to reproduce the problem if we don't have access to the binary. By the look of those error messages, it could be that clang expects AT&T syntax instead of INTEL syntax, which is the default. I would try specifying the AT&T syntax in the gtirb-pprinter and see if that works better, e.g.:

ddisasm example --ir example.gtirb
gtirb-pprinter example.gtirb --syntax att --asm example.asm
clang example.asm -o example_rewritten

Let us know if that helps

kwarrick commented 3 years ago

We use gcc for reassembly. For the few examples I just tried, clang works. I believe there are some subtle differences in the syntax clang supports, but gcc and clang should mostly be compatible. There are changes in the works that will allow gtirb-pprinter to target multiple assemblers, but for now clang support must be reconciled by the user.

swang206 commented 3 years ago

Hi, can you help me address this issue? I tried to reassemble notepad++.exe and the linker cannot find symbols: https://github.com/swang206/npp-gtirb-fail

The repository contains the IR, the asm, and the original notepad++.exe file.

I tried the binary downloaded from the official site and it fails. I compiled notepad++ myself and it still fails for the same reason: the linker cannot find the __imp_COMCTL32@17 symbol, even though the extern function is shown in the assembly. [image]

kwarrick commented 2 years ago

You should be able to generate a .LIB file to satisfy the linker with one additional command-line argument:

$ ddisasm --asm npp.asm --generate-import-libs notepad++.exe

This requires that LIB.exe be on the PATH, but that should already be true if you can use ML64.exe. You will see that it generates a COMCTL32.lib file in the local directory. The MSVC linker will find these automatically, and you should be able to reassemble correctly.

swang206 commented 2 years ago

I do not see the COMCTL.lib file in the local directory after running ddisasm with flag --generate-import-libs

[image]

[image]

no comctl32.lib

kwarrick commented 2 years ago

I do not see the COMCTL.lib file in the local directory after running ddisasm with flag --generate-import-libs

Oh! This is an actual bug I can fix. I've just checked and the .LIB files are only generated when you specify the --asm argument. In your screenshot you only use --ir. Well, I'm not sure this is a bug per se, but it is tricky. At the very least, it should produce a warning.

Add --asm npp.asm and it will generate the COMCTL32.lib.
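A quick way to confirm that the import libraries were actually written is to compare the asm against the directory contents. The sketch below assumes the MASM output names its import libraries in INCLUDELIB directives (an assumption about the printer's output, not something stated in this thread); missing_import_libs is a hypothetical helper, not a ddisasm command.

```shell
# Hypothetical helper: print every library named by an INCLUDELIB directive
# in the listing $1 that does not exist in directory $2 (default: ".").
# Assumes lines of the form "INCLUDELIB NAME.lib".
missing_import_libs() {
    asm=$1
    dir=${2:-.}
    # Take the second field of each INCLUDELIB line and drop any CR from CRLF files.
    grep -i -E '^[[:space:]]*INCLUDELIB[[:space:]]+' "$asm" |
        awk '{ print $2 }' | tr -d '\r' |
        while IFS= read -r lib; do
            [ -f "$dir/$lib" ] || printf '%s\n' "$lib"
        done
}
```

Running `missing_import_libs npp.asm` in the output directory prints nothing when every referenced library is present.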

swang206 commented 2 years ago

Is it possible to generate the .lib directly on Linux instead of on Windows? ddisasm runs extremely slow on windows.

kwarrick commented 2 years ago

We have code in review that will provide alternatives on Linux. Hopefully that will merge soon, but until then you can use LLVM lld-link as an alternative to LIB.exe.

For example, on Ubuntu:

sudo apt install lld

Then create a simple wrapper script for lib.exe:

cat <<EOF > /tmp/lib.exe
#!/bin/bash
LINK="$(llvm-config --bindir)/lld-link"
\$LINK "\$@"
EOF
sudo mv /tmp/lib.exe /usr/local/bin/lib.exe
sudo chmod +x /usr/local/bin/lib.exe

Now, ddisasm --generate-import-libs should work on Linux using lld-link.
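To confirm that the wrapper will be found before running ddisasm, a small check like this may help. It only tests that something named lib.exe is resolvable on PATH, not that lld-link accepts the arguments ddisasm passes; lib_wrapper_available is a hypothetical name.

```shell
# Hypothetical helper: succeed only if a command named lib.exe
# can be resolved through the current PATH.
lib_wrapper_available() {
    command -v lib.exe >/dev/null 2>&1
}

# Example use:
#   lib_wrapper_available || echo "lib.exe not found on PATH" >&2
```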


ddisasm runs extremely slow on windows.

Make sure to use a RelWithDebInfo build. A debug build with souffle is too slow.

swang206 commented 2 years ago

great. i have my lld installed

swang206 commented 2 years ago

npp.zip: the exe does not run at all.

swang206 commented 2 years ago

[image] I tried notepad3. It does not work either.

swang206 commented 2 years ago

Hello. I tried four different Windows programs, and none of them works.

Can you tell me a Windows program that works, so I can use it for my work?

swang206 commented 2 years ago

Even helloworld does not work

[image]

What's wrong here?

kwarrick commented 2 years ago

Assuming helloworld.asm is the output of ddisasm, you will have to use /entry:__EntryPoint. By default, MSVC will statically link the C runtime, which means main is most likely not the entry point of the PE you are disassembling. For this reason, ddisasm actually creates the __EntryPoint label for you.
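Before choosing the /entry: value, it can be worth confirming that the label is really present in the file. A minimal sketch, assuming only that the label appears somewhere in the listing as the whole word __EntryPoint; mentions_entry_label is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the listing $1 contains the identifier
# __EntryPoint as a whole word (not as part of a longer symbol name).
mentions_entry_label() {
    grep -q -E '(^|[^[:alnum:]_])__EntryPoint([^[:alnum:]_]|$)' "$1"
}

# Example use:
#   mentions_entry_label helloworld.asm && echo "use /entry:__EntryPoint"
```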

swang206 commented 2 years ago

/entry:__EntryPoint

What if I am using /MD?

kwarrick commented 2 years ago

You can still use __EntryPoint, or you can insert the following line into helloworld.asm so /entry:main will work:

PUBLIC main

swang206 commented 2 years ago

How do I deal with those syntax errors, conflicts, etc.?

kwarrick commented 2 years ago

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

swang206 commented 2 years ago

Are there any guidelines on how to compile gtirb targeting Windows? Do you use cross-compilation?

swang206 commented 2 years ago

hi.

There are some issues with the assembly for ml64.

  1. ml64 does not support the int1, int3, and ud1 instructions. I do not know whether they can be replaced with ud2.
  2. rcl BYTE PTR [RSI+81949] is not a legal instruction.
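To see how many lines are affected, the mnemonics from item 1 can be located with grep. This is a rough sketch: it assumes one instruction per line with the mnemonic as the first token, and find_unsupported_mnemonics is a hypothetical helper name.

```shell
# Hypothetical helper: print "LINE:TEXT" for every line of the listing $1
# whose first token is int1, int3, or ud1.
find_unsupported_mnemonics() {
    # -n: show line numbers; -i: ignore case; -E: extended regular expressions.
    grep -n -i -E '^[[:space:]]*(int1|int3|ud1)([[:space:]]|$)' "$1"
}

# Example use:
#   find_unsupported_mnemonics notepad3.asm
```

The function only reports locations; whether any of these instructions can safely be rewritten is a separate question.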

I think generating gnu assembly is still useful even for windows (PE) executable since it does not have so many disassembly issues like microsoft's ones.

kwarrick commented 2 years ago

I think generating gnu assembly is still useful even for windows (PE) executable since it does not have so many disassembly issues like microsoft's ones.

I agree. It is on my list.

how does ddisasm work with windows executable with resource file?

We actually just merged changes last week to improve this. In short, you can now pass the --generate-resources argument to ddisasm to create a .RES file in your local directory, which can be passed to the linker.

See examples/ex_rsrc for a simple test:

$ ddisasm --generate-resources --asm out.asm ex.exe
$ ml64 out.asm /link /subsystem:console /entry:__EntryPoint /machine:x64 ex.res

swang206 commented 2 years ago

Hi kwarrick. I tried to disassemble 7zip. It works, but the window does not pop up at all. Why? 7zip_2.zip

Here are the IR file, assembly, and executables.

What I found is that a lot of Windows GUI executables just flash and exit after disassembly. Can you have a look at it?

kwarrick commented 2 years ago

From the screenshot of notepad3.asm? Those are real ddisasm disassembly errors. I will take a look.

After looking at the assembly output from Notepad3.exe, I have determined that this is a binary that has had the .rdata section merged with the .text section.

I am not entirely sure of the motivation, but it appears that a lot of PE32 binaries (32-bit) have merged data and code sections. The MSVC compiler provides an option to do this:

$ cl ex.c /link /merge:.rdata=.text

When you look at the beginning of the .text section you will see a huge list of addresses, followed by string constants, PE data directory structures, and lots of other data regions.
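One heuristic check for this kind of merge is whether the binary still has its own .rdata section header. The sketch below assumes GNU objdump built with PE support, whose -h output puts the section name in the second column; has_section is a hypothetical helper. A missing .rdata is only a hint, not proof that data was merged into .text.

```shell
# Hypothetical helper: read `objdump -h` output on stdin and succeed only if
# a section with the given name is listed (the name is the second column).
has_section() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example use (commented out because it needs a real PE file):
#   objdump -h ex.exe | has_section .rdata || echo "no .rdata section header"
```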

As ddisasm was originally developed against ELF binaries, the only data-in-code analysis required thus far has been for jump tables within the code section. To correctly disassemble binaries with merged data sections, I have been working on a branch that introduces more complex data analysis logic. I will update this issue when we merge that work.