ThoughtGang / opdis

libopcodes-based disassembler
GNU General Public License v2.0

x86: hlt instruction encoding not detected #4

Closed SimonKagstrom closed 11 years ago

SimonKagstrom commented 11 years ago

The halt instruction is disassembled as "hlt", at least with my version of objdump. I'll attach a trivial patch which detects this correctly.

SimonKagstrom commented 11 years ago
From fd9b884342ed9214c45115bf4b02f6a8d2487934 Mon Sep 17 00:00:00 2001
From: Simon Kagstrom <simon.kagstrom@netinsight.net>
Date: Thu, 27 Dec 2012 14:08:53 +0100
Subject: [PATCH] x86_decoder: Detect hlt instruction

---
Hmm. I thought you could attach files with github. Anyway, here's the trivial patch... I hope it's possible to apply, otherwise I think the intention should be clear.

 opdis/x86_decoder.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/opdis/x86_decoder.c b/opdis/x86_decoder.c
index e433aad..d63a2f8 100644
--- a/opdis/x86_decoder.c
+++ b/opdis/x86_decoder.c
@@ -221,6 +221,7 @@ static void decode_intel_mnemonic( opdis_insn_t * out, const char * item ) {

    /* system instructions */
    if (! strncmp( "inv", item, 3 ) || ! strncmp( "halt", item, 4 ) ||
+       ! strncmp( "hlt", item, 3 ) ||
        ! strncmp( "clts", item, 4 ) || ! strncmp( "ltr", item, 3 ) ||
        ! strncmp( "rsm", item, 3 ) || ! strncmp( "wbinvd", item, 6 ) ) {
        out->category = opdis_insn_cat_priv;
-- 
1.7.0.4

mkfs commented 11 years ago

Yup, it's "hlt" in objdump (and in binutils/opcodes/i386-opc.tbl) -- guess I should have double-checked that.

Some day I need to go through the opcode table and fill in all the ones that are missing in opdis. I mostly use a Ruby wrapper for opdis that does instruction metadata generation of this sort (and is more complete than opdis).

SimonKagstrom commented 11 years ago

I'm planning to take a look at writing decoders for MIPS, ARM and PowerPC someday. However, I don't expect to be able to make them as complete as the x86 one, since I mostly care about identifying loads/stores and branches.

mkfs commented 11 years ago

That class of instructions is the most interesting, of course.

ARM is very interesting these days. I'm willing to sink a couple of weeks into making this thing cross-platform, but I'll need a few test cases.

In regards to decoders: are you limited to C? This project and the original (libdisasm) assume so, but that is only true in embedded and pseudo-realtime applications.

SimonKagstrom commented 11 years ago

Well, I'd like to keep it C (or maybe C++). I'm rewriting this application in C++:

http://code.google.com/p/dissy/

(the new project is here https://github.com/SimonKagstrom/emilpro if you're interested, that one uses opdis). And I'd like to keep it as compiled code this time.

rofl0r commented 11 years ago

i'd welcome a cross platform work, however C++ would be a horrible choice: due to name mangling issues and undocumented ABI you can't use C++ libraries with anything other than C++ (though often not even between different compiler versions). otoh C can be used everywhere without a problem, even from python and other scripting langs, so it is the only sane choice for a library.

rofl0r commented 11 years ago

btw, imo those strcmp orgies are a horribly wasteful way. i'd suggest normalizing the string once (e.g. to lowercase), then look up a hashtable to get an enum result for the opcode and use that for the switch statement
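For illustration, the normalize-then-look-up idea could be sketched roughly like this. It is not opdis code: the enum, table contents and function names are made up for the example, and a sorted table with `bsearch()` stands in for the hashtable to keep it short.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical categories for this sketch; opdis has its own, larger enum. */
enum insn_cat { CAT_UNKNOWN = 0, CAT_PRIV, CAT_BRANCH };

struct mnem_entry {
    const char *name;
    enum insn_cat cat;
};

/* Illustrative subset only. Must stay sorted by name for bsearch(). */
static const struct mnem_entry mnem_table[] = {
    { "call",   CAT_BRANCH },
    { "clts",   CAT_PRIV },
    { "hlt",    CAT_PRIV },
    { "jmp",    CAT_BRANCH },
    { "ltr",    CAT_PRIV },
    { "rsm",    CAT_PRIV },
    { "wbinvd", CAT_PRIV },
};

static int cmp_entry(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct mnem_entry *)elem)->name);
}

/* Lowercase the mnemonic once, then do a single exact-match lookup. */
enum insn_cat classify_mnemonic(const char *mnemonic)
{
    char buf[16];
    size_t i;
    const struct mnem_entry *hit;

    for (i = 0; mnemonic[i] != '\0'; i++) {
        if (i >= sizeof(buf) - 1)
            return CAT_UNKNOWN; /* longer than any mnemonic in the table */
        buf[i] = (char)tolower((unsigned char)mnemonic[i]);
    }
    buf[i] = '\0';

    hit = bsearch(buf, mnem_table,
                  sizeof(mnem_table) / sizeof(mnem_table[0]),
                  sizeof(mnem_table[0]), cmp_entry);
    return hit ? hit->cat : CAT_UNKNOWN;
}

/* The enum result can then drive an ordinary switch. */
const char *cat_name(enum insn_cat cat)
{
    switch (cat) {
    case CAT_PRIV:   return "priv";
    case CAT_BRANCH: return "branch";
    default:         return "unknown";
    }
}
```

One difference to keep in mind: this lookup matches whole mnemonics, while the existing `strncmp( "inv", item, 3 )` style matches prefixes, so a family such as "inv..." would need one table entry per mnemonic.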

mkfs commented 11 years ago

Yeah, the strcmps were a quick hack that I never replaced. In Ruby, I use a Hash with the mnemonic as the key. It wouldn't be too hard to write a Ruby script that generates C code from this -- I'll add that to my TODO list.

Most of my work is in Ruby these days. I only use C when writing device drivers and doing embedded work. I agree that C makes the best choice for a library (naturally), but C code just takes too long to write.

mkfs commented 11 years ago

It's good to know that people are using this library, though -- that gives me an incentive to fix it. I wrote opdis in 2010 for some RE work, and haven't had a use for it since :)

I have some binary analysis projects lined up for spring that will require opdis. I'll schedule a week's worth of evenings in February for opdis updates. See if I can get ARM rolling.

rofl0r commented 11 years ago

well i'm not currently using opdis, i added it to my watchlist when i searched for a disasm library to use in my planned debugger ( https://github.com/rofl0r/debuglib ). however, the need to have libopcodes seems to be a major disadvantage.

btw regarding automated optimized C code for string comparisons i've written some pretty sweet thing: it's implemented as a mix of C macros and a small program that generates code from some inline comments: example: https://github.com/rofl0r/stringswitch/blob/master/example/test.c outcome: https://github.com/rofl0r/stringswitch/blob/master/example/stringswitch_impl_argv0.c

according to my benchmarks, it's a good bit faster than gperf.

mkfs commented 11 years ago

Removing the libopcodes dependency is out of the question -- I simply don't have the time to maintain an x86 disassembler.

I could abstract the metadata generator out into its own library -- basically, return instruction and operand metadata based on the mnemonics. I've done that already with the non-public version of Ruby code. That would allow the metadata to be generated from any disassembler (bea, distorm, etc).

The linear and cflow disassembly routines are trivial enough that they aren't worth abstracting into a standalone library.
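To show the difference between the two strategies, here is a minimal sketch over a toy instruction set. Everything in it is invented for the example (the three-opcode ISA, the function names, the unconditional-jump-only control flow); it is not the opdis API and does not use libopcodes.

```c
#include <stddef.h>

/* Toy ISA, made up for this sketch (not x86):
 *   0x00       nop       1 byte
 *   0x01 rel8  jmp rel8  2 bytes, target = offset of next insn + rel8
 *   0xFF       hlt       1 byte, ends a control-flow path
 */
enum { TOY_NOP = 0x00, TOY_JMP = 0x01, TOY_HLT = 0xFF };

/* Size of the instruction at buf[off], or 0 if the bytes do not decode. */
static size_t toy_insn_size(const unsigned char *buf, size_t len, size_t off)
{
    if (off >= len)
        return 0;
    switch (buf[off]) {
    case TOY_NOP:
    case TOY_HLT:
        return 1;
    case TOY_JMP:
        return (off + 1 < len) ? 2 : 0;
    default:
        return 0;
    }
}

/* Linear sweep: decode every offset in order, skipping one byte on failure.
 * Returns the number of instructions decoded. */
size_t toy_disasm_linear(const unsigned char *buf, size_t len)
{
    size_t off = 0, count = 0;

    while (off < len) {
        size_t size = toy_insn_size(buf, len, off);
        if (size == 0) {
            off++;
            continue;
        }
        count++;
        off += size;
    }
    return count;
}

/* Control-flow traversal: start at entry and follow jumps. Stops at hlt,
 * at undecodable bytes, outside the buffer, or at an offset already seen.
 * visited must point to len zero-initialized bytes. */
size_t toy_disasm_cflow(const unsigned char *buf, size_t len, size_t entry,
                        unsigned char *visited)
{
    size_t off = entry, count = 0;

    while (off < len && !visited[off]) {
        size_t size = toy_insn_size(buf, len, off);
        if (size == 0)
            break;
        visited[off] = 1;
        count++;
        if (buf[off] == TOY_HLT)
            break;
        if (buf[off] == TOY_JMP) {
            long target = (long)(off + size) + (signed char)buf[off + 1];
            if (target < 0)
                break;
            off = (size_t)target;
        } else {
            off += size;
        }
    }
    return count;
}
```

On the bytes `01 01 00 FF` the linear sweep decodes all three instructions (jmp, nop, hlt), while the control-flow traversal from offset 0 decodes only the jmp and the hlt it lands on, since the nop is never reached.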

SimonKagstrom commented 11 years ago

I'd definitely vote against removing the libopcodes dependency: There are already hundreds of more (and mostly less!) complete x86-only disassemblers. What sets libopdis apart is that it supports anything which libbfd does (i.e., basically everything) and has a well-working API, in contrast to libbfd.

Anyway, I wasn't suggesting introducing C++ into libopdis, it's just that I don't want a dependency on non-compiled languages (and in that sense, C and C++ are the same for me).