WerWolv / ImHex

🔍 A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.
https://imhex.werwolv.net
GNU General Public License v2.0

Add expressions to the pattern language #61

Closed: steinuil closed this issue 3 years ago

steinuil commented 3 years ago

I'm trying to reverse engineer a file format which has some length-prefixed strings at some point, which are encoded as such:

struct PString {
    u32 length;
    u8 string[length / 12];
};

(I don't know why 12, this format is weird like that :P)
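For readers unfamiliar with the format, here is a minimal Python sketch of what the struct above asks a parser to do: read a 32-bit length, then read `length / 12` bytes of string data. It assumes little-endian byte order and integer (truncating) division, neither of which the issue states, and `read_pstring` is a hypothetical helper, not part of ImHex.

```python
import struct


def read_pstring(data: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Decode one PString at `offset`; return (string bytes, offset just past it).

    Assumptions: the u32 length is little-endian and `length / 12`
    is integer division, as it would be for unsigned integers.
    """
    (length,) = struct.unpack_from("<I", data, offset)  # u32 length
    size = length // 12                                   # u8 string[length / 12]
    start = offset + 4
    end = start + size
    if end > len(data):
        raise ValueError(
            f"PString needs {size} bytes at offset {start}, "
            f"only {len(data) - start} available"
        )
    return data[start:end], end
```

For example, a length field of 60 yields a 5-byte string, so `read_pstring(struct.pack("<I", 60) + b"hello")` returns the string bytes together with the offset 9 where the next field would start.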

It would be nice to be able to use simple arithmetic expressions in the size of an array rather than just variables so I could express this struct in the pattern language. I've seen there's already #15 open for expressions in variable placements, but that looked a bit more complex and also didn't cover array sizes.

And thank you for making this, I'm finding it very useful!

WerWolv commented 3 years ago

Hi! Yeah, arithmetic expressions are going to be one of the next things that need to be added to make the language more complete. Expect them sometime soon :)

the-wondersmith commented 3 years ago

@WerWolv Hopefully I'm not overstepping, but it may save you a ton of work and instantly expand the ImHex pattern library if you looked at using Kaitai Struct either instead of or in addition to the current pattern language.

Foxboron commented 3 years ago

I'd like to mention GNU Poke as well.

http://jemarch.net/poke.html

WerWolv commented 3 years ago

One of the reasons I started making this hex editor was to build my own lexer, parser, AST and evaluator. I know it would probably be a lot easier to just use something great that already exists, but I'd really like to do this myself, even if it means that the patterns have fewer features for now. I really enjoy designing my own language.

Foxboron commented 3 years ago

Apologies! It's not my intention to claim you should stop writing your own thing :smile: I think it helps a bit to see how other languages solve similar challenges, but if I'm just bringing up things you have already looked at, I apologize for the distraction :)

hugsy commented 3 years ago


I fully agree with @the-wondersmith : using Kaitai with ImHex would simply create the perfect hex viewer and would give ImHex instant access to all the formats already supported and a very simple-but-efficient way to add new ones.

@WerWolv Is there really no chance we'll see a Kaitai integration?

WerWolv commented 3 years ago

If you want any chance of me continuing this project then no. I'm doing this because I enjoy it and I wanted to learn how to write a lexer, parser and evaluator. If I just throw it all out and instead use something existing, even if that something will absolutely be better, I lose pretty much the whole reason behind why I'm doing this in the first place.

the-wondersmith commented 3 years ago

@WerWolv Text on a screen doesn't convey verbal tone and social cues the way spoken conversation does, so let me be clear and explicit:

The only reason any of us have made suggestions is in the spirit of improving a tool that we already think is head and shoulders above other options. I don't think we've yet irritated or insulted you, but in case any of us have please accept our sincere apologies.

I think perhaps a better way to phrase the question might be: "Are you wholly opposed to adding Kaitai support to ImHex, or would someone else doing so and submitting a PR be just as project-ending as someone insisting that you do it yourself?"

WerWolv commented 3 years ago

I'm sorry if I answered too harshly, I'm not in the best mood today. I didn't take any of your suggestions as an insult.

To me it just feels like people don't want to give my language a chance because, after three weeks, it doesn't have as many features as something existing that has been worked on for years. It feels very demotivating to read "Why not use X, it is so much better" all the time, not just on GitHub.

I'm opposed in the sense that I don't quite see why we need two things that ultimately do exactly the same job. Is there anything that absolutely requires the use of Kaitai? If somebody wants to make a PR that integrates Kaitai in a way that doesn't just make my pattern language useless or replace it, it will probably get merged.

the-wondersmith commented 3 years ago

@WerWolv I completely understand - some days just aren't the right day. If there's anything I (or any of us, I'm sure) can do to improve it, please let me know.

As for anything requiring the use of Kaitai, the honest answer is no, nothing technically requires it. I've spent several years working with binary files whose structure is meant to be a black box / walled garden, and which therefore have to be reverse engineered to be understood. I'm absolutely not the world's greatest coder, but I'm not exactly a slouch either, and speaking from that experience, it's a real pain in the ass to not only reverse a binary format but then also document it and write a usable parser for it. Someone put a lot of time, effort, and energy into whatever powers Kaitai (black magic, maybe?), so that if you write a specification with it, you end up with a single self-documenting file that will also generate a fully functional, fully commented and documented reader/writer in your choice of 11 different programming languages.

The reason I personally suggested Kaitai is that the learning curve isn't particularly steep, and whoever is behind it seems to have both the financial and coding resources to maintain and expand it. Case in point: I believe several open-source, reverse-engineering-centric projects rely on it, and I know there are plugins for it for both IDA Pro and Binary Ninja, which means there's probably one for Ghidra as well. When I made the suggestion, I didn't realize that your purpose in creating ImHex was learning how to write a lexer, parser and evaluator. I figured you'd also run up against the total lack of really good hex editors for RE work and decided to do something about it. It seemed like a bit much to ask you to develop, maintain, and continually expand ImHex and then, on top of that, also carry the burden of doing the same for a whole parser language.

The nice thing, however, about being the project's author is that you and you alone dictate the project's course. Like I said previously, we might have suggestions but at the end of the day we're here to help however you decide you want or need us to. 🙂

WerWolv commented 3 years ago

So, despite everything, I've been rewriting the entire parser and AST over the past week. Currently re-implemented are variable placement, structs and unions with simple data types (no pointers or arrays yet), and enums. One of the advancements I've made, however, is that we now have proper expression parsing, as the original request here asked for :)
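The kind of expression parsing discussed here can be illustrated with a small recursive-descent parser. This is a hypothetical Python sketch, not ImHex's implementation: it tokenizes an array-size expression such as `length / 12`, gives `*` and `/` higher precedence than `+` and `-`, supports parentheses, and looks up identifiers in a dictionary of already-parsed fields. Floor division and the lack of unary minus are simplifying assumptions.

```python
import re

# Integers, identifiers, or any other single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(src: str) -> list:
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:  # only trailing whitespace is left
            break
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("num", int(number)))
        elif name is not None:
            tokens.append(("name", name))
        elif op in "+-*/()":
            tokens.append(("op", op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: list, env: dict):
        self.tokens, self.pos, self.env = tokens, 0, env

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> int:  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self) -> int:  # factor := number | name | '(' expr ')'
        kind, val = self.advance()
        if kind == "num":
            return val
        if kind == "name":
            return self.env[val]  # KeyError if the field is unknown
        if (kind, val) == ("op", "("):
            value = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(src: str, env: dict) -> int:
    parser = Parser(tokenize(src), env)
    value = parser.expr()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("trailing tokens after expression")
    return value
```

With the struct from the original request, `evaluate("length / 12", {"length": 60})` gives an array size of 5. Splitting the grammar into `expr`, `term` and `factor` is what encodes operator precedence, so `2 + 3 * 4` evaluates to 14 rather than 20.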

WerWolv commented 3 years ago

We're almost there again with the language where I left off before :)

Major missing things right now: Nothing!

Everything works again, but this time with a parser and evaluator that are SO MUCH nicer to use, easier to expand upon and have SO MUCH less code duplication. Like seriously, the parser went from 663 lines to 350 lines and the evaluator went from 428 lines to 185. They will grow a little once I'm done adding everything back, but they'll still be nowhere near as long as before.

WerWolv commented 3 years ago

So, with the latest few commits to the parser_rewrite branch, I finally added back all the features that were there previously, plus proper mathematical expressions (like this issue originally requested) and proper rvalues (allowing you to access nested variables like structA.structB.x) :) I'll close this issue when merging the branch into master.
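To illustrate what resolving a nested rvalue such as `structA.structB.x` involves, here is a hypothetical Python sketch that walks a dotted path through already-decoded struct values, modelled as nested dictionaries. It is only an analogy for the behaviour described above; ImHex's evaluator works on its own pattern objects, not Python dicts.

```python
def resolve_rvalue(path: str, scope: dict):
    """Follow a dotted member path such as 'structA.structB.x' through nested dicts."""
    value = scope
    walked = []
    for member in path.split("."):
        walked.append(member)
        # Each step must land on a struct (dict) that actually has this member.
        if not isinstance(value, dict) or member not in value:
            raise KeyError(f"no member {'.'.join(walked)!r}")
        value = value[member]
    return value
```

Given `{"structA": {"structB": {"x": 42}}}`, the path `structA.structB.x` resolves to 42, while a missing member reports the partial path that failed.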

blark commented 7 months ago

Hey @WerWolv first off thanks for making ImHex, it's great. I came here to check if you'd considered Kaitai support and I got my answer. I appreciate your perspective.

There are many existing format specifications written for Kaitai. Would some sort of translation from those to ImHex's native pattern language be something you'd consider? Having the ability to tap into a library of pre-existing format specifications would be useful.

However, I suppose I'm free to go write my own plugin, so I should probably shut up and get coding if I'm really interested in that, eh? Heh.

WerWolv commented 7 months ago

@blark If you'd like to do that, absolutely! The only requirement I really have is that ImHex doesn't contain more than one format internally. Everything should ultimately go through the pattern language.

There is a script on the ImHex-Patterns repo that tries to do exactly what you're asking, but it's far from finished. It's written in Python if you want to expand on it.
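As a rough idea of what such a translation involves, here is a hypothetical Python sketch (not the script from the ImHex-Patterns repository) that turns the `seq` list of a Kaitai Struct type, assumed to be already loaded into Python dictionaries, into a pattern-language struct. It only handles fixed-width integer types (`u1` to `u8`, `s1` to `s8`, with an optional `le`/`be` suffix mapped to the pattern language's `le`/`be` prefix) and raw byte fields given by `size`. The `size` expression is copied verbatim, which only works when the Kaitai expression happens to be valid pattern-language syntax too.

```python
import re

# Kaitai integer types: u1/s1, or u2/u4/u8/s2/s4/s8 with an optional le/be suffix.
KAITAI_INT = re.compile(r"^([us])([1248])(le|be)?$")


def kaitai_seq_to_pattern(name: str, seq: list) -> str:
    """Translate a minimal Kaitai `seq` (list of dicts) into a pattern-language struct."""
    lines = [f"struct {name} {{"]
    for field in seq:
        fid = field["id"]
        ktype = field.get("type")
        if ktype is None and "size" in field:
            # Raw byte field: `size` may be an expression such as "length / 12".
            lines.append(f"    u8 {fid}[{field['size']}];")
            continue
        m = KAITAI_INT.match(ktype or "")
        if m is None:
            raise NotImplementedError(f"unsupported field {fid!r}: {ktype!r}")
        sign, width, endian = m.groups()
        prefix = f"{endian} " if endian else ""
        # Kaitai widths are in bytes (u4); pattern-language widths are in bits (u32).
        lines.append(f"    {prefix}{sign}{int(width) * 8} {fid};")
    lines.append("};")
    return "\n".join(lines)
```

Fed the equivalent of the PString example from the top of this issue, `[{"id": "length", "type": "u4le"}, {"id": "string", "size": "length / 12"}]`, it produces a `struct PString` with `le u32 length;` and `u8 string[length / 12];`. Anything beyond this (nested types, enums, instances, conditional fields) would need real work.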