WerWolv / ImHex

🔍 A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.
https://imhex.werwolv.net
GNU General Public License v2.0

[Feature] Custom Encoding String Search #436

Closed madsiur closed 1 week ago

madsiur commented 2 years ago

What feature would you like to see?

I've been using ImHex for romhacking and really like the custom encoding feature. However I'd like to see a string search function for the custom encoding we use. Let's say we use a custom encoding covering all letters, here's a snippet of it:

81=b
8F=o
FF=[end]

Searching for the "bob[end]" string would be like searching in hex for 818F81FF.
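
As a rough illustration of the idea (a sketch only, not ImHex code; the whitespace-separated table format and the helper names `parse_table`, `encode`, and `find_all` are assumptions made for this example), the search term can be tokenized against the table, converted to bytes, and then searched for like any other byte sequence:

```python
def parse_table(text):
    """Parse whitespace-separated 'HEX=token' entries into {token: bytes}."""
    table = {}
    for entry in text.split():
        hex_part, token = entry.split("=", 1)
        table[token] = bytes.fromhex(hex_part)
    return table


def encode(query, table):
    """Encode a query string, trying the longest table tokens first."""
    tokens = sorted(table, key=len, reverse=True)
    out = bytearray()
    pos = 0
    while pos < len(query):
        for token in tokens:
            if query.startswith(token, pos):
                out += table[token]
                pos += len(token)
                break
        else:
            raise ValueError(f"no table entry matches at position {pos}")
    return bytes(out)


def find_all(data, needle):
    """Return every offset at which needle occurs in data (overlaps included)."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


table = parse_table("81=b 8F=o FF=[end]")
needle = encode("bob[end]", table)
rom = bytes.fromhex("00818F81FF1234818F81FF")  # made-up sample data
print(needle.hex().upper())   # 818F81FF
print(find_all(rom, needle))  # [1, 7]
```

Trying longer tokens first keeps a multi-character entry such as `[end]` from being split into single characters. Table entries that map to a space would need a different parser than the `split()` used here.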

Thanks again for your awesome work on ImHex!

How will this feature be useful to you and others?

-Custom encoding string search is useful for reverse engineering old games and making fan translations.


KillyMXI commented 2 years ago

Would it make sense to write a pattern for that ROM structure? I guess the knowledge about this encoding can be incorporated in the pattern and you will be able to see your strings. (Hmm. Being able to filter pattern data based on value column would be handy then.) Or it might be too much to reverse engineer compared to the goal?


I think what's currently missing is a way to make a custom encoding/decoding in the data processor. There are only a couple of very specific decoders at the moment.

I guess a simple byte-to-byte mapping block (or a tool) won't be difficult but I suspect in reality more and more complex encodings will keep coming. Thinking about universal implementation though, not sure what would be the best way to approach it. This is actually where embedded scripting can come in handy...
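
For the simple case, a byte-to-byte mapping is little more than a lookup table. A minimal sketch in Python (not an ImHex data processor node; the table contents and the `decode` helper are made up for illustration):

```python
def decode(data, table, unknown="."):
    """Map each byte to its table entry; unmapped bytes become a placeholder."""
    return "".join(table.get(byte, unknown) for byte in data)


table = {0x81: "b", 0x8F: "o", 0xFF: "[end]"}
print(decode(bytes.fromhex("818F81FF00"), table))  # bob[end].
```

Multi-byte entries and control codes that take parameters are exactly where a flat lookup like this stops being enough.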

There was some discussion on Discord about ideas that would allow using patterns for complex data processing... (Another case with encoded data that requires a layer of indirection/preprocessing.)

And then there need to be more ways to do something with the produced data in the data processor. I'm not sure yet what the "Write" block does - it seems unsafe. A display block for short buffers would be handy. A block that can output to a file with a given name would be handy too (with a way to quickly open that file).

A wild idea - a "lens" view, parallel to and in sync with the hex editor view, that would contain the result of the encoding/decoding procedure. It seems difficult to implement, though, since the encoded and decoded data sizes don't have to match.

madsiur commented 2 years ago

I have not looked at the pattern feature yet, but if we can look for strings (or see a list of them) that way, then it would probably accomplish the goal. Right now I know where most of the strings are in my ROM, so I can just go to those offsets with a custom encoding loaded and see them. There are scenarios where you have some strings figured out and a custom encoding reverse engineered, but there are other, not-yet-located strings in the ROM that you could find by searching for text you see in-game.

A hex editor that a lot of translators use is called Windhex. It has custom table support and the feature I first described, but it has some downsides, so I was hoping ImHex could be a good replacement for it.

Edit: I could see the feature becoming more complex like you described. For example, the custom encoding could have a control code for game item names like 1F=[item],1, which would translate to the range 1F00 to 1FFF. I'm not sure how searching for that as a string would work.
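
One possible approach, sketched here under assumptions: treat each parameter byte as a wildcard, so that `[item]` matches 1F followed by any single byte. The `TABLE` layout and the `build_pattern` helper are hypothetical and only meant to show the idea, using Python's `re` module on bytes:

```python
import re

# Hypothetical table: token -> (prefix bytes, number of parameter bytes)
TABLE = {
    "b": (b"\x81", 0),
    "o": (b"\x8f", 0),
    "[item]": (b"\x1f", 1),
}


def build_pattern(tokens):
    """Build a bytes regex in which every parameter byte matches any value."""
    parts = []
    for token in tokens:
        prefix, n_params = TABLE[token]
        parts.append(re.escape(prefix))
        if n_params:
            parts.append(b".{%d}" % n_params)
    # DOTALL lets '.' also match the byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


pattern = build_pattern(["b", "o", "b", "[item]"])
rom = bytes.fromhex("00818F811F0A55818F811FFF")  # made-up sample data
print([m.start() for m in pattern.finditer(rom)])  # [1, 7]
```

Both occurrences are found even though the item parameter differs (0A and FF).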

KillyMXI commented 2 years ago

Windhex - https://www.romhacking.net/utilities/291/ - this one?

I haven't started using patterns myself yet. I think you can hardcode the start address for interesting data to avoid reversing the whole structure. And gradually improve from that if needed. It will show all parsed ranges with value preview in a separate window (can select the ranges in the hex view itself from there).

WerWolv commented 2 years ago

Searching in custom encodings shouldn't really be an issue to implement, I think. The reason I held off on this originally was Japanese and other Unicode characters, which would require special treatment.

I'm definitely up for trying to get it working now

madsiur commented 2 years ago

> Windhex - https://www.romhacking.net/utilities/291/ - this one?

Yeah that's the one. What I meant by "downsides" is that it lacks some basic things like copy-paste blocks with ctrl+c and paste-write/paste-insert hotkeys like HxD has.

> Searching in custom encodings shouldn't really be an issue to implement, I think. The reason I held off on this originally was Japanese and other Unicode characters, which would require special treatment.
>
> I'm definitely up for trying to get it working now

Thanks a lot! :)

github-actions[bot] commented 1 month ago

This issue is marked stale as it has been open for 11 months without activity. Please try the latest ImHex version. (Available here: https://imhex.download/ for release and https://imhex.download/#nightly for development version) If the issue persists on the latest version, please make a comment on this issue again.

Without response, this issue will be closed in one month.