8051Enthusiast / delsum

A reverse engineer's checksum toolbox
MIT License

algorithm question #9

Open bucanero opened 8 months ago

bucanero commented 8 months ago

are there any plans to support other algorithms, for example something like the SDBM hash? http://www.cse.yorku.ca/~oz/hash.html#sdbm

u32 sdbm_hash(const u8* data, u32 len, u32 init)
{
    u32 crc = init;

    while (len--)
        crc = (crc * 0x1003f) + *data++;

    return (crc);
}

8051Enthusiast commented 8 months ago

yeah that's possible, it should also be possible to use the generalized version where 0x1003f is made into a parameter (which would also include djb2), while still being able to recover it efficiently with delsum reverse. i can't say i'll get to it soon though.

are there other algorithms you had in mind?
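
To make the generalized version concrete, here is a minimal sketch in which the multiplier and the initial value are both parameters. The names `mul_hash`, `mult` and `init` are illustrative only and are not part of delsum. The sdbm constants come from the snippet above; the djb2 constants (multiplier 33, initial value 5381) are the commonly published ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Generalized multiplicative hash: h = h * mult + byte, wrapping modulo 2^32.
 * `mult` and `init` are the free parameters (hypothetical names). */
uint32_t mul_hash(const uint8_t *data, size_t len, uint32_t mult, uint32_t init)
{
    uint32_t h = init;

    while (len--)
        h = h * mult + *data++;

    return h;
}

/* sdbm: multiplier 0x1003f (65599), initial value 0. */
uint32_t sdbm(const uint8_t *data, size_t len)
{
    return mul_hash(data, len, 0x1003fu, 0u);
}

/* djb2: multiplier 33, initial value 5381. */
uint32_t djb2(const uint8_t *data, size_t len)
{
    return mul_hash(data, len, 33u, 5381u);
}
```

For example, `sdbm((const uint8_t *)"ab", 2)` works out to 97 * 65599 + 98 = 6363201, and `djb2` on the same input to (5381 * 33 + 97) * 33 + 98 = 5863208.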

bucanero commented 8 months ago

for sure, a generic alternative that can cover other variations (sdbm, djb2) would be great.

Regarding other algorithms, I usually dig around save-game checksums for save-game editing, and there are a bunch of simple checksums that could be added too.

There's the "add" family, generally referenced as ADD, WADD, DWADD (8-bit, 16-bit, 32-bit), these are super simple, just a sum of bytes, like:

Add:

    while (len--)
        add += (uint8_t) *data++;

WAdd:

    len = len/2;
    while (len--) {
        wadd += *(const uint16_t *) data;  /* read a whole 16-bit word, not a single byte */
        data += 2;
    }

DWAdd:

    len = len/4;
    while (len--) {
        dwadd += *(const uint32_t *) data;  /* read a whole 32-bit word, not a single byte */
        data += 4;
    }

Note: some variations use signed int8, int16, int32 instead of unsigned, or read values as big-endian, and some other variations use subtraction instead of addition, like while (len--) { sub -= (uint8_t) *data++; }.
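
The variations listed above (word size, endianness, signedness, add versus subtract) can be folded into one routine. This is only a sketch: the function name and every parameter are made up for illustration, and the accumulator is fixed at 32 bits, so the caller truncates the result to the width it needs. Words are assembled byte by byte, which avoids any dependence on host endianness or alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum `len` bytes as words of `word_bytes` (1, 2 or 4) bytes each.
 * Trailing bytes that do not fill a whole word are ignored, matching the
 * len/2 and len/4 loops above. All parameter names are illustrative. */
uint32_t word_sum(const uint8_t *data, size_t len, unsigned word_bytes,
                  bool big_endian, bool is_signed, bool subtract)
{
    uint32_t acc = 0;

    for (size_t i = 0; i + word_bytes <= len; i += word_bytes) {
        uint32_t w = 0;

        /* Assemble one word in the requested byte order. */
        for (unsigned j = 0; j < word_bytes; j++) {
            unsigned shift = 8 * (big_endian ? word_bytes - 1 - j : j);
            w |= (uint32_t)data[i + j] << shift;
        }

        /* Sign-extend to 32 bits when the word's top bit is set. */
        if (is_signed && word_bytes < 4 && ((w >> (8 * word_bytes - 1)) & 1u))
            w |= ~(uint32_t)0 << (8 * word_bytes);

        acc = subtract ? acc - w : acc + w;
    }

    return acc;
}
```

With this sketch, ADD would be `(uint8_t)word_sum(data, len, 1, false, false, false)` and a big-endian WADD would be `(uint16_t)word_sum(data, len, 2, true, false, false)`. Signedness only changes the result when the accumulator is wider than the word, since sign extension affects only the bits above the word.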

I understand if you don't have time right now, but it would be great to have such a nice reversing tool when working with save-game data. 😄

8051Enthusiast commented 8 months ago

some good news about Add, WAdd and DWAdd: i think you should be able to get them already as modsum width=8, modsum width=16 wordsize=16 and modsum width=32 wordsize=32, and adding in_endian=big would make them use big-endian values (not setting the module parameter lets it default to 0, which is the same as no modulus at all).
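
As a rough model of what a byte-wise modular sum with these parameters computes, here is a sketch that follows the description above. It is not delsum's actual implementation: the function name and signature are assumptions, and `module == 0` is simply treated as "no extra modulus", meaning plain wraparound at 2^width.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte sum kept to `width` bits (1..32), optionally reduced modulo `module`.
 * module == 0 means no modulus beyond wrapping at 2^width.
 * Hypothetical helper that models the description only. */
uint32_t byte_modsum(const uint8_t *data, size_t len, unsigned width,
                     uint32_t module, uint32_t init)
{
    uint64_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    uint64_t acc = init;

    while (len--) {
        acc += *data++;
        /* Reduce after every byte so the accumulator stays in range. */
        acc = module ? acc % module : acc & mask;
    }

    return (uint32_t)acc;
}
```

For the bytes 0xFF, 0x02 this gives 0x01 at width 8 with no modulus, 0x101 at width 16 with no modulus, and 2 at width 16 with a modulus of 255.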

haven't thought about the sign stuff, so i will definitely implement that when i get around to that.

bucanero commented 8 months ago

thanks for the feedback!