bladecoding / BoIRResourceDecryption

GNU General Public License v3.0

restore original file names #5

[Open] flying-sheep opened this issue 10 years ago

flying-sheep commented 10 years ago

when calling

strings ".local/share/Steam/SteamApps/common/The Binding of Isaac Rebirth/isaac.x64" | grep resources

i see many lines like resources/gfx/Effects/Effect_Xray_Cathedral.png. i think those are passed to some call like loadResource(const char * path), which maps them to offsets in the resource files.
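A rough C# equivalent of that strings | grep pipeline, handy if you want to collect the candidate paths from inside the tool itself, could look like this. It is only a sketch: StringScan and FindStrings are made-up names, and the 4-byte minimum run length mirrors the default of GNU strings, though GNU strings treats a few extra characters (such as tab) as printable.

```csharp
using System.Collections.Generic;
using System.Text;

static class StringScan
{
    // Roughly `strings file | grep needle`: collects runs of printable ASCII
    // (0x20..0x7E) that are at least minLen bytes long and contain `needle`.
    public static List<string> FindStrings(byte[] data, string needle, int minLen = 4)
    {
        var results = new List<string>();
        var current = new StringBuilder();
        foreach (byte b in data)
        {
            if (b >= 0x20 && b <= 0x7E)
            {
                current.Append((char)b);
                continue;
            }
            Flush(current, needle, minLen, results);  // non-printable byte ends the run
        }
        Flush(current, needle, minLen, results);      // run that reaches the end of the buffer
        return results;
    }

    static void Flush(StringBuilder current, string needle, int minLen, List<string> results)
    {
        if (current.Length >= minLen)
        {
            string s = current.ToString();
            if (s.Contains(needle))
                results.Add(s);
        }
        current.Clear();
    }
}
```

Usage would be something like StringScan.FindStrings(File.ReadAllBytes(pathToIsaacX64), "resources/"), where pathToIsaacX64 is whatever path you used in the strings call.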

there must be some table in the resource file or the binary that maps file names to records in the archive; we could use it to assign names to the records.

naelstrof commented 10 years ago

I imagine the file path is tied to the first (or last) 32 bits of each file record, but every way I've tried to read them just gives garbage data.

I think a good first step is to attempt to "decrypt" the whole file using one of the record's keys (or perhaps there's a key in the file header somewhere?) and write it to disk, and see what kind of strings we can get out of it.

bladecoding commented 10 years ago

Decompiling the file path hash function right now.

naelstrof commented 10 years ago

Try to comment on the code you make! I think this is awesome to learn about.

bladecoding commented 10 years ago

Here is the code. I may try hashing all the strings in Isaac's binary and see what comes up.

/* Original Asm (Base address 0x00FA0000)
0111B8B0             /$  56              PUSH ESI
0111B8B1             |.  8BF0            MOV ESI, EAX
0111B8B3             |.  8A0E            MOV CL, [ESI]
0111B8B5             |.  B8 05150000     MOV EAX, 1505
0111B8BA             |.  84C9            TEST CL, CL
0111B8BC             |.  74 27           JE SHORT 0111B8E5
0111B8BE             |.  8BFF            MOV EDI, EDI
0111B8C0             |>  8D51 BF         /LEA EDX, [ECX-41]
0111B8C3             |.  46              |INC ESI
0111B8C4             |.  80FA 19         |CMP DL, 19
0111B8C7             |.  77 03           |JA SHORT 0111B8CC
0111B8C9             |.  80C1 20         |ADD CL, 20
0111B8CC             |>  80F9 5C         |CMP CL, 5C
0111B8CF             |.  75 02           |JNZ SHORT 0111B8D3
0111B8D1             |.  B1 2F           |MOV CL, 2F
0111B8D3             |>  8BD0            |MOV EDX, EAX
0111B8D5             |.  C1E2 05         |SHL EDX, 5
0111B8D8             |.  03D0            |ADD EDX, EAX
0111B8DA             |.  0FB6C1          |MOVZX EAX, CL
0111B8DD             |.  8A0E            |MOV CL, [ESI]
0111B8DF             |.  03C2            |ADD EAX, EDX
0111B8E1             |.  84C9            |TEST CL, CL
0111B8E3             |.^ 75 DB           \JNZ SHORT 0111B8C0
0111B8E5             |>  5E              POP ESI                                         ;  00AE74F8
0111B8E6             \.  C3              RETN
*/

//First hash: djb2-style (seed 0x1505 = 5381, then hash = hash * 33 + c) over the path,
//with ASCII letters lowercased and '\' turned into '/' first.
//Isaac ANDs the result with 0x7FF and uses that value as an array index to find the file's data.
static uint Hash1(string str)
{
    uint ret = 0x1505;
    for(int i = 0; i < str.Length; i++)
    {
        byte c = (byte)str[i];
        if ((byte)(c - 0x41) <= 0x19)   // 'A'..'Z' -> 'a'..'z'
            c += 0x20;
        if (c == 0x5C)                  // '\' -> '/'
            c = 0x2F;
        ret = (uint)(((ret << 5) + ret) + c);   // ret * 33 + c
    }
    return ret;
}
//This is the second checksum, FNV-1a-style (XOR, then multiply by the FNV prime 0x1000193,
//but with a non-standard seed). It is used to make sure the first hash pointed to the right place.
//If not, Isaac searches all records for one matching both hashes.
static uint Hash2(string str)
{
    uint ret = 0x5BB2220E;
    for(int i = 0; i < str.Length; i++)
    {
        byte c = (byte)str[i];
        if ((byte)(c - 0x41) <= 0x19)   // same normalization as Hash1
            c += 0x20;
        if (c == 0x5C)
            c = 0x2F;
        ret = (c ^ ret) * 0x1000193;
    }
    return ret;
}
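Read together, the two comments describe a lookup roughly like the one below. This is a sketch under assumptions: the thread doesn't say what the 0x7FF-indexed array actually holds or how records sit on disk, so the bucket table (an int[2048] of record indices, -1 for empty) and the ArchiveRecord struct are hypothetical, as are the class names.

```csharp
using System;

// Copies of Hash1/Hash2 from the code above so this sketch compiles on its own.
static class IsaacHashes
{
    // Shared normalization: lowercase ASCII letters, turn '\' into '/'.
    static byte Normalize(char ch)
    {
        byte c = (byte)ch;
        if ((byte)(c - 0x41) <= 0x19) c += 0x20;
        if (c == 0x5C) c = 0x2F;
        return c;
    }

    public static uint Hash1(string s)
    {
        uint h = 0x1505;
        foreach (char ch in s) h = (h << 5) + h + Normalize(ch);
        return h;
    }

    public static uint Hash2(string s)
    {
        uint h = 0x5BB2220E;
        foreach (char ch in s) h = (Normalize(ch) ^ h) * 0x1000193;
        return h;
    }
}

// Hypothetical in-memory record; the real on-disk layout isn't established here.
struct ArchiveRecord
{
    public uint Hash1, Hash2;
    public long Offset;
    public ArchiveRecord(uint hash1, uint hash2, long offset)
    {
        Hash1 = hash1; Hash2 = hash2; Offset = offset;
    }
}

static class RecordLookup
{
    const uint BucketMask = 0x7FF;  // 2048 buckets

    // buckets[Hash1 & 0x7FF] is assumed to hold an index into records, or -1.
    public static int Find(string path, int[] buckets, ArchiveRecord[] records)
    {
        uint h1 = IsaacHashes.Hash1(path);
        uint h2 = IsaacHashes.Hash2(path);

        // Fast path: the bucket points straight at a record carrying both hashes.
        int idx = buckets[h1 & BucketMask];
        if (idx >= 0 && records[idx].Hash1 == h1 && records[idx].Hash2 == h2)
            return idx;

        // Slow path: scan every record for one carrying both hashes.
        for (int i = 0; i < records.Length; i++)
            if (records[i].Hash1 == h1 && records[i].Hash2 == h2)
                return i;
        return -1;
    }
}
```

Because both hashes normalize the path first, "GFX\\A.PNG" and "gfx/a.png" resolve to the same record.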
bladecoding commented 10 years ago

Quick test and it works quite well. Of course, many of the graphics didn't get renamed, so I'll have to do something about getting names from the XML files. I think I may change this to dump by hex instead of an incrementing number, then add a second step that gathers strings from the XML files and the binary and tries to repair the names.

flying-sheep commented 10 years ago

dump by hex instead of an incrementing number

what do you mean? why not simply load all files of config.a into memory, then

  1. check whether it’s XML. some files are plain text (seeds, fortunes)
  2. if it’s XML, look at the root node. use a manually created table that maps each root node name to its root attribute names and the xpaths needed to reconstruct resource paths.

    e.g. <players /> has three root attributes: root, portraitroot, and bigportraitroot. the table entry for it would look something like this:

    {
     "players": {
       "root": [ "player/@name", "player/hair/@gfx" ],
       "portraitroot": [ "player/@portrait" ],
       "bigportraitroot": [ "player/@bigportrait" ]
     },
     ...
    }

    we’d load that table and apply each entry to the XML file whose root node matches it (here <players>): read the attribute players/@root and prepend that prefix to each value of players/player/@name and players/player/hair/@gfx.
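Step 2 could be sketched with System.Xml like this. It covers only the basic case (one root attribute plus a list of xpaths evaluated relative to the root node); PathRebuilder and Rebuild are hypothetical names, and the sketch assumes the root attribute already ends with the separator, so it concatenates without adding a '/'.

```csharp
using System.Collections.Generic;
using System.Xml;

static class PathRebuilder
{
    // Applies one table entry: prefix = value of rootAttr on the root element,
    // then prefix + value of every node each xpath selects.
    public static List<string> Rebuild(string xml, string rootAttr, IEnumerable<string> xpaths)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        string prefix = root.GetAttribute(rootAttr);  // e.g. players/@root; "" if absent

        var paths = new List<string>();
        foreach (string xpath in xpaths)
            foreach (XmlNode hit in root.SelectNodes(xpath))  // xpath relative to the root node
                paths.Add(prefix + hit.Value);               // attribute value appended to prefix
        return paths;
    }
}
```

For the players entry you would call Rebuild(xml, "root", new[] { "player/@name", "player/hair/@gfx" }) and then repeat with "portraitroot" and "bigportraitroot".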

as soon as we have created all those file names, we read all .a files, not only config.a, and hash each constructed name to find the record it belongs to and name that record after it.
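The naming step could be as small as a dictionary from hash to candidate name, with records that match nothing falling back to their hash in hex (which also fits the "dump by hex" idea from earlier). This is a sketch: Renamer is a made-up name, and the hash function is passed in as a parameter so the block stays self-contained; in practice it would be whichever of the two hashes the records actually store.

```csharp
using System;
using System.Collections.Generic;

static class Renamer
{
    // Maps hash(path) -> path for every constructed candidate name.
    // If two candidates collide, the later one wins.
    public static Dictionary<uint, string> BuildIndex(IEnumerable<string> candidates, Func<string, uint> hash)
    {
        var index = new Dictionary<uint, string>();
        foreach (string path in candidates)
            index[hash(path)] = path;
        return index;
    }

    // Output name for a record: the recovered path if known, otherwise the
    // 8-digit uppercase hex hash so a later pass can still try to match it.
    public static string NameFor(uint recordHash, Dictionary<uint, string> index)
    {
        return index.TryGetValue(recordHash, out var path) ? path : recordHash.ToString("X8");
    }
}
```

Checking the second hash as well before accepting a name would guard against collisions, at the cost of keeping both values per record.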

flying-sheep commented 10 years ago

so, here it is, but some things to note:

  1. <pocketitems> has pocketitems/card/hud named like 00_TheFool: those are possibly also file names…
  2. <preloads> has no kind of root so i used "" as the key. the file names are specified in full as preloads/preload/png/@path
  3. i guessed which root goes with which files in stages
  4. <bosses> and <nightmares> each have an attribute that is itself a file path.
  5. the XML file containing <fxLayers> has two “root” nodes, the second of which has no resource root. i used "fxLayers.gfxroot" to signify that it uses another node’s path root instead of its own
  6. if an xpath matches nothing in any of the files, i made a typo somewhere. we should detect that and fail so that we can fix the specfile:
{
    "achievements": {
        "gfxroot": [ "achievement/@gfx" ]
    },
    "babies": {
        "root": [ "baby/@skin" ]
    },
    "backdrops": {
        "gfxroot": [ "backdrop/@gfx" ]
    },
    "bosses": {
        "root": [ "boss/@portrait", "@anm2" ]
    },
    "costumes": {
        "anm2root": [ "costume/@anm2path" ]
    },
    "cutscenes": {
        "root": [ "cutscene/anm2part/@anm2", "cutscene/videopart/@file" ]
    },
    "fxLayers": {
        "gfxroot": [ "fx/@path", "fx/gfx/@path" ]
    },
    "fxRays": {
        "fxLayers.gfxroot": [ "rayGroup/fxRay/@path" ]
    },
    "giantbook": {
        "anm2root": [ "entry/@anm2", "entry/@gfx" ]
    },
    "items": {
        "gfxroot": [ "passive/@gfx", "active/@gfx", "familiar/@gfx", "trinket/@gfx" ]
    },
    "music": {
        "root": [ "track/@intro", "track/@path", "track/@layerintro", "track/@layer" ]
    },
    "nightmares": {
        "root": [ "nightmare/@anm2", "@progressAnm2" ]
    },
    "players": {
        "root": [ "player/@name", "player/hair/@gfx" ],
        "portraitroot": [ "player/@portrait" ],
        "bigportraitroot": [ "player/@bigportrait" ]
    },
    "preloads": {
        "": [ "preload/png/@path" ]
    },
    "sounds": {
        "root": [ "sound/sample/@path" ]
    },
    "stages": {
        "root": [ "stage/@path" ],
        "bossgfxroot": [ "stage/@playerspot", "stage/@bosspot" ]
    },
    "entities": {
        "anm2root": [ "entity/@anm2path" ]
    }
}
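Applying one entry of this specfile while honouring the special cases could look roughly like the following: an empty key means the hits are already full paths (note 2), a dotted key such as "fxLayers.gfxroot" borrows a root recorded from another node (note 5), xpaths such as "@anm2" naturally select an attribute of the root node itself (note 4), and an xpath with no hits throws (note 6). SpecApplier and the "node.attr" cache are hypothetical, and the sketch assumes the node that owns a borrowed root is processed first.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class SpecApplier
{
    // Applies one root-node entry of the spec (e.g. the "fxRays" object) to a parsed file.
    // rootsSeen caches prefixes under "node.attr" so later nodes can borrow them.
    public static List<string> Apply(XmlElement root, IDictionary<string, string[]> entry,
                                     IDictionary<string, string> rootsSeen)
    {
        var paths = new List<string>();
        foreach (KeyValuePair<string, string[]> kv in entry)
        {
            string prefix = ResolvePrefix(kv.Key, root, rootsSeen);
            foreach (string xpath in kv.Value)
            {
                XmlNodeList hits = root.SelectNodes(xpath);  // relative to the root node
                if (hits == null || hits.Count == 0)         // note 6: probably a typo in the spec
                    throw new InvalidOperationException($"<{root.Name}>: '{xpath}' matched nothing");
                foreach (XmlNode hit in hits)
                    paths.Add(prefix + hit.Value);
            }
        }
        return paths;
    }

    static string ResolvePrefix(string key, XmlElement root, IDictionary<string, string> rootsSeen)
    {
        if (key.Length == 0)
            return "";                       // note 2: paths are already complete
        if (key.IndexOf('.') >= 0)           // note 5: borrow another node's root
        {
            if (!rootsSeen.TryGetValue(key, out var borrowed))
                throw new InvalidOperationException($"root '{key}' has not been seen yet");
            return borrowed;
        }
        string value = root.GetAttribute(key);
        rootsSeen[root.Name + "." + key] = value;  // e.g. "fxLayers.gfxroot"
        return value;
    }
}
```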
flying-sheep commented 10 years ago

i have basically implemented it, but i either guessed wrong about which unused int is the hash, used the wrong hash function, or did something else wrong.

will look into it tomorrow, it’s 3am here.

maybe one of you wants to have a look? flying-sheep/BoIRResourceDecryption@a7fbb691dffbf8619390aca9628bb9457ffba2ab

i didn’t yet address the special cases i mentioned in the last comment.
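One way to tell "wrong field" apart from "wrong hash function": hash a set of names you trust, then count, for every 4-byte field of the record table, how many records hold one of those hashes. A field that scores far above the others is probably the hash; if every field scores about zero, the hash function (or the names) is the likelier culprit. The fixed record size, the contiguous table and little-endian byte order are assumptions, HashFieldProbe is a made-up name, and the hash is passed in so either Hash1 or Hash2 can be tried.

```csharp
using System;
using System.Collections.Generic;

static class HashFieldProbe
{
    // hits[f] = number of records whose f-th uint32 equals hash(name) for some name.
    public static int[] CountHits(byte[] table, int recordSize, IEnumerable<string> names, Func<string, uint> hash)
    {
        var wanted = new HashSet<uint>();
        foreach (string n in names) wanted.Add(hash(n));

        var hits = new int[recordSize / 4];
        for (int rec = 0; rec + recordSize <= table.Length; rec += recordSize)
            for (int field = 0; field < hits.Length; field++)
                if (wanted.Contains(ReadUInt32LE(table, rec + field * 4)))
                    hits[field]++;
        return hits;
    }

    // Explicit little-endian read, independent of the host's byte order.
    static uint ReadUInt32LE(byte[] b, int i) =>
        (uint)(b[i] | b[i + 1] << 8 | b[i + 2] << 16 | b[i + 3] << 24);
}
```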

bladecoding commented 10 years ago

With that and the referenced strings I got 567 filenames decoded out of 2,748 total. I tried using the second hash (the key integer after the first hash), but that didn't increase the number.

flying-sheep commented 10 years ago

what do you mean by “referenced strings”?

flying-sheep commented 10 years ago

new commit makes things prettier, faster, and finds more (i think) flying-sheep/BoIRResourceDecryption@75c548c59761dcfa98711b377345800e0f7afb83