Packages.bin Parsing - Githubissues

yretenai commented 6 months ago

Packages.bin has INI metadata for all files, including "virtual" files (e.g. mod card textures and materials, possibly solving #17). This file used to be uncompressed (and quite large), but for a few years now it has been compressed with ZSTD (similar to #20).

At the moment this format stumps me a bit. Here's what I have parsed so far (for packages.bin version 40):

struct STR {
    u32 len;
    char text[len];
};

struct REF {
    STR name;
    u32 unknown;
};

struct ENTITY {
    STR package;
    STR file;
    u8 unknown1;
    if (parent.version >= 36) {
        u16 unknown2;
    } else {
        u32 unknown2;
    }
    STR inherits;
    if (parent.version < 40) {
        u32 unknown3; // 0
    }

    // READ from string_buffer until 0 byte starting at where the last entity ended.
    // there is no string offset value.
    // maybe this is what version 34's buffer1 array is for?
};

struct HEADER {
    u8 hash[16];
    u32 header_size; // 20
    u32 version; // 40
    u32 flags; // 1
    if (version >= 40) {
        u32 unknown; // ??
    }

    if (version >= 36) {
        u32 type_count;
        REF types[type_count];
    }

    u32 package_count; // 0 since version 34
    REF packages[package_count];

    if (version >= 34) {
        u32 buffer1_count;
        u8 buffer1[buffer1_count]; // a lot of 0xFF bytes?
        u32 buffer2_count;
        u8 buffer2[buffer2_count]; // suspected zstd compressed data, no header. starts with 0x100000.
        u32 zdict_count;
        u8 zdict[zdict_count];
    } else {
        u32 string_buffer_count;
        u8 string_buffer[string_buffer_count];
    }

    u32 entity_count;
    ENTITY entities[entity_count];
};

HEADER packages @ 0;
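
For anyone wanting to poke at the file outside a hex editor, here is a minimal Python sketch of reading the STR and REF structures above. It assumes little-endian integers and ASCII text, neither of which is stated in the thread, and the `Reader` class and its method names are hypothetical helpers rather than part of any existing tool.

```python
import struct


class Reader:
    """Cursor over a bytes object (hypothetical helper for illustration)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read(self, count: int) -> bytes:
        # Return the next `count` bytes and advance the cursor.
        end = self.pos + count
        if end > len(self.data):
            raise EOFError("read past end of buffer")
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

    def u32(self) -> int:
        # Assumption: integers are little-endian.
        return struct.unpack("<I", self.read(4))[0]

    def string(self) -> str:
        # STR: u32 length followed by that many bytes, no terminator.
        return self.read(self.u32()).decode("ascii", errors="replace")

    def ref(self) -> tuple[str, int]:
        # REF: a STR name followed by an unknown u32.
        return self.string(), self.u32()
```

The same cursor could then be used for the counted arrays in HEADER by reading a `u32` count and calling `ref()` that many times.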

yretenai commented 6 months ago

Example INI data from a previous version:

MATERIAL_ID_MASK=1
EMISSIVE_MASK=0
AO_FROM_DETAILS_BLUE=1
PS:Roughness=0.64779902
PS:ReflectionsLodBias=0
PS:TintColor0={0.016000001,0.018999999,0.028999999,1}
PS:TintColor1={0.52999997,0.49000001,0.44999999,0.5}
PS:TintColor2={0.14,0,0.18000001,0}
PS:TintColor3={0.75,1,0.023,0.5}
PS:BlackParams={1,1,0.5,1}
PS:RedParams={1,0.1,0.75,1}
PS:GreenParams={1,0.1,0.75,0}
PS:BlueParams={1,1,0.5,1}
PS:GrimeTintColor={0.067000002,0.046999998,0.041000001,1}
PS:GrimeRoughness=0.75
PS:LayerNormalStrength={1,3,1,1}
PS:CurvatureStrength=1
PS:EmissiveTintColorLo={0.086999997,0.25999999,0,1}
PS:EmissiveTintColorHi={0.66000003,1,0,1}
PS:EmissiveMapAtten=3
PS:UvScale01={5,5,10,10}
TX:NormalMap=GaraDeluxeBody_n.png
TX:SplatMap=GaraDeluxeBody_t.png
TX:DetailsAoMap=GaraDeluxeBodyPackmap
TX:BlackPackMap=/Lotus/Characters/SharedTileableTextures/Metal/PaintedMetal/PaintedMetal
TX:BlackNormalMap=/Lotus/Characters/SharedTileableTextures/Metal/PaintedMetal/PaintedMetal_n.png
TX:RedPackMap=/Lotus/Characters/Guild/GuildTileableTextures/HexagonRubberPackMap
TX:RedNormalMap=/Lotus/Characters/Guild/GuildTileableTextures/HexagonRubber_n.png
TX:GreenPackMap=/Lotus/Characters/SharedTileableTextures/Glass/Glass
TX:GreenNormalMap=/Lotus/Characters/SharedTileableTextures/Glass/Glass_n.png
TX:BluePackMap=/Lotus/Characters/SharedTileableTextures/Metal/MetalMachined/MetalmachinedPackMap
TX:BlueNormalMap=/Lotus/Characters/SharedTileableTextures/Metal/MetalMachined/MetalMachined_n.png
TX:MaterialMask=GaraDeluxeBodyMaterial_t.png
TX:EmissiveMap=GaraDeluxeBody_e.png

for /Lotus/Characters/Tenno/Glass/GaraDeluxeBody

Puxtril commented 6 months ago

This is great news! Users have been lamenting for a while about missing material parameters, and I believe Gara is one such example.

This and #20 need similar support in LotusLib - parsing the sub-packages inside the Misc package. I can get that implemented when this format is fully understood.

I can start helping with understanding this format when I've finished the UI and adding shader export support.

yretenai commented 6 months ago

The exact specifics of how it's compressed have been a mystery to me. Unfortunately this is also where the INI data now lives (you can see parts of it in the ZDictionary), so getting it properly decompressed is important.

sehnryr commented 6 months ago

Nice work you got there! One thing we've found is that the 0x100000 value at the start of the second buffer/block seems to be the length of the dictionary in the ZDict block (which would then contain the dictionary itself, followed by the compressed data that uses this dict).
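
A small Python sketch of that observation, assuming the length is a little-endian u32 and that the dictionary sits at the very start of the ZDict block. `split_zdict` and its parameter names are made up for illustration.

```python
import struct


def split_zdict(second_buffer: bytes, zdict_block: bytes):
    """Split the ZDict block into (dictionary, compressed data, remaining sizes).

    Assumption: the first 4 bytes of the second buffer are a little-endian
    u32 giving the dictionary length (0x100000 in the file discussed here).
    """
    (dict_len,) = struct.unpack_from("<I", second_buffer, 0)
    if dict_len > len(zdict_block):
        raise ValueError("dictionary length exceeds ZDict block size")
    dictionary = zdict_block[:dict_len]
    compressed = zdict_block[dict_len:]
    # Whatever follows the length in the second buffer is returned untouched.
    return dictionary, compressed, second_buffer[4:]
```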

yretenai commented 6 months ago

Good find. The data after that point does seem like regular compressed data. Unfortunately the Dictionary ID isn't present anywhere in the entire blob, so I wonder if they make a ZSTD frame organically or modified ZSTD to always use the dictionary, as the ID should be part of the frame header.

sehnryr commented 6 months ago

> Good find. The data after that point does seem like regular compressed data. Unfortunately the Dictionary ID isn't present anywhere in the entire blob, so I wonder if they make a ZSTD frame organically or modified ZSTD to always use the dictionary, as the ID should be part of the frame header.

I don't think you need to know the dict ID; check my (messy) implementation here: https://github.com/sehnryr/wfcache-package-decode/blob/b605418c607e8b443a366953223e3110bd497aa8/src/package_decomp.rs#L22. The only thing that's special is that the dict is magicless, so it doesn't contain a Magic_Number (I think that's what that means). I've simply used zstd's library, albeit cleaned up by a wrapper library in Rust.

yretenai commented 6 months ago

Ah! That's what it is! It's a ZSTD Stream, with interleaved size bytes. Now only to figure out how it's interleaving the bytes so we can get more than 1 frame.

yretenai commented 6 months ago

Buffer2 holds either the size of each frame (including its size prefix) or the offset of each frame. The first frame in current retail is 0x18 bytes; with the size value it's 0x19, which is the first byte of buffer2. Unfortunately, this only lines up for one frame. There's... uncompressed data after the zstd frame...

How over engineered is this?

yretenai commented 6 months ago

OK.

So here's how decompression works:

buffer1 = bit stream to check whether or not a type has inicfg values
buffer2 = sizes
buffer3 = compressed data

read uint32 from sizes, read that from buffer3. this is zdict data.

~~for each type, check if current bit in buffer 1 is set.~~ ~~if so,~~ read ULEB128 from buffer 2, that is frame size. read that amount from buffer3 (after zdict.) this is your frame.

check if all bytes in the frame are valid ASCII. if so, just copy the block as output. if not, read a ULEB128 from the frame you got from buffer3. this is your decompressed size. decompress using magicless zstd + zdict.
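
The ULEB128 reads and the ASCII check described above can be sketched in Python roughly as follows. The function names are illustrative, and the ASCII test is only the heuristic from this comment, not something confirmed against the game's own code.

```python
def read_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next position).

    Raises IndexError if the data ends in the middle of a value.
    """
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:             # high bit clear marks the last byte
            return value, pos
        shift += 7


def looks_like_ascii(frame: bytes) -> bool:
    """Heuristic: treat a frame as plain text if every byte is below 0x80."""
    return all(b < 0x80 for b in frame)
```

A single byte such as 0x19 decodes to itself, which matches the first frame size mentioned earlier in the thread.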

thanks for that implementation @sehnryr, i never thought of reading a single zstd frame.

EDIT: I'm still unsure about buffer1. Config texts are not aligning after a certain point.

sehnryr commented 6 months ago

> thanks for that implementation @sehnryr, i never thought of reading a single zstd frame.

I just found that I had made a full implementation for decompressing the whole zstd data: https://github.com/sehnryr/wfcache-package-decode/blob/b605418c607e8b443a366953223e3110bd497aa8/src/main.rs. For reading plain-text frames it relies on a zstd frame decompression error, so it's not perfect.

From what I remember when I worked on that with someone (all credit to him, though I don't know if he wants to be named), he found a relation between the bits in the 1st buffer and two things: the presence of a value for a path in the last buffer, and whether that value is compressed or not. I'll check if I can implement it in my parser.

yretenai commented 6 months ago

Buffer1 works as follows (pseudocode)


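if (readBit(buffer1) == 1) { // hasText: this entry has a config block
  size = readULEB(buffer2);    // frame size in bytes
  frame = read(buffer3, size); // advances the buffer3 cursor by size
  if (readBit(buffer1) == 1) { // isCompressed
    dsize = readULEB(frame);   // decompressed size, stored at the start of the frame
    config = decompress(frame, dsize); // magicless zstd + zdict
  } else {
    config = frame; // stored as plain text
  }
} else {
  config = NULL; // no config for this entry
}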
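
Here is a self-contained Python sketch of the pseudocode above, run over several entries in a row. Several things in it are assumptions rather than facts from the thread: the bit order within each flag byte (least-significant first here, switchable), that `sizes` and `frames` start after the dictionary length and the dictionary, and that the compressed body is whatever follows the ULEB128 size inside the frame. The zstd call itself is left to a caller-supplied `decompress(body, dsize)` function so the sketch does not depend on any particular zstd binding.

```python
class BitReader:
    """Reads single bits from a byte buffer.

    Assumption: bits are consumed least-significant first within each byte.
    Pass lsb_first=False to try the opposite order.
    """

    def __init__(self, data: bytes, lsb_first: bool = True) -> None:
        self.data = data
        self.bit_pos = 0
        self.lsb_first = lsb_first

    def read_bit(self) -> int:
        byte = self.data[self.bit_pos // 8]
        index = self.bit_pos % 8
        self.bit_pos += 1
        shift = index if self.lsb_first else 7 - index
        return (byte >> shift) & 1


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_configs(flags, sizes, frames, count, decompress):
    """Return one config (bytes, or None when absent) per entry."""
    bits = BitReader(flags)
    size_pos = 0   # cursor into the ULEB128 size stream
    frame_pos = 0  # cursor into the frame data
    configs = []
    for _ in range(count):
        if not bits.read_bit():  # hasText
            configs.append(None)
            continue
        size, size_pos = read_uleb128(sizes, size_pos)
        frame = frames[frame_pos:frame_pos + size]
        frame_pos += size
        if bits.read_bit():  # isCompressed
            dsize, body_start = read_uleb128(frame, 0)
            configs.append(decompress(frame[body_start:], dsize))
        else:
            configs.append(frame)  # stored as plain text
    return configs
```

Passing the decompressor in as a parameter also makes it easy to test the flag and size bookkeeping with a stand-in function before wiring up real zstd.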

sehnryr commented 6 months ago

I've finished the implementation of something similar: https://github.com/sehnryr/wfcache-package-decode/blob/3da89ac8d086a80203a06be53a374f31bc3fb3e6/src/main.rs#L32

I don't need the sizes from the 2nd buffer (can you confirm this buffer contains the sizes of the frames or at least the offsets?) as the reader increments the cursor as it reads the zstd buffer (using the zstd lib in rust at least, I don't know about C++ or other languages).

yretenai commented 6 months ago

> I don't need the sizes from the 2nd buffer (can you confirm this buffer contains the sizes of the frames or at least the offsets?)

I am currently reading size from this buffer.

My test implementation is:

// Three length-prefixed regions: flag bits, ULEB128 sizes, and the zstd data.
var comFlagsBuffer = buffer.Part(buffer.Read<int>());
var comSizeBuffer = buffer.Part(buffer.Read<int>());
var comZBuffer = buffer.Slice(buffer.Read<int>());

// ...

if (comFlagsBuffer.ReadBits(1) == 1) { // hasText
    var size = (int) comSizeBuffer.ReadULEB(32);
    var frameData = comZBuffer.Slice(size); // this advances a cursor by + size as well

    if (comFlagsBuffer.ReadBits(1) == 1) { // isCompressed
        var frame = new CursoredMemoryMarshal(frameData);
        var dsize = (int) frame.ReadULEB(32); // decompressed size prefix

        var buf = ArrayPool<byte>.Shared.Rent(dsize);
        try {
            decompressor.Unwrap(frame.Span, buf.AsSpan(0, dsize), false);
            return Encoding.ASCII.GetString(buf, 0, dsize);
        } finally {
            ArrayPool<byte>.Shared.Return(buf); // return the rented buffer even if Unwrap throws
        }
    }

    return Encoding.ASCII.GetString(frameData); // stored as plain text
}

and it decodes all parts.

Basically,

entry 1 is offset 0 size 0x19. entry 2 is immediately after it at offset 0x19, size 0x13. you have to keep track of offset manually using a cursor.
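
The offset bookkeeping described above is just a running sum of the sizes. A minimal Python sketch, with `frame_offsets` as a hypothetical helper name:

```python
from itertools import accumulate


def frame_offsets(sizes: list[int]) -> list[tuple[int, int]]:
    """Pair each frame size with its start offset (sum of all earlier sizes)."""
    starts = [0, *accumulate(sizes)][:-1]
    return list(zip(starts, sizes))
```

For the two entries above, `frame_offsets([0x19, 0x13])` gives `[(0, 0x19), (0x19, 0x13)]`, so a third frame would start at 0x2C.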

yretenai commented 6 months ago

https://github.com/yretenai/Lotus/blob/500c5d615563467a87bd002df70b789e944c3240/Lotus.Types/EE/Packages.cs

I have published my code for this.

Puxtril commented 4 months ago

Thanks to both of you for looking into this. I've added this functionality to LotusLib and merged here with https://github.com/Puxtril/Warframe-Exporter/commit/6bb812e8bc83c196327d4ec31f892be176ba2a82

Puxtril / Warframe-Exporter

Packages.bin Parsing #24