CommunityDragon / CDTB

A library containing everything to extract files from client files.
GNU Lesser General Public License v3.0

Support reading subchunked wad entries using the .subchunktoc file #68

Closed Morilli closed 1 year ago

Morilli commented 1 year ago

This also fixes an exception currently occurring on PBE when parsing the Warwick.wad.client file (from manifest https://lol.secure.dyn.riotcdn.net/channels/public/releases/04C8D70DFF45D23B.manifest):

Traceback (most recent call last):
  File "D:\GitHub\CDTB\grep_wads.py", line 34, in <module>
    wad.guess_extensions()
  File "D:\GitHub\CDTB\cdragontoolbox\wad.py", line 214, in guess_extensions
    data = wadfile.read_data(f)
  File "D:\GitHub\CDTB\cdragontoolbox\wad.py", line 97, in read_data
    return zstd_decompress(data)
pyzstd.ZstdError: Unable to decompress zstd data: Unknown frame descriptor.

The failing (wad-)file in particular has hash F5B498022C64E681 (assets/characters/warwick/skins/skin46/particles/winterblessed_warwick_colorblue_01.pie_c_12_23.tex). It fails because the first subchunk is uncompressed, while the second and third are compressed, so the entry cannot be decompressed as a single zstd stream.
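To illustrate the failure mode described above, here is a minimal sketch of reading an entry subchunk by subchunk. It assumes that a subchunk whose compressed and uncompressed sizes are equal is stored raw; the thread does not state how an uncompressed subchunk is detected, so that rule is an assumption. The function name `read_subchunked` and the `decompress` parameter are hypothetical, and zlib stands in for zstd only to keep the example dependency-free.

```python
import zlib
from typing import Callable, Iterable, List, Tuple


def read_subchunked(
    data: bytes,
    entries: Iterable[Tuple[int, int]],
    decompress: Callable[[bytes], bytes],
) -> bytes:
    """Reassemble an entry from its subchunks.

    `entries` yields (compressed_size, uncompressed_size) pairs, in order.
    ASSUMPTION: equal sizes mean the subchunk is stored uncompressed.
    """
    parts: List[bytes] = []
    offset = 0
    for compressed_size, uncompressed_size in entries:
        chunk = data[offset:offset + compressed_size]
        if len(chunk) != compressed_size:
            raise ValueError("subchunk extends past the end of the entry data")
        if compressed_size == uncompressed_size:
            part = chunk  # stored raw, nothing to decompress
        else:
            part = decompress(chunk)
        if len(part) != uncompressed_size:
            raise ValueError("unexpected subchunk size after decompression")
        parts.append(part)
        offset += compressed_size
    return b"".join(parts)


# Demo: first subchunk raw, second compressed (zlib as a stand-in for zstd).
first = b"abcd"
second = zlib.compress(b"x" * 100)
blob = first + second
result = read_subchunked(blob, [(4, 4), (len(second), 100)], zlib.decompress)
```

In the real code the `decompress` argument would be the zstd routine seen in the traceback; passing it in keeps the sketch independent of any particular zstd binding.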

The fact that the subchunktoc file path must be known before files can be handled effectively is a major drawback of this implementation, but I don't see how this can be done better. I also considered creating a dedicated class for the subchunktoc file entries (u32 compressed size, u32 uncompressed size, u64 xxhash of data), but decided against it for now as it didn't seem necessary.
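For reference, the entry layout mentioned above (u32 compressed size, u32 uncompressed size, u64 xxhash of data) could be modelled roughly as follows. This is only a sketch of the dedicated class that was considered, not what the PR does: the names `SubchunkTocEntry` and `parse_subchunk_toc` are hypothetical, and little-endian byte order with no header or padding is an assumption.

```python
import struct
from typing import List, NamedTuple


class SubchunkTocEntry(NamedTuple):
    """One .subchunktoc record (hypothetical class, field names are illustrative)."""
    compressed_size: int    # u32
    uncompressed_size: int  # u32
    data_hash: int          # u64 xxhash of the subchunk data


# ASSUMPTION: little-endian, records packed back to back (16 bytes each).
_ENTRY_STRUCT = struct.Struct("<IIQ")


def parse_subchunk_toc(data: bytes) -> List[SubchunkTocEntry]:
    """Split raw .subchunktoc bytes into fixed-size entries."""
    if len(data) % _ENTRY_STRUCT.size != 0:
        raise ValueError("subchunktoc size is not a multiple of the entry size")
    return [SubchunkTocEntry(*fields) for fields in _ENTRY_STRUCT.iter_unpack(data)]
```

A plain tuple per entry works just as well; a named tuple mainly makes call sites easier to read.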

benoitryder commented 1 year ago

I pushed a branch (with the same name) with proposed changes. Mostly reorganizing some stuff around the TOC and improving code readability. I still have to check that it actually works.

Morilli commented 1 year ago

Yeah, I wanted to look at this again as well. One more thing I thought of would be to try to guess possible .subchunktoc paths before trying to load the file, perhaps in the load_subchunk_toc function as well.

The existing .subchunktoc files follow the very predictable pattern of being based on the wad's file path in the actual game files, e.g. data/final/champions/ashe.wad.subchunktoc, so it should be possible to consistently find it if it exists in the wad.
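The pattern above can be sketched as a small helper. It assumes the name is derived by lowercasing the wad's game path and replacing a trailing `.client` with `.subchunktoc`, which matches the `ashe.wad.subchunktoc` example but is otherwise a guess; `guess_subchunktoc_path` is a hypothetical name, not a function from the PR.

```python
def guess_subchunktoc_path(wad_path: str) -> str:
    """Guess the .subchunktoc path for a wad, given its path in the game files.

    ASSUMPTION: paths are lowercased and the ".client" suffix is dropped,
    e.g. "DATA/FINAL/Champions/Ashe.wad.client"
      -> "data/final/champions/ashe.wad.subchunktoc".
    """
    path = wad_path.replace("\\", "/").lower()
    suffix = ".client"
    if path.endswith(suffix):
        path = path[: -len(suffix)]
    return path + ".subchunktoc"
```

The guessed path would then be hashed like any other wad path and looked up in the wad; if no entry matches, the wad presumably has no subchunk TOC.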

benoitryder commented 1 year ago

Why not. If it becomes more complex we could also consider moving the resolve+load part into a separate function that is only executed when subchunk_toc is first accessed. That's not hard to do, and it would avoid reading the data when the subchunks are never actually read.
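One way to get that lazy behaviour is `functools.cached_property`, sketched below. The class `LazyTocWad`, its `loader` callback and the `load_count` counter are hypothetical and exist only for illustration; the actual branch may structure this differently.

```python
from functools import cached_property
from typing import Callable, Optional


class LazyTocWad:
    """Toy wad wrapper that resolves and loads its subchunk TOC on first access."""

    def __init__(self, loader: Callable[[], Optional[bytes]]):
        # `loader` stands in for the resolve+load step (hypothetical callback).
        self._loader = loader
        self.load_count = 0  # only here to make the laziness observable

    @cached_property
    def subchunk_toc(self) -> Optional[bytes]:
        # Runs once, on first attribute access; the result is cached afterwards.
        self.load_count += 1
        return self._loader()


wad = LazyTocWad(lambda: b"\x00" * 16)
# Nothing has been loaded yet; the loader only runs when wad.subchunk_toc is read.
```

With this shape, wads whose entries are never subchunked never pay for locating or reading the TOC.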

benoitryder commented 1 year ago

Commit pushed separately, with a few adjustments.