GraylinKim closed this issue 11 years ago.
Ok, a couple of things are at play here.
The first thing I noticed when looking at the file is that the MPQ file header is off by one byte: it starts at offset 1025 (0x401) instead of the usual 1024 (0x400). The file's user data header still says that the file header starts at offset 1024, and I have no idea why it's lying. Compare these two:
$ hexdump -C test.SC2Replay
00000000 4d 50 51 1b 00 02 00 00 00 04 00 00 3c 00 00 00 |MPQ.........<...|
00000010 05 08 00 02 2c 53 74 61 72 43 72 61 66 74 20 49 |....,StarCraft I|
00000020 49 20 72 65 70 6c 61 79 1b 31 31 02 05 0c 00 09 |I replay.11.....|
00000030 02 02 09 02 04 09 00 06 09 02 08 09 86 fd 01 0a |................|
00000040 09 da f0 01 04 09 04 06 09 88 a3 01 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 4d 50 51 1a 2c 00 00 00 47 45 00 00 01 00 03 00 |MPQ.,...GE......|
00000410 a7 43 00 00 a7 44 00 00 10 00 00 00 0a 00 00 00 |.C...D..........|
00000420 00 00 00 00 00 00 00 00 00 00 00 00 05 1c 00 04 |................|
00000430 01 00 04 05 12 00 02 10 48 45 49 44 45 47 45 52 |........HEIDEGER|
00000440 02 05 08 00 09 04 02 07 00 00 53 32 04 09 02 08 |..........S2....|
00000450 09 b2 c6 16 04 02 0c 54 65 72 72 61 6e 06 05 08 |.......Terran...|
00000460 00 09 fe 03 02 09 e8 02 04 09 28 06 09 3c 08 09 |..........(..<..|
00000470 04 0a 09 02 0c 09 c8 01 0e 09 00 10 09 04 05 12 |................|
00000480 00 02 08 61 72 6b 78 02 05 08 00 09 04 02 07 00 |...arkx.........|
00000490 00 53 32 04 09 02 08 09 86 c4 56 04 02 0e 50 72 |.S2.......V...Pr|
000004a0 6f 74 6f 73 73 06 05 08 00 09 fe 03 02 09 00 04 |otoss...........|
000004b0 09 84 01 06 09 fe 03 08 09 00 0a 09 00 0c 09 00 |................|
000004c0 0e 09 00 10 09 02 02 02 1a 53 63 72 61 70 20 53 |.........Scrap S|
000004d0 74 61 74 69 6f 6e 04 02 00 06 05 02 00 02 16 4d |tation.........M|
000004e0 69 6e 69 6d 61 70 2e 74 67 61 08 06 01 0a 09 c2 |inimap.tga......|
$ hexdump -C new_format.SC2Replay
00000000 4d 50 51 1b 00 02 00 00 00 04 00 00 3c 00 00 00 |MPQ.........<...|
00000010 05 08 00 02 2c 53 74 61 72 43 72 61 66 74 20 49 |....,StarCraft I|
00000020 49 20 72 65 70 6c 61 79 1b 31 31 02 05 0c 00 09 |I replay.11.....|
00000030 02 02 09 04 04 09 00 06 09 08 08 09 e0 85 03 0d |................|
00000040 0a 09 e0 85 03 04 09 04 06 09 92 8b 02 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 00 4d 50 51 1a d0 00 00 00 53 fc 00 00 03 00 05 |.MPQ.....S......|
00000410 00 93 fa 00 00 93 fb 00 00 10 00 00 00 0c 00 00 |................|
00000420 00 00 00 00 00 00 00 00 00 00 00 00 00 53 fc 00 |.............S..|
00000430 00 00 00 00 00 88 f9 00 00 00 00 00 00 34 f9 00 |.............4..|
00000440 00 00 00 00 00 00 01 00 00 00 00 00 00 c0 00 00 |................|
00000450 00 00 00 00 00 00 00 00 00 00 00 00 00 44 00 00 |.............D..|
00000460 00 00 00 00 00 fb 00 00 00 00 00 00 00 00 40 00 |..............@.|
00000470 00 46 7c df 51 54 a5 9f d4 82 4d dc c0 8a 12 2f |.F|.QT....M..../|
00000480 99 2b 29 fe 11 70 51 54 82 fc 81 84 4d b4 70 ea |.+)..pQT....M.p.|
00000490 cb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000004a0 00 d7 e0 31 1f f2 66 73 3d f6 2f 91 87 e5 bc 6e |...1..fs=./....n|
000004b0 a6 3b 8d 27 20 60 3b 38 62 52 17 63 d3 ea 64 4a |.;.' `;8bR.c..dJ|
000004c0 da f2 0b d1 be 35 44 82 13 a2 60 27 43 bf b2 20 |.....5D...`'C.. |
000004d0 4a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |J...............|
000004e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Notice the one extra "." (a 0x00 byte) at offset 0x400, right before the MPQ 0x1A signature. I haven't seen MPQs with invalid header offsets before, so mpyq does not contain any workarounds for them.
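For anyone who wants to check their own replays, here is a minimal sketch (not mpyq code) that compares the offset declared in the user data header with the position where the MPQ 0x1A signature actually sits. It assumes the user data header layout visible in the dumps above: a 4-byte magic followed by three little-endian uint32 fields, the second of which is the file header offset. The names `find_file_header` and `search_window` are made up for illustration.

```python
import struct

MPQ_USER_DATA_MAGIC = b"MPQ\x1b"
MPQ_FILE_MAGIC = b"MPQ\x1a"


def find_file_header(data, search_window=16):
    """Return (declared_offset, actual_offset) of the MPQ file header.

    `search_window` is an arbitrary number of bytes to scan starting at
    the declared offset; it is an assumption, not part of the format.
    """
    magic, _user_data_size, declared, _udh_size = struct.unpack_from(
        "<4s3I", data, 0
    )
    if magic != MPQ_USER_DATA_MAGIC:
        raise ValueError("no MPQ user data header at offset 0")

    # Look for the file header signature at or shortly after the
    # offset that the user data header claims.
    actual = data.find(MPQ_FILE_MAGIC, declared, declared + search_window)
    if actual == -1:
        raise ValueError("no MPQ file header near offset %d" % declared)
    return declared, actual


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as replay:
        declared, actual = find_file_header(replay.read())
    print("declared offset: %d, actual offset: %d" % (declared, actual))
```

If the two dumps above are representative, test.SC2Replay should report matching offsets and new_format.SC2Replay should report a difference of one byte.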
Taking that into account, the file header says that the MPQ format version is now 0x03, meaning that this is a newer MPQ format than what this library is used to. Back in 2010, when I originally developed this library, the MPQ version used was 0x01. It seems that the reference I used back then, http://www.zezula.net/en/mpq/mpqformat.html, has since been updated to cover formats 3 and 4. There are new HET and BET tables present in these archives.
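To make the version difference concrete, here is a small sketch that decodes the first 16 bytes of each file header, copied from the two dumps above (the new one starting at 0x401, after the stray byte). It assumes the leading header fields are the magic, header size, archive size, format version and sector size shift, all little-endian; `read_header_prefix` and `HeaderPrefix` are illustrative names, not part of mpyq.

```python
import struct
from collections import namedtuple

HeaderPrefix = namedtuple(
    "HeaderPrefix",
    "magic header_size archive_size format_version sector_size_shift",
)


def read_header_prefix(buf, offset=0):
    """Decode the first 16 bytes of an MPQ file header."""
    return HeaderPrefix(*struct.unpack_from("<4s2I2H", buf, offset))


# Bytes at 0x400 in test.SC2Replay.
old = bytes.fromhex("4d50511a 2c000000 47450000 0100 0300")
# Bytes at 0x401 in new_format.SC2Replay.
new = bytes.fromhex("4d50511a d0000000 53fc0000 0300 0500")

print(read_header_prefix(old))  # header_size=44, format_version=1
print(read_header_prefix(new))  # header_size=208, format_version=3
```

Besides the version field, the header size itself grows from 44 to 208 bytes, so a parser that only reads the old layout will not see the fields describing the new tables.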
It seems like I need to roll up my sleeves a little and start supporting MPQ formats 3 and 4. I was planning to test mpyq with Diablo III files back when it came out, but I lost interest in that game too quickly. With Heart of the Swarm arriving soon, it would make sense to test this library with the latest versions of the game.
In my mind the two issues listed above are separate. I can't see any mention of headers now being off by one in the reference, so I'm also tempted to call this a corrupted MPQ file, albeit of a newer version. I'm closing this issue and opening a new one for MPQ formats 3 and 4. If invalid header offsets turn out to be a recurring phenomenon, let's open a new issue for them. In any case, mpyq should raise an error and fail gracefully when it can't find the header at the correct offset.
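As a rough idea of what failing gracefully could look like, here is a sketch of a strict check that raises a descriptive error instead of parsing garbage. This is not how mpyq currently behaves; `HeaderNotFoundError` and `require_file_header` are hypothetical names, and the user data header layout (4-byte magic plus three little-endian uint32 fields) is assumed from the dumps in this thread.

```python
import struct


class HeaderNotFoundError(ValueError):
    """Hypothetical exception for a missing MPQ file header."""


def require_file_header(data):
    """Return the file header offset, or raise HeaderNotFoundError."""
    if len(data) < 16:
        raise HeaderNotFoundError("file is too short to be an MPQ archive")

    magic, _user_data_size, offset, _udh_size = struct.unpack_from(
        "<4s3I", data, 0
    )
    if magic == b"MPQ\x1a":
        # Plain archive without a user data header.
        return 0
    if magic != b"MPQ\x1b":
        raise HeaderNotFoundError("not an MPQ archive (magic %r)" % magic)

    found = data[offset:offset + 4]
    if found != b"MPQ\x1a":
        raise HeaderNotFoundError(
            "user data header points at offset %d, but found %r there "
            "instead of the MPQ file header signature" % (offset, found)
        )
    return offset
```

A caller can then catch one exception type and report the file as unreadable, rather than hitting an unrelated error further down in the table parsing.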
This file has a length of 65892 bytes, but archive.header has the following offset values:
This makes the table data an empty string and causes a struct.error when you try to unpack the table entries.
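The failure mode can be reproduced without the replay itself: reading at an offset past the end of a file returns an empty string, and unpacking that raises struct.error. The sketch below shows this and a bounds check that would turn it into a clearer message. `read_table` is an illustrative helper, not mpyq's API, and the 16-byte entry size is an assumption (table decryption is ignored entirely).

```python
import io
import struct


def read_table(fileobj, offset, entries, entry_size=16):
    """Read raw table bytes, refusing offsets that run past the file end."""
    fileobj.seek(0, io.SEEK_END)
    file_size = fileobj.tell()
    needed = entries * entry_size
    if offset + needed > file_size:
        raise ValueError(
            "table at offset %d needs %d bytes, but the file is only "
            "%d bytes long" % (offset, needed, file_size)
        )
    fileobj.seek(offset)
    return fileobj.read(needed)


# Without a check, a read past the end silently returns b"" ...
archive = io.BytesIO(b"\x00" * 64)
archive.seek(1000)
table_data = archive.read(16)

# ... and unpacking the empty result is what raises struct.error.
try:
    struct.unpack("<4I", table_data)
except struct.error as exc:
    print("struct.error:", exc)
```

With a check like this, an archive whose header offsets point beyond the file length would be reported as such, which is at least a more useful symptom when deciding whether the file is corrupt.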
I want to turn around and say that the MPQ file is corrupt and there is nothing we can do, but the person who submitted the files (see GraylinKim/sc2reader#100) says that it came directly from his SCII client and that it opens just fine there. @dsjoerg says that he has received several more files with similar issues and can provide them.
Is there some way we can prove this one way or another?