donnm / mtk_fw_tools

Mediatek firmware unpacking/repacking tools
GNU General Public License v3.0

Let's have ALICE_1 handled, shall we? #2

Closed: pfalcon closed this 6 years ago

pfalcon commented 6 years ago

The uncompressed ALICE.bin and the compressed ALICE are here: https://github.com/Seeed-Studio/Arduino_IDE_for_RePhone/tree/master/hardware/tools/mtk/firmware/LinkIt_Device/RePhone/W15.19.p2-uart

Note that MT2501/MT2502 seem to use ALICE_1.

pfalcon commented 6 years ago

Another ALICE_1 test subject: https://github.com/mtek-hack-hack/Oplayer_SW1402_MT2502D_smartwatch (but the raw ROM dump needs to be split first).

donnm commented 6 years ago

Do we have any documentation on ALICE_1 file format? Does it use a dictionary and mapping table like ALICE_2?

Header appears to be 40 bytes like ALICE_2, with range registers and some addresses. Blocksize is zero in the header of the first example you sent.
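
For anyone who wants to look at the same header, here is a minimal sketch that dumps it as 32-bit words. It assumes the 40-byte size observed above and little-endian byte order; neither is documented, and `dump_header_words` is just a hypothetical helper name, not part of unalice.py.

```python
import struct

HEADER_SIZE = 40  # assumption: header length as observed above, not documented


def dump_header_words(data, size=HEADER_SIZE):
    """Return the first `size` bytes as a list of little-endian 32-bit words."""
    if len(data) < size:
        raise ValueError("input is shorter than the assumed header size")
    return list(struct.unpack("<%dI" % (size // 4), data[:size]))


def format_header(data, size=HEADER_SIZE):
    """Format each header word as 'offset: value' for eyeballing."""
    words = dump_header_words(data, size)
    return ["0x%02x: 0x%08x" % (i * 4, w) for i, w in enumerate(words)]
```

To use it, read the first 40 bytes of the ALICE file in binary mode and print the lines returned by `format_header`. Which word holds the block size or the range registers is not established here.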

Could you attach 1) the statistics.txt file and 2) standard output generated from encoding ALICE_1 using ALICE.exe?

pfalcon commented 6 years ago

> Do we have any documentation on ALICE_1 file format?

You're the author of this tool-repo and you're even asking? ;-) Of course we don't!

> Could you attach 1) the statistics.txt file and 2) standard output generated from encoding ALICE_1 using ALICE.exe?

I have never run ALICE.exe, nor do I have it. Added to my (long) TODO list.

pfalcon commented 6 years ago

OK, here we go. ALICE.exe comes from the code drop whose name starts with "helio" and ends with "10".

ALICE.exe -chip MT6250 -iBin ALICE.bin -oBin out_3 -cBase 0x101d6520 -dBase 0x100A0000 -statistics stat.txt -debugLevel 3
=====================Start====================
HeaderVersionNumber: 1
BinaryInputFilename: ALICE.bin
BinaryOutputFilename: out_3
DictionaryEntryLimit: 0
InstructionBitNumber: 16
BlockSizeInEntry: 64
TotalGroupNumber: 8
CompressedImageBaseAddress: 0x101d6520
DecompressedImageBaseAddress: 0x100a0000
RemappingSourceAddress: 0x90000000
RemappingDestinationAddress: 0x10000000
------------------Dictionary------------------
RangeRegisters: 0, 16, 80, 336, 848, 1872, 5968, 14160
ShortestPath: 24074276
DictionaryEntryCount: 14160
DistinctInstructionCount: 45581
TotalInstructionCount: 1867508
TotalDictionaryCoverageCount: 1719926 92.10%
DictionaryCoverageCount:
      16 199124    11.58%
      32 273717    15.91%
      64 363755    21.15%
     128 484268    28.16%
     256 623261    36.24%
     512 780925    45.40%
    1024 956565    55.62%
    2048 1143521   66.49%
    4096 1340755   77.95%
    8192 1553296   90.31%
-------------------Encoding-------------------
OriginalSize: 3735016
CompressedSize: 3006436
CompressionRatio: 80.49%
PaddingBitNumber: 203688
PaddingBitRatio: 0.85%
MappingTableSize: 116724
DictionarySize: 28320
----------------PostProcessing----------------
CompressedStartAddress:   0x101d6544
MappingTableStartAddress: 0x104b4528
DictionaryStartAddress:   0x104d0d1c
TotalFinalSize: 3151516
TotalReducesSize: 583500
TotalCompressionRatio: 84.38%
---------------------Time---------------------
MakeDictionaryTime: 5 seconds
EncodingTime: 0 seconds
PostProcessingTime: 0 seconds
TotalTime: 5 seconds
======================End=====================
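
The numbers in this log are self-consistent, which says something about the file layout. The sketch below only re-checks arithmetic on values copied from the output above. Reading the results as "2 bytes per dictionary entry" and "one 4-byte mapping entry per block plus one extra" is an assumption that happens to fit, not a documented fact.

```python
# Values copied verbatim from the ALICE.exe output above.
c_base = 0x101D6520            # CompressedImageBaseAddress
compressed_start = 0x101D6544  # CompressedStartAddress
mapping_start = 0x104B4528     # MappingTableStartAddress
dict_start = 0x104D0D1C        # DictionaryStartAddress
compressed_size = 3006436
mapping_size = 116724
dict_size = 28320
original_size = 3735016
total_final = 3151516
dict_entries = 14160
total_instr = 1867508
block_entries = 64             # BlockSizeInEntry

# Header length implied by the addresses: 0x24 bytes.
header_size = compressed_start - c_base
assert header_size == 0x24

# The three sections are laid out back to back after the header.
assert mapping_start - compressed_start == compressed_size
assert dict_start - mapping_start == mapping_size
assert header_size + compressed_size + mapping_size + dict_size == total_final

# 16-bit instructions: 2 bytes per instruction and per dictionary entry.
assert original_size == total_instr * 2
assert dict_size == dict_entries * 2

# Assumption that fits the numbers: 4 bytes per block, plus one extra entry.
blocks = -(-total_instr // block_entries)  # ceiling division
assert mapping_size == (blocks + 1) * 4
```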

The resulting binary is 4 bytes shorter than ALICE from the repo. Eyeballing the radiff2 output (which really sucks for such cases, despite what its output might suggest), one can see that ALICE from the repo has 4 extra bytes, "\0\0\xff\xff", after the 0x24-byte header. After removing those 4 bytes, the rest of the data from the repo matches the produced file 100%.
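
The comparison described above can be reproduced without radiff2. This is a minimal sketch, assuming the 0x24-byte header and the 4 extra bytes sit exactly where described; `strip_extra_word` and `bodies_match` are hypothetical helper names.

```python
HEADER_LEN = 0x24            # header length as described above
EXTRA = b"\x00\x00\xff\xff"  # extra bytes seen in the repo's ALICE


def strip_extra_word(repo_image, header_len=HEADER_LEN, extra=EXTRA):
    """Remove the extra bytes that follow the header in the repo's image."""
    end = header_len + len(extra)
    if repo_image[header_len:end] != extra:
        raise ValueError("expected extra bytes not found after the header")
    return repo_image[:header_len] + repo_image[end:]


def bodies_match(repo_image, produced_image, header_len=HEADER_LEN):
    """Compare everything after the header, ignoring the headers themselves."""
    stripped = strip_extra_word(repo_image, header_len)
    return stripped[header_len:] == produced_image[header_len:]
```

The headers are deliberately left out of the comparison, since only the data after the header is claimed to match.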

pfalcon commented 6 years ago

unalice.py however doesn't decompress it back correctly. No CAKE, literally.

pfalcon commented 6 years ago

OK, so I took the repo's ALICE and patched in a block size of 0x40 instead of zero. Still no CAKE.

pfalcon commented 6 years ago

With this setup, len(instrdict) == 14160, which is about twice as many entries as your code expects.
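
The 14160 figure agrees with the RangeRegisters line in the ALICE.exe output. Below is a small sketch, assuming the range registers are cumulative boundaries of dictionary groups (an interpretation, not something the output states):

```python
# RangeRegisters as printed by ALICE.exe above.
range_registers = [0, 16, 80, 336, 848, 1872, 5968, 14160]

# Assumption: consecutive registers delimit one dictionary group each.
group_sizes = [hi - lo for lo, hi in zip(range_registers, range_registers[1:])]
# -> [16, 64, 256, 512, 1024, 4096, 8192]

# Every size is a power of two, so each group needs a fixed-width index.
index_bits = [(size - 1).bit_length() for size in group_sizes]
# -> [4, 6, 8, 9, 10, 12, 13]

# The spans add up to DictionaryEntryCount, i.e. len(instrdict).
assert sum(group_sizes) == 14160
```

This gives seven spans. How they relate to TotalGroupNumber: 8 is not established here.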

Mi81 commented 6 years ago

Thanks for your work. The decoding is much better now.

donnm commented 6 years ago

Fixed and tested on examples in first two comments. First example decodes nearly perfectly (some extraneous end bytes). Needs more testing.

pfalcon commented 6 years ago

> Fixed and tested on examples in first two comments.

Ack, tested on the rephone one on my side with pypy2:

$ time pypy unalice.py rephone-ALICE 
[...]
real    0m6.299s
user    0m6.227s
sys     0m0.052s

> First example decodes nearly perfectly (some extraneous end bytes).

To elaborate, those bytes are zeroes.

pfalcon commented 6 years ago

@donnm, when you say:

> Fixed and tested on examples in first two comments.

Do you mean that you also ran it on https://github.com/mtek-hack-hack/Oplayer_SW1402_MT2502D_smartwatch ? If so, I have a dumb question: how do we split a raw ROM dump these days? I tried a few binary blob tools from our cute non-opensource toolset (good timing with Meltdown, lol), and they failed to do anything useful. As I mentioned, I couldn't find the tools I wrote for this in my own repos, so I'd hate to scavenge for them and find they don't work if something better is available.

Let's perhaps move the discussion back to the forum for wider coverage: https://www.kosagi.com/forums/viewtopic.php?id=158&p=2