avm_kernel_image: correct handling of combined images from AVM's firmware

PeterPawn commented 3 years ago

I'm a bit unsure, whether this is the correct repository for this issue - while the description states:

> Wraps FACT unpack plugins into standalone utility.

it seems at the same time the only repository, which contains code dedicated to unpack the various firmware formats.

Nonetheless I'll try to show/explain here why your attempts to unpack/analyze the firmware for AVM's model 4020 failed.

If a device model by AVM uses a "combined image" for the firmware, it consists of a kernel image, immediately followed by a filesystem image in SquashFS format. For version 4 of SquashFS, AVM has changed the official format (which only supports "little endian" byte order) into one of its own, where some data is still stored in "big endian" byte order if the platform uses BE storage order.
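The byte order of such an image can usually be told from the superblock magic alone. The following is a minimal sketch, not code from FACT or Freetz: the function name `probe_superblock` is made up for illustration, and it assumes that the 16-bit major/minor version fields sit at byte offsets 28 and 30 of the superblock (as in the upstream SquashFS 3 and 4 layouts).

```python
import struct

# Magic as it appears on disk: "hsqs" for little endian, "sqsh" for big endian.
MAGIC_LE = b"hsqs"
MAGIC_BE = b"sqsh"


def probe_superblock(header: bytes):
    """Return (byte_order, major, minor), or None if no SquashFS magic is found.

    Hypothetical helper; 'header' must start at the superblock offset.
    """
    if len(header) < 32:
        return None
    magic = header[:4]
    if magic == MAGIC_LE:
        order, fmt = "little", "<HH"
    elif magic == MAGIC_BE:
        order, fmt = "big", ">HH"
    else:
        return None
    # Assumption: s_major and s_minor are 16-bit fields at offsets 28 and 30.
    major, minor = struct.unpack_from(fmt, header, 28)
    return order, major, minor
```

A result of `("big", 4, 0)` would correspond to what the modified unsquashfs reports as a "big endian SQUASHFS 4:0 superblock", which stock SquashFS 4 tools are not expected to accept.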

If the platform of the device is a MIPS processor, the SquashFS image isn't stored as one continuous data stream - it contains a gap at the offset that will be ~~loaded~~ mapped to physical memory address 0xC00000, where the NMI vectors are looked up in a "very basic state" of processor initialization. The size of this gap varies with the processor and its architecture. If the loader size in flash memory is 0x20000 (and the kernel/filesystem partition starts right after the loader partition), this gap will be found in the (single) file kernel.image at offset 0x00BE0000. (EDIT: The load address doesn't really matter, see my earlier post here: https://www.ip-phone-forum.de/threads/%C3%9Cbersicht-von-fritz-boxen-mit-junk-bytes-im-squashfs-image.286318/)
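The offset quoted above is plain arithmetic: the mapped address minus the start of the kernel/filesystem partition. The sketch below only reproduces this calculation with the two numbers from the paragraph; the function name is hypothetical, and other models may use a different loader size.

```python
NMI_MAPPED_ADDRESS = 0xC00000  # address named in the paragraph above
LOADER_SIZE = 0x20000          # loader partition size from the example


def gap_offset_in_image(mapped_address: int, partition_start: int) -> int:
    """Offset of the gap inside kernel.image (hypothetical helper).

    Assumes the kernel/filesystem partition starts directly after the loader.
    """
    return mapped_address - partition_start


print(hex(gap_offset_in_image(NMI_MAPPED_ADDRESS, LOADER_SIZE)))  # 0xbe0000
```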

According to your file avm_kernel_image.py (https://github.com/fkie-cad/fact_extractor/blob/master/fact_extractor/plugins/unpacking/avm_kernel_image/code/avm_kernel_image.py#L26) you're trying to split these images into the kernel part and take the whole rest as filesystem image (that's how find-squashfs works). If the filesystem part doesn't contain the NMI vector gap, everything works as expected - but if the SquashFS image contains this gap, the (later) extraction process for the SquashFS image will fail.
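To illustrate the splitting idea, here is a minimal sketch that cuts a combined image at the first SquashFS magic. It is neither the code of avm_kernel_image.py nor of find-squashfs; `split_combined_image` is a made-up name, and a real tool should validate the superblock, because the four magic bytes can also occur by chance inside the compressed kernel.

```python
def split_combined_image(data: bytes):
    """Split at the first SquashFS magic; return (kernel, rest) or None.

    Illustration only: everything after the magic is treated as filesystem,
    so an NMI vector gap (if present) stays inside the second part.
    """
    positions = [pos for pos in (data.find(b"hsqs"), data.find(b"sqsh")) if pos >= 0]
    if not positions:
        return None
    offset = min(positions)
    return data[:offset], data[offset:]
```

The second part returned here is exactly the "whole rest" mentioned above, which is why a later plain SquashFS extraction can fail on it.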

There are two options to handle this case correctly ... either you remove the NMI vector gap from the extracted filesystem image (see this shell script from the Freetz project: https://github.com/Freetz/freetz/blob/master/tools/remove-nmi-vector) or you use an extension to the unsquashfs binary (you're using the proper sources already and the files copied from your Freetz container during installation support these extensions) and unpack the SquashFS data directly from the kernel.image file:

peh@vidar:~> mkdir /tmp/FB4020
peh@vidar:~> cd /tmp/FB4020
peh@vidar:/tmp/FB4020> git clone https://github.com/PeterPawn/yf_bin
Cloning into 'yf_bin'...
remote: Enumerating objects: 99, done.
remote: Counting objects: 100% (99/99), done.
remote: Compressing objects: 100% (78/78), done.
remote: Total 926 (delta 22), reused 90 (delta 19), pack-reused 827
Receiving objects: 100% (926/926), 75.25 MiB | 3.54 MiB/s, done.
Resolving deltas: 100% (180/180), done.
Updating files: 100% (535/535), done.
peh@vidar:/tmp/FB4020> wget -q -O - https://ftp.avm.de/fritzbox/fritzbox-4020/deutschland/fritz.os/FRITZ.Box_4020.de-en-es-it-fr-pl.147.07.01.image | tar -x -O ./var/tmp/kernel.image > avm_kernel_image.image
peh@vidar:/tmp/FB4020> yf_bin/squashfs/unsquashfs4-be -stat -scan avm_kernel_image.image
Found a valid superblock at offset 0x001C5F00 while scanning avm_kernel_image.image.
NMI vector found at 0x00BE0000, size=4096
Found TI checksum (0xD5030BF4) at the end of the image.
Found a valid big endian SQUASHFS 4:0 superblock on avm_kernel_image.image.
Creation or last append time is not available because of modified AVM-format (mkfs_time == bytes_used)
Filesystem size 13360.20 Kbytes (13.05 Mbytes)
Compression xz
Block size 65536
Filesystem is exportable via NFS
Inodes are compressed
Data is compressed
Fragments are compressed
Always-use-fragments option is not specified
Xattrs are not stored
Duplicates are removed
Number of fragments 254
Number of inodes 2691
Number of ids 1
peh@vidar:/tmp/FB4020> sudo yf_bin/squashfs/unsquashfs4-be -scan -no-progress avm_kernel_image.image
Found a valid superblock at offset 0x001C5F00 while scanning avm_kernel_image.image.
NMI vector found at 0x00BE0000, size=4096
Found TI checksum (0xD5030BF4) at the end of the image.
Filesystem on avm_kernel_image.image is xz compressed (4:0)
Parallel unsquashfs: Using 2 processors
2508 inodes (2971 blocks) to write

created 2042 files
created 183 directories
created 454 symlinks
created 12 devices
created 0 fifos
peh@vidar:/tmp/FB4020>

Because the new option -scan does not affect any "pure" SquashFS image, it doesn't matter, whether it's always used to search for the SquashFS superblock - you may unpack "plain" SquashFS images, too, while using this option.
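If the tool is called from Python, always passing -scan could look roughly like the sketch below. The tool path and the options -stat, -scan and -no-progress are the ones from the session above; the function names and the wrapper itself are hypothetical and not part of FACT.

```python
import subprocess

# Path as used in the shell session above; adjust to the local checkout.
UNSQUASHFS = "yf_bin/squashfs/unsquashfs4-be"


def build_command(image: str, tool: str = UNSQUASHFS, stat_only: bool = False):
    """Build the command line; -scan is always included (hypothetical helper)."""
    if stat_only:
        return [tool, "-stat", "-scan", image]
    return [tool, "-scan", "-no-progress", image]


def run_unsquashfs(image: str, stat_only: bool = False):
    """Run the tool and return the CompletedProcess without raising on failure."""
    return subprocess.run(
        build_command(image, stat_only=stat_only),
        capture_output=True,
        text=True,
        check=False,
    )
```

The session above ran the extraction with sudo; without root privileges, unpacking may not be able to create the device nodes.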

Using the option -scan, the superblock offset is determined first and then the existence of the NMI vector gap is checked. If the NMI vector gap is present, it will be skipped while reading/unpacking files from this image. But this is checked/done only, if the new option was specified while calling the tool.
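Skipping the gap while reading is equivalent to reading from a copy of the image with the gap cut out. The sketch below shows both views; it is not the implementation from Freetz or YourFreetz, the function names are invented, and the gap offset and size are taken as inputs (the session above reports 0x00BE0000 and 4096 for the 4020) instead of being detected.

```python
def remove_gap(image: bytes, gap_offset: int, gap_size: int) -> bytes:
    """Return a copy of the image with the gap bytes cut out (hypothetical helper)."""
    if gap_offset < 0 or gap_size < 0 or gap_offset + gap_size > len(image):
        raise ValueError("gap lies outside of the image")
    return image[:gap_offset] + image[gap_offset + gap_size:]


def logical_to_physical(position: int, gap_offset: int, gap_size: int) -> int:
    """Map a position in the gap-free data to its position in the original file."""
    return position if position < gap_offset else position + gap_size
```

Both offsets here are relative to the start of the file the gap was found in; after splitting off the kernel, the superblock offset would have to be subtracted first.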

Even if the Freetz implementation (https://github.com/Freetz/freetz/commit/ba45d885189f3284c0eeb2ff13215bfbda2650c2) differs slightly from my own (https://github.com/PeterPawn/YourFreetz/commit/9f89c498caef84fa8cc7c64730c670a71637b43f), both produce the same result - and you should now be able to unpack the 4020 firmware, too.

And by the way ... this is the same procedure for all FRITZ!Box models which use a MIPS architecture and this "single image format" for their firmware - usually these devices have NOR or SPI flash only, because with NAND flash the firmware structure is a different one.

The 4020 was the only model with this structure in your portfolio - otherwise you would have problems unpacking the firmware for other AVM models, too. Try the 7390 firmware (it's still using SquashFS3 format) or 7360v2 (this is a SquashFS4 image with BE byte order) as other examples of these MIPS-devices with NMI vector gap ... if you want to enhance/verify/test your unpacker.

I'm not providing a patch for your file(s), because the if-then construct in avm_kernel_image.py looks odd to me, too. I can't see why you want to unpack the contained kernel only if the present file does not contain a SquashFS image - this only makes sense if you're calling this function recursively and the split kernel image from the first call is unpacked by a second call later. This makes the logic a bit obscure to me - so I'd better let you decide which changes are needed.

And looking into squash_fs.py (https://github.com/fkie-cad/fact_extractor/blob/master/fact_extractor/plugins/unpacking/squashFS/code/squash_fs.py), the needed changes seem to be more extensive ... currently there aren't different command line options (per tool) while "probing" for the right tool to unpack data.
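One possible shape for per-tool options is a list of tool/option pairs that gets probed in order. This is only a sketch of the idea and does not reproduce the actual structure of squash_fs.py; the class `Unpacker`, the list `CANDIDATES` and the function `command_lines` are invented names, and only the tool names and the -scan option come from this thread.

```python
from typing import List, NamedTuple, Sequence


class Unpacker(NamedTuple):
    """A tool plus the extra options it should be called with (hypothetical)."""
    tool: str
    options: Sequence[str] = ()


# Probing order is an assumption; only the v4 tool gets the -scan option here.
CANDIDATES = [
    Unpacker("unsquashfs4-be", ("-scan",)),
    Unpacker("unsquashfs3-multi"),
]


def command_lines(image: str, candidates: Sequence[Unpacker] = CANDIDATES) -> List[List[str]]:
    """Return one command line per candidate, in probing order."""
    return [[c.tool, *c.options, image] for c in candidates]
```

A caller would try the returned command lines one after another and stop at the first tool that unpacks the image.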

As long as the oldest SquashFS image to process uses SquashFS3 format (and not an earlier one), the tools for SquashFS4 format will be able to unpack this, too, and another try with unsquashfs3-multi should not change the results anymore, if the v4 tools were unable to unpack a file.

jstucke commented 3 years ago

> I'm a bit unsure, whether this is the correct repository for this issue

This is indeed the correct repository: FACT uses the "dockerized" version of the extractor (this was done to make the installation process quicker and easier).

> I'm not providing a patch for your file(s), because the if-then construct in avm_kernel_image.py looks odd to me, too. I can't see why you want to unpack the contained kernel only if the present file does not contain a SquashFS image - this only makes sense if you're calling this function recursively and the split kernel image from the first call is unpacked by a second call later.

I did not write this and needed to work through the code myself, but it indeed seems to be intended to work recursively (the MIME signature matches the kernel part again which is then unpacked in the next round of recursive unpacking). I agree that this is a bit obscure and could be done more clearly.

I could confirm that the -scan parameter works with some of the images that the extractor cannot unpack at the moment. I will try to integrate it into the extractor in a sensible way.

Thank you for your input!

jstucke commented 3 years ago

I was able to unpack the images of the 4020 and 7390 successfully in FACT with the changes in #66. Changes to avm_kernel_image.py don't seem to be necessary, because the "SquashFS part" also gets unpacked with squash_fs.py after being split up. (Note that FACT uses a stable version of the extractor by default, so this change only takes effect once a new stable version of the extractor is released, but we wanted to do that soon anyway.)

fkie-cad / fact_extractor

avm_kernel_image: correct handling of combined images from AVM's firmware #63