ReFirmLabs / binwalk

Firmware Analysis Tool
MIT License

unsquashfs doesn't extract files in squashfs-root #290

Open vmartyanov opened 6 years ago

vmartyanov commented 6 years ago

I tried to use binwalk on the attached file, but got strange results with unsquashfs. It stored the files from the squashfs in _P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.bin.extracted, not in _P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.bin.extracted/squashfs-root

If I run 'unsquashfs F8010.squashfs' in _P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.bin.extracted, I get an error message and no files at all. Why do I get files from binwalk if a manual run of unsquashfs fails? P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.zip

devttys0 commented 6 years ago

I don't know how files from the squashfs image would end up in the wrong directory. Was the squashfs-root directory created at all? Can you provide a directory listing of the _P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.bin.extracted contents? When I run binwalk, it creates a squashfs-root directory and places all files there.

As for unsquashfs, it can't handle vendor-specific modifications, which this squashfs image apparently does have. If unsquashfs fails to extract an image, binwalk will attempt to use sasquatch (assuming it is installed on your system), which is a patched version of unsquashfs which handles these non-standard squashfs images better.

For debugging purposes, can you run the following command on your system and post the output (there will be a lot of debug stuff printed to screen):

$ python -O $(which binwalk) -e P-330W_EE_V3.60\(AMJ.5\)D0_httpupgrade.bin

vmartyanov commented 6 years ago

Yes, squashfs-root was created, but the extracted files are in _P-330W_EE_V3.60(AMJ.5)D0_httpupgrade.bin.extracted. The listing is attached. I ran the debugging command; the redirected output is attached too. I'm using binwalk 2.1.1 and squashfs-tools 4.3-r2, and I don't have sasquatch. debug.txt listing.txt

devttys0 commented 6 years ago

Ah, here is the issue, from debug.txt:

DEBUG: External extractor command "7z e -y '%e'" completed with return code 2 (success: False)
DEBUG: Running extractor '7z e -y '%e''
DEBUG: subprocess.call(7z e -y '20810.7z', stdout=None, stderr=None)

7-Zip [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.utf8,Utf16=on,HugeFiles=on,32 bits,1 CPU Intel(R) Pentium(R) M processor 1.73GHz (6D8),ASM)

Scanning the drive for archives:
1 file, 1816586 bytes (1775 KiB)

Extracting archive: 20810.7z

WARNINGS:
There are data after the end of archive

WARNING:
20810.7z
Can not open the file as [7z] archive
The file is open as [SquashFS] archive

--
Path = 20810.7z
Open WARNING: Can not open the file as [7z] archive
Type = SquashFS
WARNINGS:
There are data after the end of archive
Offset = 882696
Physical Size = 933888
Tail Size = 2
Headers Size = 7348
File System = SquashFS 3.0
Method = LZMA Spec
Cluster Size = 65536
Big-endian = +
Created = 2009-10-27 04:38:59
Characteristics = DUPLICATES_REMOVED

Everything is Ok

The fix is to upgrade to the latest binwalk from github which addresses this, but here is a more detailed explanation:

There is a block of LZMA data that precedes the SquashFS file system at offset 0x20810:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
32784         0x8010          bzip2 compressed data, block size = 900k
98304         0x18000         CSYS header, big endian, size: 131072
133136        0x20810         LZMA compressed data, properties: 0x5D, dictionary size: 8388608 bytes, uncompressed size: 1732656 bytes
1015824       0xF8010         Squashfs filesystem, big endian, version 3.0, size: 929841 bytes, 182 inodes, blocksize: 65536 bytes, created: 2009-10-27 01:38:59

Because the LZMA file format does not provide any information on how large the compressed data is, binwalk grabs everything from offset 0x20810 to the end of the firmware file and saves it to a file called 20810.7z. This means that the 20810.7z file also contains a copy of the SquashFS file system which comes after the LZMA compressed data.
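To make the carving step concrete, here is a minimal Python sketch. It is not binwalk's actual code: `carve_to_eof` is a hypothetical helper, and the firmware is replaced by a synthetic buffer with a made-up 4-byte marker at the SquashFS offset. Only the two offsets come from the scan output above.

```python
def carve_to_eof(firmware: bytes, offset: int) -> bytes:
    """Return everything from `offset` to the end of the buffer.

    Hypothetical helper: mimics carving when the compressed length is unknown.
    """
    return firmware[offset:]


# Offsets taken from the scan output above.
LZMA_OFFSET = 0x20810
SQUASHFS_OFFSET = 0xF8010

# Synthetic stand-in for the firmware image (assumption: zero filler plus
# an arbitrary marker where the SquashFS image would start).
MARKER = b"sqsh"
firmware = bytearray(SQUASHFS_OFFSET + 16)
firmware[SQUASHFS_OFFSET:SQUASHFS_OFFSET + len(MARKER)] = MARKER

carved = carve_to_eof(bytes(firmware), LZMA_OFFSET)

# Position of the SquashFS data inside the carved file.
relative = SQUASHFS_OFFSET - LZMA_OFFSET
print(relative)
print(carved[relative:relative + len(MARKER)])
```

The relative offset works out to 882688 bytes, so the file system sits well inside the carved 20810.7z file. The 7z log above reports Offset = 882696 for the SquashFS it found, which is within a few bytes of that figure.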

Since it is an LZMA file, binwalk then runs the 7z utility to try to decompress the contents of the 20810.7z file. However, some newer versions of the 7z utility examine trailing data after the actual LZMA compressed data, and will detect and extract SquashFS file systems too, rather than just decompressing the LZMA data. As you've seen, this extraction leaves a mess of files, without preserving directories and file system structure. I'm not sure if this is a shortcoming in 7z, or a side-effect of the SquashFS file system having seemingly been modified by whoever made the firmware image, but in either case it is undesirable.

Binwalk then finds the SquashFS file system, copies it out to a file named F8010.squashfs, and attempts to run the unsquashfs utility to extract the contents of the file system to a directory named squashfs-root. The unsquashfs utility creates the squashfs-root output directory, but because of the non-standard modifications apparently made to the SquashFS file system, it fails to extract any files and bails.

Because unsquashfs failed, binwalk then attempts to run sasquatch against the F8010.squashfs file, but since sasquatch is not installed on your system this obviously fails as well.

The end result is that you have a bunch of extracted files from the F8010.squashfs file, but without any directory structure, and an empty squashfs-root directory.
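The fallback order described above (unsquashfs first, then sasquatch) can be sketched roughly as follows. This is an illustration only, not binwalk's implementation: `extract_squashfs` is a hypothetical function, the `which` and `run` parameters exist just so the logic can be exercised without the real tools, and the cleanup of a half-created output directory between attempts is my own assumption. The `-d` option for the output directory is assumed to be accepted by both tools, since sasquatch is a patched unsquashfs.

```python
import shutil
import subprocess


def extract_squashfs(image, out_dir,
                     tools=("unsquashfs", "sasquatch"),
                     which=shutil.which,
                     run=subprocess.call):
    """Try each extractor in order; return the name of the one that succeeded.

    Returns None if every tool is missing or exits with a non-zero code.
    """
    for tool in tools:
        if which(tool) is None:
            # Tool is not installed (e.g. sasquatch in the report above).
            continue
        if run([tool, "-d", out_dir, image]) == 0:
            return tool
        # Assumption: drop a partially created output directory so the
        # next tool starts from a clean state.
        shutil.rmtree(out_dir, ignore_errors=True)
    return None


if __name__ == "__main__":
    used = extract_squashfs("F8010.squashfs", "squashfs-root")
    print("extracted with:", used)
```

With only unsquashfs installed and a vendor-modified image, this returns None, which matches the empty squashfs-root result in the report.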

Newer versions of binwalk no longer use 7z for LZMA decompression, thus avoiding variations in how different versions of 7z handle trailing data in LZMA files. If you install the latest binwalk, as well as the sasquatch utility, everything should work as expected and the file system contents will be extracted properly to the squashfs-root directory. Note: if you want LZMA files to also be properly decompressed with the latest version of binwalk, you will also need the python-lzma Python module (this module is included by default in Python3, so this extra installation is only necessary for Python2).
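As a rough illustration of why decompressing with Python's lzma module sidesteps the trailing-data problem, the sketch below decodes a raw .lzma stream and keeps whatever follows it separate. It is not binwalk's code: `decompress_lzma_ignoring_tail` is a hypothetical helper, and the payload and trailing bytes are synthetic.

```python
import lzma


def decompress_lzma_ignoring_tail(blob: bytes):
    """Decompress a legacy .lzma stream at the start of `blob`.

    Returns (decompressed_data, trailing_bytes). The decompressor stops at
    the end of the stream; anything after it is exposed as `unused_data`
    and is never interpreted as an archive.
    """
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    data = decompressor.decompress(blob)
    return data, decompressor.unused_data


# Synthetic example: an LZMA stream followed by unrelated bytes, standing in
# for the SquashFS image that follows the LZMA data in 20810.7z.
payload = b"kernel image bytes " * 100
tail = b"bytes that belong to the following file system"
blob = lzma.compress(payload, format=lzma.FORMAT_ALONE) + tail

data, trailing = decompress_lzma_ignoring_tail(blob)
print(len(data), len(trailing))
```

The trailing bytes are simply returned to the caller, so a later SquashFS extraction step can handle them on its own terms.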

vmartyanov commented 6 years ago

Wow, 7z supports SquashFS? I didn't know that! Thank you for the explanation!

commonism commented 6 months ago

Newer versions of binwalk no longer use 7z for LZMA decompression,

As of today - 7z is enabled by default via config. https://github.com/ReFirmLabs/binwalk/blob/cddfede795971045d99422bd7a9676c8803ec5ee/src/binwalk/config/extract.conf#L29-L30

Same problem as reported.

Commenting …

#^lzma compressed data:7z:7z e -y '%e':0,1
#^xz compressed data:xz:7z e -y '%e':0,1

e.g.

sed -i -E 's/^(\^.*:7z .+)$/#\1/' binwalk/src/binwalk/config/extract.conf

and it works as required.
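For anyone who would rather check the substitution before editing the file, here is the same regular expression applied in Python to sample text. This is a sketch: `comment_out_7z_rules` is a hypothetical helper, the first two sample lines are the rules quoted above with the leading # removed, and the gzip line is an invented example of a rule that should stay untouched.

```python
import re


def comment_out_7z_rules(conf_text: str) -> str:
    """Prefix with '#' every rule line that starts with '^' and runs 7z.

    Same pattern as the sed command above, applied line by line.
    """
    return re.sub(r"^(\^.*:7z .+)$", r"#\1", conf_text, flags=re.MULTILINE)


sample = "\n".join([
    "^lzma compressed data:7z:7z e -y '%e':0,1",
    "^xz compressed data:xz:7z e -y '%e':0,1",
    "^gzip compressed data:gz:gzip -d -f '%e':0,2",  # invented line, not from extract.conf
])

patched = comment_out_7z_rules(sample)
print(patched)
```

Lines that are already commented out are left alone, because the pattern requires the line to begin with a literal ^.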