armijnhemel / binaryanalysis-ng

Binary Analysis Next Generation (BANG)
GNU Affero General Public License v3.0

Failed to parse linux ubuntu 20.04 kernels #356

Closed chimelab closed 1 year ago

chimelab commented 1 year ago

Kernel files can be downloaded from the Ubuntu website, e.g. Ubuntu 20.04 Kernel. Extract the downloaded deb to get the kernel file, named vmlinuz-5.4.0-???-generic, and parse it with BANG. The following errors are shown:

[DEBUG/Process-2] scan_signatures[_ex000000]: trying parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:0 with <class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'> [1694499231068847526]
[DEBUG/Process-2] scan_signatures[_ex000000]: failed parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:0 with <class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'> [1694499231069175882]
[DEBUG/Process-2] scan_signatures[_ex000000]: <class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'> parser exception: ('partition bigger than file',)
[DEBUG/Process-2] scan_signatures[_ex000000]: wait at 0, found parsers at 144: [<class 'bang.parsers.image.ico.UnpackParser.IcoUnpackParser'>]
[DEBUG/Process-2] scan_signatures[_ex000000]: trying parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:144 with <class 'bang.parsers.image.ico.UnpackParser.IcoUnpackParser'> [1694499231069282642]
[DEBUG/Process-2] scan_signatures[_ex000000]: failed parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:144 with <class 'bang.parsers.image.ico.UnpackParser.IcoUnpackParser'> [1694499231069352573]
[DEBUG/Process-2] scan_signatures[_ex000000]: <class 'bang.parsers.image.ico.UnpackParser.IcoUnpackParser'> parser exception: ('/seq/1: at pos 6: validation failed: not in range, min 1, but got 0',)
[DEBUG/Process-2] scan_signatures[_ex000000]: wait at 0, found parsers at 145: [<class 'bang.parsers.font.truetype_font.UnpackParser.TruetypeFontUnpackParser'>]
[DEBUG/Process-2] scan_signatures[_ex000000]: trying parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:145 with <class 'bang.parsers.font.truetype_font.UnpackParser.TruetypeFontUnpackParser'> [1694499231069443674]
[DEBUG/Process-2] scan_signatures[_ex000000]: failed parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:145 with <class 'bang.parsers.font.truetype_font.UnpackParser.TruetypeFontUnpackParser'> [1694499231069591932]
[DEBUG/Process-2] scan_signatures[_ex000000]: <class 'bang.parsers.font.truetype_font.UnpackParser.TruetypeFontUnpackParser'> parser exception: ('ascii', b'\x14\x80\x99\xd0', 1, 2, 'ordinal not in range(128)')
[DEBUG/Process-2] scan_signatures[_ex000000]: wait at 0, found parsers at 166: [<class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'>]
[DEBUG/Process-2] scan_signatures[_ex000000]: trying parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:166 with <class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'> [1694499231069679887]
[DEBUG/Process-2] scan_signatures[_ex000000]: failed parse at /vm_share/simple-test/output/vmlinuz-5.4.0-162-generic:166 with <class 'bang.parsers.filesystem.mbr_partition_table.UnpackParser.MbrPartitionTableUnpackParser'> [1694499231069810662]

armijnhemel commented 1 year ago

These are not errors, but expected behaviour.

BANG uses (amongst other things) signatures to try to parse files. I can see that in this case it has tried to parse an ICO file and a TTF. These both have very generic signatures, so the parser will very quickly see that the data cannot be a valid ICO or TTF; the parser will then fail and BANG will move on to the next signature.
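To illustrate the idea, here is a minimal sketch of signature-driven scanning in which a failed parse is simply a reason to move on. It is not BANG's actual code: `ParseError`, `parse_ico`, `SIGNATURES` and `scan` are hypothetical names, and the ICO check is reduced to the image-count validation that shows up in the log above.

```python
from typing import Callable, Dict, List, Tuple


class ParseError(Exception):
    """Raised when data does not match the format a parser expects."""


def parse_ico(data: bytes, offset: int) -> int:
    """Hypothetical, heavily simplified ICO check; returns the parsed length."""
    header = data[offset:offset + 6]
    if len(header) < 6:
        raise ParseError("not enough data for an ICO header")
    # Bytes 4-5 hold the number of images, which must be at least 1.
    image_count = int.from_bytes(header[4:6], "little")
    if image_count < 1:
        raise ParseError("image count must be at least 1")
    # Header plus one 16-byte directory entry per image (image data ignored).
    length = 6 + 16 * image_count
    if offset + length > len(data):
        raise ParseError("directory bigger than file")
    return length


# Very generic signature: reserved field 0 followed by type 1.
SIGNATURES: Dict[bytes, Callable[[bytes, int], int]] = {
    b"\x00\x00\x01\x00": parse_ico,
}


def scan(data: bytes) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]]]:
    """Return (successful parses, failed attempts) found while walking the data."""
    parsed = []
    failed = []
    offset = 0
    while offset < len(data):
        length = 0
        for signature, parser in SIGNATURES.items():
            if not data.startswith(signature, offset):
                continue
            try:
                length = parser(data, offset)
                parsed.append((offset, length))
                break
            except ParseError as exc:
                # Expected for generic signatures: note it and keep scanning.
                failed.append((offset, str(exc)))
        offset += length if length else 1
    return parsed, failed
```

A failed attempt only ends up in the `failed` list; the scan carries on at the next byte, which is what the debug output shows.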

chimelab commented 1 year ago

Thank you for the comments. However, it turned out the kernel file was not extracted at all. I will investigate further to understand the cause. Again, thanks Dr.

armijnhemel commented 1 year ago

Thank you for the comments. However, it turned out the kernel file was not extracted at all. I will investigate further to understand the cause. Again, thanks Dr.

Not Dr ;-)

Could you please tell me which file exactly you wanted to scan so I can take a look? Depending on the architecture there are different binary formats in use (for ARM it is different than for ARM64, different from x86-64, etcetera). The problem that I am currently facing is that the documentation isn't great for figuring out which parts of a file I need to look at.

chimelab commented 1 year ago

It's amd64, not arm. The URL of the kernel file is: http://archive.ubuntu.com/ubuntu/pool/main/l/linux-signed/linux-image-5.4.0-149-generic_5.4.0-149.166_amd64.deb I had also tried several other 5.4.0 kernels, which were downloaded via apt-cache. BANG can't handle those either.

Steps: 1, extract the deb with 7zip first; 2, use bang to scan the kernel file linux-image-5.4.0-149-generic_5.4.0-149.166_amd64.deb\data.tar.\boot\vmlinuz-5.4.0-149-generic

I'm using vmware/ubuntu 22.04. Below is the bang command: python3 -m bang.cli scan -u bang-vmlinuz vmlinuz-5.4.0-149-generic

armijnhemel commented 1 year ago

It's amd64, not arm. The URL of the kernel file is: http://archive.ubuntu.com/ubuntu/pool/main/l/linux-signed/linux-image-5.4.0-149-generic_5.4.0-149.166_amd64.deb I had also tried several other 5.4.0 kernels, which were downloaded via apt-cache. BANG can't handle those either.

Steps: 1, extract the deb with 7zip first;

This is odd, as BANG can unpack .deb files perfectly fine (I just verified).

2, use bang to scan the kernel file linux-image-5.4.0-149-generic_5.4.0-149.166_amd64.deb\data.tar.\boot\vmlinuz-5.4.0-149-generic

I'm using vmware/ubuntu 22.04. Below is the bang command: python3 -m bang.cli scan -u bang-vmlinuz vmlinuz-5.4.0-149-generic

OK. I can see that something is going wrong there. It seems that the kernel image is compressed using LZ4 (legacy format) at offset 0x000049b1. Correctly parsing x86 Linux kernel files is something that I am working on (I have done some work in a branch, but it isn't finished yet). I will add this file for verification.

armijnhemel commented 1 year ago

Some more technical details (also as documentation so I don't forget): this particular kernel image contains an ELF file that has been compressed with the LZ4 legacy format. This format does not have a marker that indicates the end of the compressed data, but relies on EOF instead. If there is data following a block but there isn't enough of it to form another block, then it is very hard to distinguish between extra data and a corrupted data block (also see https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md#legacy-frame ).
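A minimal sketch of that ambiguity, assuming the layout from the linked document (magic number 0x184C2102, then blocks that each start with a 4-byte little-endian compressed size). `walk_lz4_legacy` is a hypothetical helper, not the BANG parser; it only follows the size fields and does not decompress anything, so it cannot tell whether a block is actually valid. It also ignores the case where another frame's magic number follows.

```python
import struct

# 0x184C2102 stored little-endian
LZ4_LEGACY_MAGIC = b"\x02\x21\x4c\x18"


def walk_lz4_legacy(data: bytes):
    """Return (end offset of the last complete block, number of leftover bytes)."""
    if data[:4] != LZ4_LEGACY_MAGIC:
        raise ValueError("not an LZ4 legacy frame")
    pos = 4
    while pos + 4 <= len(data):
        (block_size,) = struct.unpack_from("<I", data, pos)
        # A size that runs past EOF is either trailing data or a truncated
        # block; from the size field alone there is no way to tell which.
        if block_size == 0 or pos + 4 + block_size > len(data):
            break
        pos += 4 + block_size
    return pos, len(data) - pos
```

For a clean file the second value is 0; anything else means the bytes after the last complete block could not be read as another block.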

In the Linux kernel x86 format ( https://www.kernel.org/doc/Documentation/x86/boot.txt ) I can find an offset and length for a payload, but when writing the payload to a separate file and processing that file it seems that there are 4 bytes too many, so uncompressing fails. I have verified this by decompressing using lz4cat and then compressing again:

$ ll
total 26184
-rwxr--r-- 1 armijn armijn 13405599 Sep 18 19:23 payload
-rw-rw-r-- 1 armijn armijn 13405595 Sep 18 19:27 payload2

The second file decompresses cleanly according to lz4cat, while the first one throws a warning/error talking about "undecodable content":

$ lz4cat payload -vvvvv > bla
*** LZ4 command line interface 64-bits v1.9.4, by Yann Collet ***
_POSIX_C_SOURCE defined: 200809L
_POSIX_VERSION defined: 200809L
PLATFORM_POSIX_VERSION defined: 200809L
Using stdout for output 
Sparse File Support automatically disabled on stdout ; to force-enable it, add --sparse command 
Detected : Legacy format 
Stream followed by undecodable data at position 13405599 
payload              : decoded 49234268 bytes
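For reference, a sketch of how the payload offset and length mentioned above can be read from the header described in boot.txt. The field offsets (`setup_sects` at 0x1F1, the `HdrS` magic at 0x202, the protocol version at 0x206, `payload_offset` at 0x248 and `payload_length` at 0x24C, both of which need protocol 2.08 or later) are taken from that document; `find_payload` itself is a hypothetical helper and not code from BANG.

```python
import struct


def find_payload(image: bytes):
    """Return (file offset, length) of the compressed payload in an x86 kernel image."""
    if len(image) < 0x250:
        raise ValueError("file too small for an x86 boot header")
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("missing HdrS magic")
    (version,) = struct.unpack_from("<H", image, 0x206)
    if version < 0x0208:
        raise ValueError("payload fields need boot protocol 2.08 or later")
    # A setup_sects value of 0 means 4, for backwards compatibility.
    setup_sects = image[0x1F1] or 4
    payload_offset, payload_length = struct.unpack_from("<II", image, 0x248)
    # payload_offset is relative to the start of the protected-mode code,
    # which follows the boot sector and the setup sectors.
    protected_mode_start = (setup_sects + 1) * 512
    return protected_mode_start + payload_offset, payload_length
```

Writing `image[offset:offset + length]` to a separate file gives the `payload` file used above.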

This might be related:

https://github.com/lz4/lz4/issues/956

although the problem doesn't seem to be specific to LZ4, because Fedora kernels that use zstd also have an extra 4 bytes when unpacking.

As I already know that the payload is coming from a Linux kernel image, I could propagate some hints to the LZ4 legacy parser.
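One hint that might be worth testing is the assumption, not confirmed anywhere in this thread, that the 4 extra bytes are a little-endian copy of the uncompressed size appended when the kernel image is built. A minimal sketch of that check follows; `split_size_footer` is a hypothetical helper and the decoded length has to come from a decompressor that tolerates trailing data (such as the lz4cat run above).

```python
import struct


def split_size_footer(payload: bytes, decoded_length: int):
    """Return (payload without footer, True) if the last 4 bytes match the decoded size."""
    if len(payload) < 4:
        raise ValueError("payload too short to carry a size footer")
    (footer,) = struct.unpack("<I", payload[-4:])
    # Assumption: the footer is the uncompressed size as a 32-bit little-endian value.
    if footer == decoded_length & 0xFFFFFFFF:
        return payload[:-4], True
    return payload, False
```

If the check succeeds, the stripped data could be handed to the LZ4 legacy parser together with a note that the trailing bytes are accounted for; if it fails, the payload is left untouched.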

armijnhemel commented 1 year ago

https://github.com/armijnhemel/binaryanalysis-ng/tree/kernel_x86_kaitai

armijnhemel commented 1 year ago

https://github.com/armijnhemel/binaryanalysis-ng/tree/kernel_x86_kaitai

This branch has been merged into master.