CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License

Extract more files using hachoir-subfile (or other tools) #35

Closed scottpas closed 1 year ago

scottpas commented 2 years ago

Example hash: c7dd490adb297b7f529950778b5a426e8068ea2df58be5d8fd49fe55b5331e28

hachoir-subfile output:

# hachoir-subfile c7dd490adb297b7f529950778b5a426e8068ea2df58be5d8fd49fe55b5331e28.doc
[+] Start search on 1563136 bytes (1.5 MB)

[+] File at 0 size=1563136 (1.5 MB): Microsoft Office document
[+] File at 512: Microsoft Office Word document
[+] File at 8775 size=1371772 (1.3 MB): PNG picture: 2480x3508x24

When running this same file through AL, the PNG file is not extracted. An extracted OLE object contains the PNG, but the image itself does not appear anywhere in the AL output.

Perhaps this could be a deep scan feature, since it may add a lot of artifacts that people may not care much about.
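
To make the request concrete, here is a minimal sketch of what carving a reported subfile amounts to: copying `size` bytes starting at `offset` out of the parent file. The helper names `carve` and `carve_to_file` are hypothetical and are not part of AssemblyLine or hachoir. The sketch assumes the embedded file is stored contiguously, which is not guaranteed inside an OLE container, where a stream can be split across non-adjacent sectors.

```python
from pathlib import Path


def carve(data: bytes, offset: int, size: int) -> bytes:
    """Return the `size` bytes starting at `offset`, or raise if out of range."""
    if offset < 0 or size < 0 or offset + size > len(data):
        raise ValueError("requested region falls outside the input")
    return data[offset:offset + size]


def carve_to_file(src: str, offset: int, size: int, dst: str) -> int:
    """Carve a region out of `src` and write it to `dst`; returns bytes written."""
    chunk = carve(Path(src).read_bytes(), offset, size)
    return Path(dst).write_bytes(chunk)
```

For the sample above, the call would be `carve_to_file("c7dd490adb297b7f529950778b5a426e8068ea2df58be5d8fd49fe55b5331e28.doc", 8775, 1371772, "embedded.png")`, using the offset and size that hachoir-subfile reported for the PNG.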

scottpas commented 2 years ago

Here's another example file:

12361b94bae2da00f0215d8a22674066dd4198d3c5795c3dfdad605b3a15ffb5 (on MalwareBazaar)

It's an MSI file that contains a CAB with an embedded malicious DLL. The CAB and DLL aren't extracted, even with Deep Scan enabled. I also enabled continue_after_extract and extract_executable_sections.

hachoir-subfile /tmp/12361b94bae2da00f0215d8a22674066dd4198d3c5795c3dfdad605b3a15ffb5.msi
[+] Start search on 861184 bytes (841.0 KB)

[+] File at 0 size=861184 (841.0 KB): Microsoft Office document
[+] File at 61888 size=318 (318 bytes): Microsoft Windows icon: 16x16x0
[+] File at 62208 size=318 (318 bytes): Microsoft Windows icon: 16x16x0
[+] File at 77312 size=105056 (102.6 KB): Microsoft Bitmap version 3
[+] File at 188928 size=671943 (656.2 KB): Microsoft Cabinet archive
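
If this output were to be consumed by a service, the `[+] File at ...` lines would need to be turned into structured records. The following is a rough sketch that parses the text format shown above with a regular expression. `Hit`, `HIT_RE` and `parse_hits` are hypothetical names, and the pattern is inferred only from the sample output in this thread, so other hachoir versions may print something different.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    offset: int
    size: Optional[int]  # None when no size is reported for the hit
    description: str


# Matches lines such as:
# "[+] File at 8775 size=1371772 (1.3 MB): PNG picture: 2480x3508x24"
# "[+] File at 512: Microsoft Office Word document"
HIT_RE = re.compile(r"^\[\+\] File at (\d+)(?: size=(\d+) \([^)]*\))?: (.+)$")


def parse_hits(output: str) -> list[Hit]:
    """Collect every '[+] File at ...' line as a Hit; other lines are ignored."""
    hits = []
    for line in output.splitlines():
        match = HIT_RE.match(line.strip())
        if match:
            size = int(match.group(2)) if match.group(2) else None
            hits.append(Hit(int(match.group(1)), size, match.group(3)))
    return hits
```

Filtering the result on `hit.offset > 0` would drop the entry for the parent file itself and leave only the embedded candidates, such as the Cabinet archive at offset 188928.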

cccs-rs commented 2 years ago

Here's another example file:

12361b94bae2da00f0215d8a22674066dd4198d3c5795c3dfdad605b3a15ffb5 (on MalwareBazaar)

It's an MSI file that contains a CAB with an embedded malicious DLL. The CAB and DLL aren't extracted, even with Deep Scan enabled. I also enabled continue_after_extract and extract_executable_sections.

Oletools was able to extract the CAB and then Extract was able to extract the DLL.


Example hash: c7dd490adb297b7f529950778b5a426e8068ea2df58be5d8fd49fe55b5331e28

hachoir-subfile output:

... When running this same file through AL, the PNG file is not extracted. An extracted OLE object contains the PNG, but the image itself does not appear anywhere in the AL output.

DocPreview was able to create a render which coincidentally matches the PNG found by hachoir-subfile (although AL wasn't able to extract the PNG itself).

While I do see value in enhancing the service with a tool that's able to extract subfiles from the original file's stream, I'm not sure this library does a reliable job of it.
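
For context, the simplest form of subfile detection is a scan of the raw stream for known magic bytes. The sketch below is a deliberately naive illustration of that idea using three well-known signatures. It does not reproduce hachoir's own matching or validation logic, and `SIGNATURES` and `find_signatures` are hypothetical names. A scan like this reports every byte-level match, so false positives are expected without further validation.

```python
# Well-known file signatures; a real tool would know many more
# and would validate each candidate before reporting it.
SIGNATURES = {
    "PNG image": b"\x89PNG\r\n\x1a\n",
    "Microsoft Cabinet archive": b"MSCF",
    "ZIP archive": b"PK\x03\x04",
}


def find_signatures(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, type name) for every signature match, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1  # keep searching after this match
    return sorted(hits)
```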

For instance, I was hoping it would extract the image from this simple Word doc but it seems to interpret the entire file as a ZIP: d1720ff15ba5a134415875a35cbe203777cf389d87f6a79aacc801ea543cddae

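from hachoir.stream import FileInputStream
from hachoir.subfile.search import SearchSubfile

# Search the whole sample: start at offset 0, no size limit
stream = FileInputStream("d1720ff15ba5a134415875a35cbe203777cf389d87f6a79aacc801ea543cddae")
subfile = SearchSubfile(stream, 0, None)
# No category or parser filter, so every available parser is loaded
subfile.loadParsers(None, None)
# Write any extracted subfiles to the 'pop' directory
subfile.setOutput('pop')
subfile.main()

Output:
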
[+] Start search on 1210982 bytes (1.2 MB)

[+] File at 0 size=1210982 (1.2 MB): ZIP archive (don't copy whole file)

[+] End of search -- offset=1210982 (1.2 MB)
True
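
The ZIP result is consistent with the sample being an OOXML (.docx) document, which is a ZIP container that normally keeps embedded images under `word/media/`. Assuming that is the case here, the image would have to be pulled out by reading the archive rather than by carving the raw stream. A minimal sketch with Python's standard `zipfile` module follows; the function names are hypothetical.

```python
import zipfile

MEDIA_PREFIX = "word/media/"  # usual location of embedded images in a .docx


def list_docx_media(path_or_file) -> list[str]:
    """Return the archive member names stored under word/media/."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [name for name in zf.namelist() if name.startswith(MEDIA_PREFIX)]


def extract_docx_media(path_or_file, dest_dir: str) -> list[str]:
    """Extract only the word/media/ members into dest_dir and return their names."""
    with zipfile.ZipFile(path_or_file) as zf:
        members = [name for name in zf.namelist() if name.startswith(MEDIA_PREFIX)]
        zf.extractall(dest_dir, members=members)
        return members
```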

But if you come across any samples of interest or find any tooling that would be better to integrate with, I'd be happy to investigate! 😀

cccs-kevin commented 1 year ago

@scottpas thoughts?

cccs-rs commented 1 year ago

Going to close issue for now. If there's still interest in this, feel free to reopen 😀