erocarrera / pefile

pefile is a Python module to read and work with PE (Portable Executable) files
MIT License
1.88k stars 522 forks source link

Discrepancy between YARA and pefile while computing imphash #384

Closed plusvic closed 10 months ago

plusvic commented 1 year ago

While investigating the bug report https://github.com/VirusTotal/yara/issues/1993, I was able to verify that there's a discrepancy between YARA and pefile in the way they treat DLL names that contain colon characters (:). YARA accepts DLL names containing colons, while pefile rejects them.

There are certain PE files with DLL names in the import section that have full paths, like:

C:\Windows\System32\ntdll.dll C:\Windows\System32\KernelBase.dll

When computing the imphash, YARA uses such DLL names (after striping the extension), but pefile replace them with the *invalid* string. If the PE file imports NtWriteVirtualMemory from ntdll.dll the string hashed by YARA is c:\windows\system32\ntdll.ntwritevirtualmemory, versus *invalid*.ntwritevirtualmemory in pefile.

This also affects the information printed by pefile about import descriptors:

[IMAGE_IMPORT_DESCRIPTOR]
0x10F92C   0x0   OriginalFirstThunk:            0x115824
0x10F92C   0x0   Characteristics:               0x115824
0x10F930   0x4   TimeDateStamp:                 0x0        [Thu Jan  1 00:00:00 1970 UTC]
0x10F934   0x8   ForwarderChain:                0x0
0x10F938   0xC   Name:                          0x117ABE
0x10F93C   0x10  FirstThunk:                    0x115ED8

*invalid*.ZwWriteVirtualMemory Hint[0]

Notice the *invalid* text before ZwWriteVirtualMemory

Example files are:

98a4d17d6dee54f9242c704af627da853d978d6d37738f875d08ea0e7eaca373 cf39a14a2dc1fe5aa487b6faf19c63bc97103db670fa24c62832895e3002eca2 ce0c2c8063be7c98c631ed1bdf758ce017499899a5009df93b0087fc36afbefc

I also tracked down the changes in YARA to investigate how it got to admit such DLL names and found this thread: https://github.com/VirusTotal/yara/issues/1501

I did some more research about what's actually allowed in the PE files. Here is a little test project that brute-forces EXE's import directory and checks whether such exe will actually run. Complete source code is included.

Result: Any character including and above space (0x20) is allowed. The only exceptions are characters that are invalid for file names in Windows, which are "*<>?|. While they still can be present in the import directory, such module can never be present in Windows, so we can treat them as invalid.

The above applies to slash, backslash (these form a relative path in a subdirectory, which is allowed in the import directory) and colon (which forms a file name with an Alternate Data Stream test:file.dll, which is allowed as well).

I ran a retrohunt job in VirusTotal looking for files containing a colon in an imported DLL name, and only found around 9800 files with this characteristic, most of them similar to ce0c2c8063be7c98c631ed1bdf758ce017499899a5009df93b0087fc36afbefc. So, they are relatively rare.

I was considering changing YARA's behaviour to mimic pefile, but in this particular case YARA's behaviour may be more adequate. What do you think?

erocarrera commented 10 months ago

Hi Victor!

Thanks for the detailed issue and apologies for the late reply.

I can't see any drawback in adding the colon to the list of valid characters. It is the goal of pefile to follow as closely as possible what the OS does. IMHO Yara is doing the right thing and it's perfile the one that should follow. I've committed a change that will make this change and will go out in the next release.

syyoo84 commented 10 months ago

Thank you for your support. Thank you for being able to contribute to the pefile community by reporting bugs at NSHC Threat Research Lab.