erocarrera / pefile

pefile is a Python module to read and work with PE (Portable Executable) files
MIT License
1.83k stars 516 forks source link

pefile.get_imphash() function bug #382

Closed syyoo84 closed 8 months ago

syyoo84 commented 10 months ago

The imphash of the following hash seems to have been calculated incorrectly.

In that case, the get_imphash() function of the pefile module needs to be modified as follows. Reference: https://github.com/VirusTotal/yara/commit/e94ec7b62165a3424e86b82e044d047827ece752 This is the modified code

def valid_dll_name(dllname):
  if dllname == "":
      return False

  for c in dllname:
    if (not(c >= 'a' and c <= 'z') and not(c >= 'A' and c <= 'Z') and
        not(c >= '0' and c <= '9') and c != '.' and c != '_' and c != '?' and
        c != '@' and c != '$' and c != '(' and c != ')' and c != '<' and c != '>' and c != '-'):
        return False
  return True

def valid_function_name(dllname):
  if dllname == "":
      return False

  for c in dllname:
    if (not(c >= 'a' and c <= 'z') and not(c >= 'A' and c <= 'Z') and
        not(c >= '0' and c <= '9') and c != '.' and c != '_' and c != '?' and
        c != '@' and c != '$' and c != '(' and c != ')' and c != '<' and c != '>'):
        return False
  return True

def get_imphash(pe):
    """Return the imphash of the PE file.

    Creates a hash based on imported symbol names and their specific order within
    the executable:
    https://www.mandiant.com/resources/blog/tracking-malware-import-hashing

    Returns:
        the hexdigest of the MD5 hash of the exported symbols.
    """

    impstrs = []
    exts = ["ocx", "sys", "dll"]
    if not hasattr(pe, "DIRECTORY_ENTRY_IMPORT"):
        return ""
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        if isinstance(entry.dll, bytes):
            libname = entry.dll.decode().lower()
        else:
            libname = entry.dll.lower()
        parts = libname.rsplit(".", 1)

        if len(parts) > 1 and parts[1] in exts:
            libname = parts[0]

        entry_dll_lower = entry.dll.lower()

        for imp in entry.imports:
            funcname = None
            if not imp.name:
                funcname = ordlookup.ordLookup(
                    entry_dll_lower, imp.ordinal, make_name=True
                )
                if not funcname:
                    raise pefile.PEFormatError(
                        f"Unable to look up ordinal {entry.dll}:{imp.ordinal:04x}"
                    )
            else:
                funcname = imp.name
            if not valid_dll_name(libname):
                continue
            if not valid_function_name(funcname.decode()):
                continue
            if isinstance(funcname, bytes):
                funcname = funcname.decode()
            impstrs.append("%s.%s" % (libname.lower(), funcname.lower()))
    return md5(",".join(impstrs).encode()).hexdigest()
syyoo84 commented 10 months ago

".drv" in module names is ignored when calculating imphash (".drv" extension was not removed) Please also add the “.drv” extension.

syyoo84 commented 10 months ago

@erocarrera Please review the above and let us know of opinions.

erocarrera commented 9 months ago

Hello,

Firstly, I apologize for the delayed response.

Regarding your initial message, I've been unable to locate the file with the hash e024a5d1889bdf910297bee82e58c1a530da5683f1221414e0661becd67df5e9 on VT to verify the issue you've mentioned. Nonetheless, based on the Yara commit you've referenced, it appears that Yara's behavior has been modified to align with that of pefile, if I understand correctly.

As for your comment on the .drv extension, I recognize its significance. However, incorporating this extension would necessitate recalculating the imphash values for all files that have referenced a .drv file. This would be imperative for platforms like Virustotal to ensure correct data association. I generally advise against any changes that alter the imphash value, considering the vast number of professionals who rely on it and maintain databases with millions of files for which it has been determined.

syyoo84 commented 9 months ago

Hello,

Firstly, I apologize for the delayed response.

Regarding your initial message, I've been unable to locate the file with the hash e024a5d1889bdf910297bee82e58c1a530da5683f1221414e0661becd67df5e9 on VT to verify the issue you've mentioned. Nonetheless, based on the Yara commit you've referenced, it appears that Yara's behavior has been modified to align with that of pefile, if I understand correctly.

As for your comment on the .drv extension, I recognize its significance. However, incorporating this extension would necessitate recalculating the imphash values for all files that have referenced a .drv file. This would be imperative for platforms like Virustotal to ensure correct data association. I generally advise against any changes that alter the imphash value, considering the vast number of professionals who rely on it and maintain databases with millions of files for which it has been determined.

e024a5d1889bdf910297bee82e58c1a530da5683f1221414e0661becd67df5e9 For issues related to hash, you need to add the module name check function in iat in pefile. You can refer to the modified code above. I can't upload the sample from vti, so I'll share a picture of the IAT table capture. https://ibb.co/CbsrnQL

I agree with your comment because removing the ".drv" extension requires recalculating the imphash. Thank you so much for your comment.

syyoo84 commented 8 months ago

thank you for your support!