Open decalage2 opened 4 years ago
Maybe these commits will fit better:
https://github.com/HeinleinSupport/oletools/commit/8a636acfce76dec0ef65f5145800d796fab949e3
https://github.com/HeinleinSupport/oletools/commit/1865fbda18bdfde94e3ec3c4884305f79bb1f31a
https://github.com/HeinleinSupport/oletools/commit/7436ce7ff0203baf8ffcf2a7a39334f49307c75a
But your assumption is correct for now: although I see many Emotet samples, only in a few rare files was chardet actually evaluated, and in those cases the encoding was cp1252 ;)
In VBA_Project.extract_macros(), if for any reason (e.g. malformed data) the VBA project stream cannot be parsed to obtain information about the VBA modules, every stream is checked to determine whether it contains a VBA module. In that case, the encoding of the VBA source code is unknown. For now, olevba assumes cp1252 because it is the most frequently used encoding, but this can lead to decoding errors when the source was written with a different code page. A solution could be to use the third-party package "chardet" to guess the encoding. See potential implementations:
In any case, I think chardet should not become yet another mandatory dependency, so it's better to make it optional and to fall back to cp1252 if chardet is not installed.
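To make the idea concrete, here is a minimal sketch of the optional-dependency approach. It is not the code from olevba or from the commits linked in this thread: the helper names `guess_encoding` and `decode_vba_source`, the `min_confidence` threshold, and the use of `errors="replace"` are all assumptions chosen for illustration. The only chardet call used is `chardet.detect()`, which returns a dict with `encoding` and `confidence` keys.

```python
import codecs

try:
    import chardet  # optional third-party package
except ImportError:
    chardet = None  # not installed: always fall back to the default

DEFAULT_ENCODING = "cp1252"  # the encoding olevba currently assumes


def guess_encoding(data, default=DEFAULT_ENCODING, min_confidence=0.5):
    """Guess the encoding of raw VBA source bytes (hypothetical helper).

    Returns `default` when chardet is missing, the data is empty,
    the guess is weak, or Python has no codec for the guessed name.
    """
    if chardet is None or not data:
        return default
    result = chardet.detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # min_confidence is an arbitrary illustrative threshold
    if not encoding or confidence < min_confidence:
        return default
    try:
        codecs.lookup(encoding)  # make sure Python can decode with it
    except LookupError:
        return default
    return encoding


def decode_vba_source(data):
    """Decode VBA source bytes without raising on undecodable bytes."""
    return data.decode(guess_encoding(data), errors="replace")
```

With this shape, chardet stays optional: a plain ASCII module such as `decode_vba_source(b"Sub Test()\r\nEnd Sub")` decodes the same way whether or not the package is installed, and only non-ASCII sources can be affected by the guess.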