Open plusvic opened 4 months ago
def parse_rich_header(self):
"""Parses the rich header
see http://www.ntcore.com/files/richsign.htm for more information
Structure:
00 DanS ^ checksum, checksum, checksum, checksum
10 Symbol RVA ^ checksum, Symbol size ^ checksum...
...
XX Rich, checksum, 0, 0,...
"""
All bytes between "DanS" (inclusive) and "Rich" (exclusive) are XORed with the checksum, but nothing else is, in particular the padding after the checksum is not.
The checksum should probably be obtained as the four bytes after "Rich"; then the following better check could be done:
if data[0] ^ checksum != DANS or data[1] != checksum or data[2] != checksum or data[3] != checksum:
return None
The Rich header is not documented, but Microsoft being the originator does use the above format, so it is useful. I have done a PR #402 to make it a warning.
Also, is the file you reference WIMA_SFX.EXE?
@j-t-1 yes, the file 043066108b68b30fc2c475eae8edfafc080be7d451600eaa283d2c750bddbceb
is WIMA_SFX.EXE.
Thanks. I think #402 will fix this, as it takes the key directly from the key portion of the rich header, thus not relying on the padding being zeroes. Does #402 make sense in general and also meet your specific needs?
I think #402 is ok.
pefile
assumes that the 12 bytes following theDanS
tag are actually three copies of the 32-bits XOR key used for encrypting the rich header, and it does some validation based in this assumption:https://github.com/erocarrera/pefile/blob/d86b96e4b33c41ebd1db7ca5b4517e94d567a260/pefile.py#L3429-L3430
The origin of this assumption seems to be this article: http://www.ntcore.com/files/richsign.htm, and YARA had this same issue. However, while rewriting the parsing logic in YARA-X I came across this other article that suggests that those 12-bytes are actually padding: https://bytepointer.com/articles/the_microsoft_rich_header.htm.
In YARA-X I treated those bytes as padding, and now, while comparing YARA and YARA-X results side-by-side, I found that file
043066108b68b30fc2c475eae8edfafc080be7d451600eaa283d2c750bddbceb
proves that the padding theory is correct. The padding in that file was not initially filled with zeroes, therefore the 12 bytes after theDanS
tag don't contain copies of the XOR key.I had to fix this same issue in YARA: https://github.com/VirusTotal/yara/commit/4793b49dd79387aa4f2de98bef6cae8d43128d1c