oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
Describe the bug
While running this piece of code against an .xlsm file (4.4 MB in size):
from oletools import oleobj, ooxml

xml_parser = ooxml.XmlParser(filepath)  # filepath: path to the .xlsm file
for relationship, target in oleobj.find_external_relationships(xml_parser):
    ...  # <do stuff>
I noticed that execution was stuck in the find_external_relationships call while RAM usage kept increasing continuously. I had to kill the Python process after memory usage had grown by about 15 GB, because the machine was starting to swap.
After some inspection I found that the time was being spent parsing the subfile xl/pivotCache/pivotCacheRecords33.xml in the iter_xml call of XmlParser. That subfile is indeed very large once unzipped:
$ ll sample.xlsm
-rw-rw-r-- 1 user user 4,4M nov 24 16:51 sample.xlsm
$ du -sh unzippedsample
300M unzippedsample
$ ll unzippedsample/xl/pivotCache/pivotCacheRecords33.xml
-rw-rw-r-- 1 user user 167M gen 1 1980 unzippedsample/xl/pivotCache/pivotCacheRecords33.xml
My guess is that the elements produced by this parsing are being kept in memory somewhere, which would explain the high RAM usage. It is only a guess, though; unfortunately I couldn't spend more time debugging the issue. I'll update the thread if I discover anything more.
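To illustrate the guess above: with an incremental parser such as the standard library's xml.etree.ElementTree.iterparse, every parsed element stays attached to the root unless it is explicitly released, so memory grows with the size of the file. The following is only a minimal sketch of the usual mitigation, not oletools' actual code; the function name count_records, the tag "r", and the sample data are hypothetical, and a real pivotCacheRecords file uses namespaced tags.

```python
import io
import xml.etree.ElementTree as ET


def count_records(xml_stream, tag="r"):
    """Count elements named `tag` while keeping memory usage roughly constant.

    Hypothetical helper for illustration; not part of oletools.
    """
    count = 0
    context = ET.iterparse(xml_stream, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the children collected so far. Without this call the
            # whole document ends up held in memory as one big tree.
            root.clear()
    return count


# Small synthetic stand-in for a large pivot cache records file.
sample = b"<pivotCacheRecords>" + b"<r><n v='1'/></r>" * 1000 + b"</pivotCacheRecords>"
total = count_records(io.BytesIO(sample))
```

If iter_xml (or its caller) keeps references to the yielded elements, or never clears them, a 167 MB XML file could plausibly expand to many gigabytes of Python objects, which would match the behaviour described here.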
File/Malware sample to reproduce the bug / How To Reproduce the bug
The sample that caused the issue comes from a customer, so I can't share it. I think it could be reproduced by building a large Excel file with a lot of data in one of its subfiles.
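As a starting point for a shareable reproducer, here is a hedged sketch that writes a zip archive containing one large, highly compressible XML member, mimicking the small-on-disk, huge-when-unzipped shape of the customer file. The member name, record contents, and the helper build_sample are made up for illustration. The result is not a complete OOXML package (it has no [Content_Types].xml or relationship parts), so those would still need to be added before oletools accepts it; that part is untested.

```python
import os
import tempfile
import zipfile

# Hypothetical member name, modelled on the path seen in the real sample.
MEMBER = "xl/pivotCache/pivotCacheRecords1.xml"


def build_sample(path, n_records=1000):
    """Write a zip whose single XML member has `n_records` repeated records."""
    record = b'<r><n v="1"/><s v="some repeated text"/></r>'
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Stream the member so the full XML never has to sit in memory.
        with zf.open(MEMBER, "w") as member:
            member.write(b"<pivotCacheRecords>")
            for _ in range(n_records):
                member.write(record)
            member.write(b"</pivotCacheRecords>")
    return path


sample_path = build_sample(os.path.join(tempfile.mkdtemp(), "sample.zip"))
with zipfile.ZipFile(sample_path) as zf:
    info = zf.getinfo(MEMBER)
    names = zf.namelist()
```

Raising n_records to a few million should give a member in the same size range as the 167 MB file above, while the archive itself stays small because the content is so repetitive.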
Affected tool: ooxml, oletools
Version information: