decalage2 / olefile

olefile is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
http://www.decalage.info/olefile

Lazy parsing #71

Open decalage2 opened 7 years ago

decalage2 commented 7 years ago

Currently, olefile reads and parses most of the OLE structures (header, FAT, directory, miniFAT) as soon as an OleFileIO object is created. Object creation fails with an exception whenever any of those structures is malformed.

When handling malformed files such as malicious documents, it would be better to only read the header at object creation, and then read/parse the other structures only when required. It would then be possible to access header information even if the other parts are incorrect.
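
To make the idea concrete, here is a minimal sketch of lazy parsing. It is not olefile's code or API: the class `LazyOleSketch` and its `get_fat` method are hypothetical names invented for illustration, only the first 109 DIFAT entries stored in the header are followed, and the header offsets are taken from the Compound File Binary format as I understand it. Only the header is read at creation; the FAT is loaded and cached on first request.

```python
import io
import struct

OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


class LazyOleSketch:
    """Hypothetical illustration of lazy parsing (not part of olefile)."""

    def __init__(self, data: bytes):
        self._fp = io.BytesIO(data)
        header = self._fp.read(512)
        if len(header) < 512 or header[:8] != OLE_MAGIC:
            raise ValueError("not an OLE2 file")
        # Only header fields are decoded here; nothing else is touched.
        (self.sector_shift,) = struct.unpack_from("<H", header, 30)
        (self.num_fat_sectors,) = struct.unpack_from("<I", header, 44)
        (self.first_dir_sector,) = struct.unpack_from("<I", header, 48)
        # First 109 DIFAT entries (sector numbers of the FAT sectors).
        self._difat = struct.unpack_from("<109I", header, 76)
        self._fat = None  # loaded on demand by get_fat()

    def get_fat(self):
        """Return the FAT, parsing it on first call and caching the result."""
        if self._fat is None:
            self._fat = self._load_fat()
        return self._fat

    def _load_fat(self):
        if self.sector_shift not in (9, 12):
            raise ValueError(f"unexpected sector shift {self.sector_shift}")
        sector_size = 1 << self.sector_shift
        fat = []
        # Simplification: ignores DIFAT sectors beyond the header.
        for sect in self._difat[: self.num_fat_sectors]:
            # Sector n starts at (n + 1) * sector_size, after the header.
            self._fp.seek((sect + 1) * sector_size)
            raw = self._fp.read(sector_size)
            if len(raw) < sector_size:
                raise ValueError(f"FAT sector {sect} is truncated or out of range")
            fat.extend(struct.unpack(f"<{sector_size // 4}I", raw))
        return fat
```

With this shape, a file whose header is valid but whose FAT is truncated can still be opened and its header fields inspected; the error only surfaces when `get_fat()` is called.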

Drawback: some applications may rely on the old behaviour. It also means that all data access must go through methods, with no more direct access to attributes.