decalage2 / olefile

olefile is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
http://www.decalage.info/olefile

Continuous fuzzing by way of OSS-Fuzz #149

Open DavidKorczynski opened 2 years ago

DavidKorczynski commented 2 years ago

Hi,

I was wondering whether you would like to integrate continuous fuzzing by way of OSS-Fuzz. In this PR, https://github.com/google/oss-fuzz/pull/8105, I do exactly that: it adds the necessary logic on the OSS-Fuzz side.

Essentially, OSS-Fuzz is a free service run by Google that performs continuous fuzzing of important open source projects. The only expectation when integrating into OSS-Fuzz is that bugs will be fixed. This is not a "hard" requirement, in that no one enforces it; the point is that if bugs are not fixed, running the fuzzers is a waste of resources, which we would like to avoid.

If you would like to integrate, the only thing I need is a list of email address(es) that will get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and other stats. Note that the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.

In case you're unfamiliar with fuzzing, it's a technique used to automate test case generation. It's been used frequently over the last decade to analyse projects in memory-unsafe languages to catch memory corruption issues, but is now moving into supporting memory-safe languages (hence this PR). In the Python world, the bugs expected to be found at the moment are uncaught exceptions. I'm happy to answer any questions you may have!

decalage2 commented 2 years ago

Hi David, this looks like a good idea. Where can I see the zip file containing the corpus of files used for fuzzing? Also, I see that the script to fuzz olefile only opens each data file but does not take any further action. Maybe it would be better to call more olefile methods, for example to get the list of streams and open/read each of them? And also to read all the OLE properties (as this part of the code is less tested).

DavidKorczynski commented 2 years ago

Where can I see the zip file containing the corpus of files used for fuzzing?

You will get this on https://oss-fuzz.com once the integration has happened.

Also, I see that the script to fuzz olefile only opens each data file but does not take any further action. Maybe it would be better to call more olefile methods, for example to get the list of streams and open/read each of them? And also to read all the OLE properties (as this part of the code is less tested).

Definitely. This first fuzzer already has some findings, so we could start with that and you'll get to experience OSS-Fuzz. What we can also do is move the fuzzers upstream into the olefile repository, and then you can add to or modify the fuzzers however you like.