PoiScript / orgize

A Rust library for parsing org-mode files.
https://poiscript.github.io/orgize/
MIT License

Add a simple unicode fuzz test as a standalone example. #23

Open calmofthestorm opened 4 years ago

calmofthestorm commented 4 years ago

This adds a multithreaded fuzz test that will include all Unicode scalar values, along with some basic structural formatting like headlines.
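As a rough illustration of the idea (not the code in this PR), a generator of this kind might look like the sketch below. The `XorShift64`, `random_scalar`, and `random_document` names are hypothetical, the tiny PRNG only stands in for a real random source, and the sketch is single-threaded. It draws code points from the full Unicode range, skips surrogates, and occasionally prefixes a line with headline stars.

```rust
/// Tiny xorshift PRNG so the sketch needs no external crates (hypothetical helper).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Draw a random Unicode scalar value; surrogates are rejected by `char::from_u32`.
fn random_scalar(rng: &mut XorShift64) -> char {
    loop {
        let cp = (rng.next_u64() % 0x11_0000) as u32;
        if let Some(c) = char::from_u32(cp) {
            return c;
        }
    }
}

/// Build `lines` lines of random text; roughly one in four starts as a headline.
fn random_document(rng: &mut XorShift64, lines: usize) -> String {
    let mut doc = String::new();
    for _ in 0..lines {
        if rng.next_u64() % 4 == 0 {
            let level = 1 + (rng.next_u64() % 3) as usize;
            doc.push_str(&"*".repeat(level));
            doc.push(' ');
        }
        let len = (rng.next_u64() % 40) as usize;
        for _ in 0..len {
            doc.push(random_scalar(rng));
        }
        doc.push('\n');
    }
    doc
}

fn main() {
    // The seed must be non-zero for xorshift.
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let doc = random_document(&mut rng, 8);
    // A real fuzz loop would hand `doc` to the parser here and check for panics.
    println!("generated {} bytes", doc.len());
}
```

Working with `char` rather than raw bytes means every generated document is valid UTF-8, so no input is wasted on decoding failures.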

I ran it for an hour or so on 0.8.4 and found no problems.

The generative model could be improved a great deal to generate more bits and pieces of valid Org structure (another approach would be to take valid org files as input and introduce mutations), but this gives me more confidence.
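The mutation approach mentioned above could be sketched roughly as follows. This is an assumption about how it might work, not something implemented in this PR: the `mutate` function and its `seed` parameter are hypothetical, and a real harness would draw the seed from a random source. Each call deletes, duplicates, or replaces one character of an otherwise valid Org document.

```rust
/// Apply one small mutation to `input`, chosen deterministically from `seed`.
/// Operating on `char`s keeps the result valid UTF-8.
fn mutate(input: &str, seed: u64) -> String {
    let mut chars: Vec<char> = input.chars().collect();
    if chars.is_empty() {
        return String::new();
    }
    let idx = (seed / 3) as usize % chars.len();
    match seed % 3 {
        // Delete the character at `idx`.
        0 => {
            chars.remove(idx);
        }
        // Duplicate the character at `idx`.
        1 => {
            let c = chars[idx];
            chars.insert(idx, c);
        }
        // Replace it with a character that is structurally meaningful in Org.
        _ => {
            chars[idx] = '*';
        }
    }
    chars.into_iter().collect()
}

fn main() {
    let valid = "* TODO Title\nSome *bold* text.\n";
    for seed in 0..6 {
        println!("{:?}", mutate(valid, seed));
    }
}
```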

PoiScript commented 4 years ago

To be honest, I'm still wondering if it's really necessary....

Could you please describe the difference between cargo fuzz and your own implementation?

calmofthestorm commented 4 years ago

We can close this if you like. Running it has given me more confidence, so it has served its purpose to an extent.

Whenever I run the fuzzer, it dies immediately on a bug in jetscii. I tried to troubleshoot this and was able to reproduce it in a simple example that does not involve orgize, but only when I specify a jetscii version in Cargo.toml. When I built from git instead, I could not reproduce the ASAN violation, even after removing Cargo.lock and target/. I assume something is going on with the Rust toolchain that I don't understand. Here's the example, if you are interested:

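use jetscii::{bytes, BytesConst};

fn main() {
    // Build the byte searcher once, lazily, as a static.
    lazy_static::lazy_static! {
        static ref PRE_BYTES: BytesConst =
            bytes!(b'@');
    }

    // Search a one-byte haystack that does not contain the needle.
    PRE_BYTES.find(b" ");
}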

Run with RUSTFLAGS="-Z sanitizer=address" cargo +nightly run --target x86_64-unknown-linux-gnu of course.

I'm also not sure how much time cargo fuzz will spend generating invalid Unicode on its way to longer strings. I know it tries to explore branches, but that seems likely to bias it toward short strings and/or produce a lot of invalid inputs that are quickly skipped.
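To make the concern concrete: a cargo fuzz target receives raw bytes (`&[u8]`), while the parser takes a `&str`, so the target has to decide what to do with byte strings that are not valid UTF-8. The sketch below uses only the standard library; the `strict` and `lossy` helper names are hypothetical. It shows the two usual options, not how often each case occurs in practice.

```rust
use std::borrow::Cow;

/// Strict: inputs that are not valid UTF-8 are discarded before parsing.
fn strict(data: &[u8]) -> Option<&str> {
    std::str::from_utf8(data).ok()
}

/// Lossy: invalid sequences become U+FFFD, so no input is skipped,
/// but the parser never sees the original bytes.
fn lossy(data: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(data)
}

fn main() {
    let inputs: [&[u8]; 3] = [b"* headline", b"\xff\xfe", "* \u{1F600}".as_bytes()];
    for data in inputs {
        println!("strict={:?} lossy={:?}", strict(data), lossy(data));
    }
}
```

With the strict variant, every rejected input is a fuzzing iteration that never reaches the parser, which is the skipping behaviour described above.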

I was working on the ability to take personal org files as input and break them up into fragments. That code lives in a different fuzz test and still needs to be ported over, which may or may not happen.
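One way the fragment splitting could work, purely as an assumption about the approach (the unported code is not shown here), is to start a new fragment at every headline line. The `fragments` function below is hypothetical and uses only the standard library.

```rust
/// Split Org text into fragments, starting a new one at every headline,
/// i.e. a line that begins with one or more `*` followed by a space.
fn fragments(text: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for line in text.split_inclusive('\n') {
        // `*` is a single byte, so the char count is also a valid byte offset.
        let stars = line.chars().take_while(|&c| c == '*').count();
        let is_headline = stars > 0 && line[stars..].starts_with(' ');
        if is_headline || out.is_empty() {
            out.push(String::new());
        }
        out.last_mut().unwrap().push_str(line);
    }
    out
}

fn main() {
    let text = "preamble\n* One\nbody\n** Two\n*bold* line\n";
    for fragment in fragments(text) {
        println!("{:?}", fragment);
    }
}
```

The fragments could then be shuffled or recombined to build new test documents out of realistic pieces.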

I like that mine checks writing Org and HTML as well, but I could easily add a cargo fuzz target for that.

Overall it's not clear to me what concretely this adds over cargo fuzz. I have found it valuable to have written, but it's hard to make a case that it should be committed.