bajinsheng / SGFuzz


Some interesting fact #10

Closed Br1m4zz closed 9 months ago

Br1m4zz commented 9 months ago

I'm working on the effect of the corpus on a fuzzer's code and state coverage, and I ran experiments with SGFuzz on live555 and OpenSSL. Each protocol implementation is given two types of corpus:

  1. Origin corpus: the corpus from ProFuzzBench.
  2. Scattered corpus: the same corpus from ProFuzzBench, but split so that each seed contains a single message of one protocol message type (see the sketch below).
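
For context, a minimal sketch of how such splitting could be done for a text protocol like RTSP. This is an assumed reconstruction, not the script actually used in these experiments; it assumes messages are separated by a blank CRLF line, which holds for live555's RTSP seeds but not for binary protocols such as TLS:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: split recorded ProFuzzBench message sequences
into single-message seeds ("scattered corpus"). Assumes RTSP-style
text messages separated by an empty CRLF line."""
import sys
from pathlib import Path

def split_rtsp_seed(seed: bytes) -> list[bytes]:
    # Each RTSP request ends with an empty line; keep the terminator
    # so every fragment remains a well-formed message.
    parts = seed.split(b"\r\n\r\n")
    return [p + b"\r\n\r\n" for p in parts if p.strip()]

def main(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for seed_file in Path(src_dir).iterdir():
        for i, msg in enumerate(split_rtsp_seed(seed_file.read_bytes())):
            # One message per seed, named after its origin and position.
            (out / f"{seed_file.name}-{i}").write_bytes(msg)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```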

I measured state coverage as the number of leaf nodes in SGFuzz's STT over time, and code coverage via SGFuzz's TPC coverage.
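
As a point of reference, here is a minimal sketch of how such a coverage-over-time curve can be extracted from the fuzzer's log. It relies only on libFuzzer's standard status lines (`cov: N`), which SGFuzz inherits; the exact token SGFuzz uses to report the STT leaf count is not shown in this thread, so that part is deliberately left out rather than guessed:

```python
#!/usr/bin/env python3
"""Sketch: extract (executions, edge coverage) pairs from a
libFuzzer/SGFuzz log, one pair per status line."""
import re
import sys

COV_RE = re.compile(r"#(\d+).*?\bcov: (\d+)")

def cov_over_time(log_path: str) -> list[tuple[int, int]]:
    points = []
    with open(log_path, errors="replace") as log:
        for line in log:
            m = COV_RE.search(line)
            if m:
                points.append((int(m.group(1)), int(m.group(2))))
    return points

if __name__ == "__main__":
    for execs, cov in cov_over_time(sys.argv[1]):
        print(f"{execs}\t{cov}")
```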

[Screenshot 2023-12-03 161700: coverage over time, openssl]

[Screenshot 2023-12-03 160819: coverage over time, live555; the fuzzer stopped at the inflection point of the line due to memory exhaustion]

What seems strange is that the scattered corpus finds more states than the origin corpus, and I don't understand why.

Could you give me some explanation of this phenomenon? Is it OK to compute code coverage from the TPC coverage and state coverage from the number of STT leaves? (In your paper you say you compute state coverage from the paths in the STTs, and I think the number of paths in an STT should equal the number of leaves of the multi-forked tree.)
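
The equality assumed in the parenthesis holds for any tree: each root-to-leaf path ends in exactly one distinct leaf, so the two counts coincide. A toy illustration (not SGFuzz code):

```python
"""Toy check that leaf count equals root-to-leaf path count
in a multi-forked tree."""
tree = {"root": ["s1", "s2"], "s1": ["s3", "s4"], "s2": [], "s3": [], "s4": []}

def leaves(node: str) -> int:
    kids = tree[node]
    return 1 if not kids else sum(leaves(k) for k in kids)

def paths(node: str, prefix: tuple = ()) -> list[tuple]:
    kids = tree[node]
    if not kids:
        return [prefix + (node,)]
    return [p for k in kids for p in paths(k, prefix + (node,))]

assert leaves("root") == len(paths("root")) == 3  # leaves: s3, s4, s2
```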

bajinsheng commented 9 months ago

Without looking into this corpus, I do not have a direct answer for the figures. I would suggest examining which code is reached by the fuzzing campaign with the "scattered corpus" and analyzing why it can be covered by mutating a shorter corpus instead of a longer one.

> Is it OK to compute code coverage from the TPC coverage and state coverage from the number of STT leaves?

What is the TPC? SGFuzz should be able to output the number of STT leaves directly in the terminal, and that number should be equivalent to the number of paths.

Br1m4zz commented 9 months ago

> Without looking into this corpus, I do not have a direct answer for the figures. I would suggest examining which code is reached by the fuzzing campaign with the "scattered corpus" and analyzing why it can be covered by mutating a shorter corpus instead of a longer one.

The scattered corpus contains seeds that each hold only a single type of message. For live555, the scattered seeds are like SETUP, PLAY, DESCRIBE, etc., while each origin seed is scheduled as a sequence: DESCRIBE->SETUP->SETUP->PLAY->TEARDOWN.

[Screenshot 2023-12-04 204608: one seed in the scattered corpus]
[Screenshot 2023-12-04 204614: a seed in the origin corpus]
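
For readers without access to the screenshots, a hypothetical reconstruction of the two seed styles (the concrete URLs, headers, and values in the real ProFuzzBench corpus may differ):

```
# Scattered corpus: one seed = one message (illustrative)
SETUP rtsp://127.0.0.1:8554/stream/track1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=38784-38785

# Origin corpus: one seed = the whole recorded session (illustrative)
DESCRIBE rtsp://127.0.0.1:8554/stream RTSP/1.0
CSeq: 1
...followed in the same file by SETUP, SETUP, PLAY, and TEARDOWN...
```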

The reason the scattered corpus reaches higher TPC coverage on OpenSSL may be that I added a HEARTBEAT message to the corpus, which may trigger the block of the SUT that handles this message type. I would appreciate it if you could provide your code-coverage scripts for OpenSSL so I can verify this guess.

> What is the TPC? SGFuzz should be able to output the number of STT leaves directly in the terminal, and that number should be equivalent to the number of paths.

TPC is the `cov: XXXX` value SGFuzz prints in the terminal, which, after reviewing the code, is libFuzzer's coverage instrumentation.

Still, I don't understand why the scattered corpus produced more STT nodes than the origin corpus; I would have expected the latter to do better.

bajinsheng commented 9 months ago

I used ProFuzzBench to collect the code coverage.

I have no idea why a shorter corpus has higher code coverage; I did not evaluate this situation. Just curious: is it possible that decomposed message sequences cover more error-handling code, since they increase the diversity of the first message of each sequence?
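
One way to see why decomposition could increase first-message diversity is a back-of-the-envelope calculation: under uniform byte mutation, the probability of touching the first message is its share of the seed's total length. A toy sketch with made-up lengths:

```python
"""Back-of-the-envelope check of the diversity hypothesis.
Lengths below are illustrative, not measured from the corpus."""
first_msg_len = 120               # e.g. one DESCRIBE request
sequence_len = 5 * first_msg_len  # five concatenated messages

p_origin = first_msg_len / sequence_len  # long seed: most mutations land later
p_scattered = 1.0                        # single-message seed: every mutation

print(f"origin corpus: {p_origin:.0%} of mutations hit the first message")
print(f"scattered corpus: {p_scattered:.0%} of mutations hit the first message")
```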

Br1m4zz commented 9 months ago

I'm not that familiar with libFuzzer, but here is my guess: libFuzzer with the NetDriver runs in one process, so maybe SGFuzz behaves like AFL's persistent mode, where the context of some state variables may NOT be reset after a seed's execution, even after the connection closes? It is also true that the scattered corpus executes much faster than the origin corpus.
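
A toy model of this guess (in Python for brevity, not the actual C++ NetDriver harness): a global the target never resets makes the observed state depend on execution history rather than on the seed alone:

```python
"""Toy model of state leaking across in-process persistent-mode runs."""
SESSION_COUNTER = 0  # global state the hypothetical target never resets

def run_one_seed(seed: bytes) -> str:
    global SESSION_COUNTER
    SESSION_COUNTER += 1  # survives from one execution to the next
    # The observable "state" depends on history, not just this seed.
    return f"seed={seed!r} observed_state={SESSION_COUNTER}"

# Persistent loop: the same seed yields a different state each time.
for _ in range(3):
    print(run_one_seed(b"PLAY"))
```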

bajinsheng commented 9 months ago

If a state is not reset after closing a socket connection, I think it is probably a bug in the target system.

Br1m4zz commented 9 months ago

Okay, thank you again for your explanations.