AFLplusplus / LibAFL

Advanced Fuzzing Library - Slot your Fuzzer together in Rust! Scales across cores and machines. For Windows, Android, MacOS, Linux, no_std, ...

Optimization of memory usage (llmp maps, messaging) for a high number of fuzzers (clients) #317

Closed marcinguy closed 2 years ago

marcinguy commented 3 years ago

Is your feature request related to a problem? Please describe.

On a server with several dozen cores (80 GB RAM, 64 cores, 256 threads), I have noticed that running more than about 50 fuzzers causes either an OOM (the broker gets killed along with some fuzzer clients) or a broker crash (too many messages to process?).

The broker crashes here when I reduce the map size to small maps, seemingly because of too many messages:

https://github.com/AFLplusplus/LibAFL/blob/939784d5121abc57650ce8eb094c399dc551912e/libafl/src/bolts/llmp.rs#L2263

The broker gets killed after a while, along with other clients (OOM), when running more than 50 fuzzers.

With 50 fuzzers there seem to be about a dozen GB of free RAM:

              total        used        free      shared  buff/cache   available
Mem:             78          36          13          12          28          28
Swap:             1           0           1

This seems to run stably.

With 100, 150, 200, or 255 fuzzers it OOMs or the broker crashes sooner or later (minutes to hours).
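
A rough back-of-envelope estimate of the shared-map footprint may help explain why higher client counts run out of memory. The sketch below assumes each client holds a fixed number of llmp maps of the configured size. `maps_per_client` is an illustrative parameter, not a LibAFL setting, and real usage may differ. Under that assumption, 50 clients at 256 MiB come to 12.5 GiB, which is in the same range as the `shared` value of 12 in the `free` output above.

```python
def estimate_shared_gib(clients: int, map_mib: int, maps_per_client: int = 1) -> float:
    """Estimate total shared-map memory in GiB.

    Assumption (not taken from LibAFL): every client holds
    `maps_per_client` maps of exactly `map_mib` MiB each.
    """
    return clients * maps_per_client * map_mib / 1024


if __name__ == "__main__":
    # Client counts and map sizes mentioned in this thread.
    for map_mib in (256, 64):
        for clients in (50, 100, 150, 200, 255):
            gib = estimate_shared_gib(clients, map_mib)
            print(f"{clients:>3} clients x {map_mib:>3} MiB maps ~ {gib:6.2f} GiB")
```

By the same estimate, 255 clients with 256 MiB maps would need roughly 64 GiB for maps alone, before counting each fuzzer's own heap.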

Describe the solution you'd like

Make the clients send fewer messages, so that the broker does not crash when using maps smaller than the default 256 MB (i.e. small maps).

Somehow make the fuzzer clients use less memory (I guess this can already be achieved via small maps, by configuring the map size).

Describe alternatives you've considered

I tried small maps, which increased the number of messages.

Additional context

I can try to optimize the map size/message count by adjusting the map size from 256 to 128, 64, etc. Maybe this would allow doubling, quadrupling, etc. the number of fuzzer clients.

Also, I tried setting this to 1; it didn't help:

https://github.com/AFLplusplus/LibAFL/blob/5a722994acb69d0f03bce101b29ea14681d1d8b3/libafl/src/bolts/llmp.rs#L102

I also set this to 1 ms; it didn't help:

https://github.com/AFLplusplus/LibAFL/blob/d8ef1dd90abcbadd3d63c17be9ae669eead9241f/libafl/src/events/llmp.rs#L149

marcinguy commented 3 years ago

The same harness works very well with 8 fuzzer clients on an 8-core server with 32 GB RAM (with both default llmp maps and small maps).

Memory usage

              total        used        free      shared  buff/cache   available
Mem:          31799       10787        2520        1462       18491       19084
Swap:          1023         838         185

s1341 commented 3 years ago

I too have encountered memory consumption issues when running a large number of fuzz nodes. I'd be grateful for (and happy to review and test) any PRs.

marcinguy commented 3 years ago

FYI: after 24 h, 50 fuzzers were still running on the 80 GB server. After the next 24 hours, only 25 were left.

Will monitor it. But I think/hope it will remain like this.

Crashed fuzzers got this message:

thread '<unnamed>' panicked at 'Fuzzer-respawner: Storing state in crashed fuzzer instance did not work, no point to spawn the next client! (Child exited with: 9)', /home/user/LibAFL-latest/libafl/src/events/llmp.rs:864:21

domenukk commented 3 years ago

Yes, this is likely OOM-related. This happens, for example, if the child gets killed by the OS with a signal that cannot be caught.
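
To check that reading of the panic message, the `9` can be decoded as a POSIX wait status. This is a minimal sketch, assuming the number in "Child exited with: 9" is a raw wait status rather than an exit code. If so, it decodes to termination by SIGKILL, the signal the Linux OOM killer sends. `describe_wait_status` is a hypothetical helper name.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Decode a raw POSIX wait status (as returned by waitpid) into text."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        try:
            name = signal.Signals(sig).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            name = f"signal {sig}"
        return f"killed by {name}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"


if __name__ == "__main__":
    # Assumption: 9 is the raw status from the panic message above.
    print(describe_wait_status(9))
```

SIGKILL cannot be caught or handled, which would explain why the crashed instance had no chance to store its state.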

marcinguy commented 3 years ago

Yes, this is all OOM related.

Some notes.

You can use cgroups to limit memory usage

recidivm could be useful to estimate program virtual memory usage

32-bit binaries hit bugs when used with the fuzzer (with a big map you can attach only about a dozen before it errors out; reducing the map size helps to attach more)

32-bit binaries are much bigger than 64-bit ones

On both VMs and bare metal, 32-bit binaries are way slower (why?)

ps_mem is a nice tool to examine memory usage

After a 12-hour run:

398.8 MiB + 125.5 KiB = 398.9 MiB   fuzzer_libxml2_noasan_broker [217779]
404.9 MiB + 295.5 KiB = 405.2 MiB   fuzzer_libxml2_noasan_broker [215550]
401.4 MiB +   4.0 MiB = 405.4 MiB   fuzzer_libxml2_noasan_broker [205428]
405.1 MiB + 453.5 KiB = 405.5 MiB   fuzzer_libxml2_noasan_broker [216800]
407.5 MiB + 307.5 KiB = 407.8 MiB   fuzzer_libxml2_noasan_broker [216175]
404.4 MiB +   3.9 MiB = 408.3 MiB   fuzzer_libxml2_noasan_broker [204423]
407.3 MiB +   1.5 MiB = 408.9 MiB   fuzzer_libxml2_noasan_broker [210015]
404.6 MiB +   4.3 MiB = 408.9 MiB   fuzzer_libxml2_noasan_broker [203458]
408.8 MiB + 274.5 KiB = 409.1 MiB   fuzzer_libxml2_noasan_broker [217292]
408.7 MiB + 738.5 KiB = 409.5 MiB   fuzzer_libxml2_noasan_broker [213211]
408.9 MiB +   2.4 MiB = 411.3 MiB   fuzzer_libxml2_noasan_broker [208499]
409.8 MiB +   1.5 MiB = 411.4 MiB   fuzzer_libxml2_noasan_broker [209846]
411.4 MiB + 393.5 KiB = 411.7 MiB   fuzzer_libxml2_noasan_broker [215359]
409.6 MiB +   3.0 MiB = 412.5 MiB   fuzzer_libxml2_noasan_broker [211446]
412.1 MiB + 866.5 KiB = 412.9 MiB   fuzzer_libxml2_noasan_broker [212659]
414.2 MiB + 213.5 KiB = 414.5 MiB   fuzzer_libxml2_noasan_broker [217487]
415.5 MiB + 146.5 KiB = 415.7 MiB   fuzzer_libxml2_noasan_broker [216870]
421.9 MiB + 128.5 KiB = 422.1 MiB   fuzzer_libxml2_noasan_broker [217636]
422.0 MiB + 557.5 KiB = 422.6 MiB   fuzzer_libxml2_noasan_broker [213713]
421.4 MiB +   2.3 MiB = 423.7 MiB   fuzzer_libxml2_noasan_broker [207253]
415.5 MiB +   8.5 MiB = 424.0 MiB   fuzzer_libxml2_noasan_broker [193133]
420.8 MiB +   5.1 MiB = 425.9 MiB   fuzzer_libxml2_noasan_broker [199426]
424.9 MiB +   2.1 MiB = 427.0 MiB   fuzzer_libxml2_noasan_broker [209765]
416.6 MiB +  10.6 MiB = 427.2 MiB   fuzzer_libxml2_noasan_broker [191686]
423.6 MiB +  11.3 MiB = 434.9 MiB   fuzzer_libxml2_noasan_broker [187469]
429.2 MiB +   9.1 MiB = 438.3 MiB   fuzzer_libxml2_noasan_broker [191656]
427.9 MiB +  12.4 MiB = 440.3 MiB   fuzzer_libxml2_noasan_broker [187356]
428.7 MiB +  13.1 MiB = 441.8 MiB   fuzzer_libxml2_noasan_broker [186939]
444.2 MiB + 334.5 KiB = 444.6 MiB   fuzzer_libxml2_noasan_broker [214894]
445.3 MiB + 809.5 KiB = 446.1 MiB   fuzzer_libxml2_noasan_broker [213000]
453.0 MiB +   7.5 MiB = 460.5 MiB   fuzzer_libxml2_noasan_broker [195286]
460.8 MiB + 249.5 KiB = 461.0 MiB   fuzzer_libxml2_noasan_broker [217528]
462.5 MiB +   3.4 MiB = 466.0 MiB   fuzzer_libxml2_noasan_broker [203741]
463.3 MiB +   4.0 MiB = 467.4 MiB   fuzzer_libxml2_noasan_broker [203214]
428.5 MiB +  39.6 MiB = 468.1 MiB   fuzzer_libxml2_noasan_broker [172821]
469.3 MiB +   8.2 MiB = 477.5 MiB   fuzzer_libxml2_noasan_broker [192937]
491.8 MiB +   3.7 MiB = 495.4 MiB   fuzzer_libxml2_noasan_broker [202383]
501.2 MiB + 192.5 KiB = 501.4 MiB   fuzzer_libxml2_noasan_broker [217653]
436.1 MiB +  80.0 MiB = 516.1 MiB   fuzzer_libxml2_noasan_broker [158913]
549.0 MiB + 176.5 KiB = 549.1 MiB   fuzzer_libxml2_noasan_broker [217244]
592.1 MiB +   2.1 MiB = 594.2 MiB   fuzzer_libxml2_noasan_broker [204922]
592.8 MiB +   2.8 MiB = 595.6 MiB   fuzzer_libxml2_noasan_broker [201315]
599.1 MiB +   7.7 MiB = 606.8 MiB   fuzzer_libxml2_noasan_broker [187959]
608.0 MiB + 131.5 KiB = 608.1 MiB   fuzzer_libxml2_noasan_broker [216712]
613.5 MiB +   2.0 MiB = 615.4 MiB   fuzzer_libxml2_noasan_broker [203154]
622.5 MiB +   7.5 MiB = 630.0 MiB   fuzzer_libxml2_noasan_broker [186527]
673.3 MiB +   1.2 MiB = 674.5 MiB   fuzzer_libxml2_noasan_broker [204792]
674.0 MiB +   1.2 MiB = 675.1 MiB   fuzzer_libxml2_noasan_broker [205772]
676.1 MiB +   1.9 MiB = 678.0 MiB   fuzzer_libxml2_noasan_broker [199879]
  1.1 GiB +  24.5 MiB =   1.1 GiB   fuzzer_libxml2_asan_nobroker_big [158235]
594.1 MiB + 649.5 MiB =   1.2 GiB   fuzzer_libxml2_noasan_broker [153242]
 13.0 GiB + 817.5 MiB =  13.8 GiB   fuzzer_libxml2_noasan_broker [153055]
---------------------------------
                         38.9 GiB
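
With this many processes, the outliers are easier to spot programmatically. The following sketch parses `ps_mem` lines in the `private + shared = total  name [pid]` layout shown above and flags processes whose total is far above the median. The 3x threshold and the helper names are arbitrary illustrative choices, not part of `ps_mem`.

```python
import re
from statistics import median
from typing import List, NamedTuple


class Entry(NamedTuple):
    pid: int
    name: str
    private_mib: float
    shared_mib: float
    total_mib: float


_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_LINE = re.compile(
    r"^\s*([\d.]+)\s+(KiB|MiB|GiB)\s+\+\s+([\d.]+)\s+(KiB|MiB|GiB)"
    r"\s+=\s+([\d.]+)\s+(KiB|MiB|GiB)\s+(\S+)\s+\[(\d+)\]\s*$"
)


def parse_ps_mem(text: str) -> List[Entry]:
    """Parse per-process lines; separator and grand-total lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue
        priv, pu, shr, su, tot, tu, name, pid = m.groups()
        entries.append(Entry(
            pid=int(pid),
            name=name,
            private_mib=float(priv) * _UNIT_TO_MIB[pu],
            shared_mib=float(shr) * _UNIT_TO_MIB[su],
            total_mib=float(tot) * _UNIT_TO_MIB[tu],
        ))
    return entries


def find_outliers(entries: List[Entry], factor: float = 3.0) -> List[Entry]:
    """Return entries whose total exceeds `factor` times the median total."""
    if not entries:
        return []
    typical = median(e.total_mib for e in entries)
    return [e for e in entries if e.total_mib > factor * typical]


if __name__ == "__main__":
    sample = """\
398.8 MiB + 125.5 KiB = 398.9 MiB   fuzzer_libxml2_noasan_broker [217779]
401.4 MiB +   4.0 MiB = 405.4 MiB   fuzzer_libxml2_noasan_broker [205428]
 13.0 GiB + 817.5 MiB =  13.8 GiB   fuzzer_libxml2_noasan_broker [153055]
"""
    for e in find_outliers(parse_ps_mem(sample)):
        print(f"pid {e.pid}: {e.total_mib / 1024:.1f} GiB total")
```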

Right after start, each non-ASan fuzzer used ca. 1 GB. Now (after 12 hrs) you can see that one is at 13 GB and the others are below 1 GB. Any explanation for this, @domenukk @andreafioraldi?

The same harness after more than 48 hrs on an 8-core box with 32 GB RAM:

616.4 MiB + 619.4 MiB =   1.2 GiB   fuzzer_libxml2_asan [1141122]
  1.4 GiB +   1.4 GiB =   2.8 GiB   fuzzer_libxml2_asan [1697422]
  1.4 GiB +   1.4 GiB =   2.8 GiB   fuzzer_libxml2_asan [1601844]
  1.6 GiB +   1.6 GiB =   3.2 GiB   fuzzer_libxml2_asan [1694428]
  1.7 GiB +   1.7 GiB =   3.4 GiB   fuzzer_libxml2_asan [1690697]
  1.7 GiB +   1.7 GiB =   3.4 GiB   fuzzer_libxml2_asan [1690863]
  1.7 GiB +   1.7 GiB =   3.5 GiB   fuzzer_libxml2_asan [1686704]
  1.8 GiB +   1.8 GiB =   3.6 GiB   fuzzer_libxml2_asan [1691961]
  1.9 GiB +   1.9 GiB =   3.8 GiB   fuzzer_libxml2_asan [1683781]

BTW, ASan can slow down the binary by ca. 70% and increase memory usage by up to 2x to 3x. In my case the speed is similar (weird?) and memory usage increased.

Maybe the above helps somebody.

s1341 commented 3 years ago

I'm also seeing increasing memory usage on both asan and noasan over time. I think it has something to do with corpus size but I've not yet managed to put my finger on the issue.

marcinguy commented 3 years ago

One more observation that could be relevant.

On the 80 GB box:

Running 50 fuzzers with a 64 MB map runs smoothly (all cores green, no syscalls or kernel threads, almost from the start).

Running 50 fuzzers with a 256 MB map shows 70% red and 30% green in the fuzzers, right after start and later on, with a lot of kernel threads.

This is a big box, so maybe it is related to the CPU/cores.

andreafioraldi commented 3 years ago

@marcinguy can you share your libxml2 fuzzer? Attach a zip here or drop a mail to andreafioraldi@gmail.com. I'm going to debug this issue.

marcinguy commented 3 years ago

Whatever this is, it is increasing. @s1341

After 64 hrs:

The shared memory total is at ca. 38 GB:

ipcs -m|awk '{ print $5}'|awk '{a+=$0}END{print a}'
38654705664
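
The awk pipeline sums the fifth column (`bytes`) of `ipcs -m`. A minimal Python equivalent is sketched below, assuming the usual Linux column layout `key shmid owner perms bytes nattch status`; the sample text is made up for illustration. Converting the total above, 38654705664 bytes is exactly 36 GiB (about 38.7 GB in decimal units).

```python
def sum_shm_bytes(ipcs_output: str) -> int:
    """Sum the 5th whitespace-separated field of every line where it is numeric.

    Header and separator lines are skipped because their 5th field
    is not a plain integer.
    """
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[4].isdigit():
            total += int(fields[4])
    return total


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Illustrative sample, not real output from the server in this thread.
    sample = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      user       600        268435456  2          dest
0x00000000 32769      user       600        67108864   1
"""
    total = sum_shm_bytes(sample)
    print(f"{total} bytes = {to_gib(total):.4f} GiB")
```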

The fuzzer (I assume it is the broker, since it has the lowest PID) is at 42 GB RAM usage:

21.0 GiB +  21.0 GiB =  42.0 GiB       fuzzer_libxml2_noasan_broker

marcinguy commented 3 years ago

FYI: after 10 hours with 50 non-ASan fuzzers and 1 ASan fuzzer, it grew by ca. 1 GB (shared maps/pages).

Hmm, I wonder why this happens on only one fuzzer. I thought it was the broker (lowest PID), but I also thought the broker just brokers and does not map shared maps/pages. Or is this not the broker?

The other processes stay at ca. 1 GB and this one (the broker?) is at 43 GB:

21.5 GiB +  21.6 GiB =  43.1 GiB       fuzzer_libxml2_noasan_broker

ipcs -m|awk '{ print $5}'|awk '{a+=$0}END{print a}'
39661338624
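
The two `ipcs` totals quoted in this thread (38654705664 and 39661338624 bytes) give a rough growth rate. The sketch below assumes the readings were taken about 10 hours apart, as the comment suggests, and that growth is linear. Both are assumptions.

```python
def growth_mib(before_bytes: int, after_bytes: int) -> float:
    """Difference between two shared-memory totals, in MiB."""
    return (after_bytes - before_bytes) / 2**20


def growth_mib_per_hour(before_bytes: int, after_bytes: int, hours: float) -> float:
    """Average growth rate in MiB per hour (assumes linear growth)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return growth_mib(before_bytes, after_bytes) / hours


if __name__ == "__main__":
    before, after = 38654705664, 39661338624  # totals quoted in this thread
    print(f"grew by {growth_mib(before, after):.0f} MiB")
    print(f"rate: {growth_mib_per_hour(before, after, 10):.0f} MiB/h")
```

That is 960 MiB in total, consistent with the "ca. 1 GB" estimate.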

marcinguy commented 3 years ago

FYI: in all previous tests I was using clang-9 (my bad).

With a new harness/different target and clang-11, it looks way better (I haven't observed the issue yet).

no-ASan vs ASan (1:3 mem usage ratio)

So I suggest using the LibAFL Docker image (it comes with clang-11).

andreafioraldi commented 3 years ago

Hey @marcinguy, can you try with your old setup again? Replace InMemoryCorpus with CachedOnDiskCorpus::new(..., 64) in your fuzzer first.

s1341 commented 3 years ago

@andreafioraldi your fixes resolved my issue. I think it would be nice, however, to have an InMemoryCorpus with a maximum size, so that it becomes a FIFO when the maximum is reached (or something similar).
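
The FIFO idea can be illustrated with a small language-agnostic sketch. This is not LibAFL code and does not implement its corpus trait. `BoundedFifoCorpus` and its methods are hypothetical names that only show the eviction behaviour being proposed.

```python
from collections import deque
from typing import Deque, Iterator


class BoundedFifoCorpus:
    """Hypothetical in-memory corpus that drops the oldest entry when full."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        # deque(maxlen=...) discards items from the left once it is full.
        self._entries: Deque[bytes] = deque(maxlen=max_entries)
        self.evicted = 0

    def add(self, testcase: bytes) -> None:
        """Append a testcase, counting an eviction if the corpus is full."""
        if len(self._entries) == self._entries.maxlen:
            self.evicted += 1
        self._entries.append(testcase)

    def __len__(self) -> int:
        return len(self._entries)

    def __iter__(self) -> Iterator[bytes]:
        return iter(self._entries)


if __name__ == "__main__":
    corpus = BoundedFifoCorpus(max_entries=2)
    for tc in (b"a", b"b", b"c"):
        corpus.add(tc)
    print(list(corpus), "evicted:", corpus.evicted)
```

The trade-off is that evicted testcases are lost for good. A disk-backed cache avoids that by bounding only what stays in memory.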

marcinguy commented 2 years ago

@s1341 @andreafioraldi Great that it worked. I will check it at the next opportunity. Also, try to use a recent clang/LLVM (now 11); I noticed some issues with clang-9/llvm-9. With clang-11/llvm-11 it works well.

Feel free to close this issue.