Open percepo opened 8 months ago
This matter is more convoluted than first assumed. Some observations for future reference:
Large amounts of memory (~500MB with 100k concurrently existing FuzzResults) are occupied when many FuzzResults are created simultaneously
Observing and confirming that del is executed for all of these 100k FuzzResults still does not show a decrease in the OS-observed RAM occupied by the process
Manually calling gc.collect() after the observed del calls has no noticeable effect
The cache only seems to occupy a few MB of that
Creating ~100k concurrent FuzzResults, deleting them, and then creating 50k again occupies more process memory (around a 70MB increase) than the initial ~100k FuzzResults did
A large total number of FuzzResults created via the SeedQueue with a long wordlist, without them existing at the same time (in contrast to plugins that create many in one batch), is not suspicious: RAM was at slightly below 300MB after 1 million processed FuzzResults. While even here I think there is plenty of room for improvement, the general creation of FuzzResults should not be the culprit, but rather concurrently existing FuzzResults, coupled with the fact that once memory is allocated, the process seems to never release it.
What could cause this behavior? What other objects could occupy the memory in this way?
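One way to answer this empirically is Python's stdlib tracemalloc, which can diff two allocation snapshots grouped by source line and show which allocations survive the del calls. A minimal sketch, with a stand-in payload instead of real FuzzResults:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for creating many FuzzResult-like objects at once
objs = [{"url": f"http://host/{i}", "data": "x" * 512} for i in range(10_000)]

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, "lineno")

# The top entries show which source lines hold the most new memory
for stat in stats[:5]:
    print(stat)
```

Running the same comparison after the del calls would reveal which lines still hold memory from Python's point of view, independent of what the OS reports.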
This issue will probably be split up into two solutions at this point.
Research has shown that Python generally does not like releasing allocated memory back to the OS. This means that spikes of allocated memory will usually simply remain in use by the process.
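This is observable with the CPython-specific sys.getallocatedblocks(): after a spike is deleted and collected, the interpreter's internal block count drops back down, even though the OS-visible RSS typically does not shrink, because pymalloc holds on to its arenas for reuse. A small demonstration:

```python
import gc
import sys

base = sys.getallocatedblocks()

# Simulate a spike of many allocations
spike = [bytearray(1024) for _ in range(50_000)]
peak = sys.getallocatedblocks()

del spike
gc.collect()
settled = sys.getallocatedblocks()

# Internally the blocks are freed again, but the process usually
# keeps the (now empty) arenas, so RSS stays near its peak
print(peak - base, settled - base)
```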
To deal with this, plugins should not immediately create FuzzResult objects. There will probably be a bigger update refactoring the structure so that objects stay as small as possible before actually being transformed into a FuzzResult with all its attributes in the HttpQueue. This should help, as the bottleneck occurs on the lane from Plugin -> RedirectQueue -> HttpQueue. If only the objects after the HttpQueue are real FuzzResults, we shouldn't have memory allocation spikes as high as right now.
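A sketch of what such a minimal pre-HttpQueue carrier could look like (class and field names here are illustrative, not wenum's actual types): a __slots__ class avoids the per-instance __dict__, so 100k queued items cost a fraction of 100k full objects.

```python
import sys

class PendingRequest:
    """Hypothetical minimal carrier used before the HttpQueue.
    __slots__ removes the per-instance __dict__ overhead."""
    __slots__ = ("method", "url")

    def __init__(self, method: str, url: str):
        self.method = method
        self.url = url

class FullResult:
    """Stand-in for a fully populated FuzzResult-like object."""
    def __init__(self, method: str, url: str):
        self.method = method
        self.url = url
        self.headers = {}
        self.history = []
        self.plugin_results = []

small = PendingRequest("GET", "http://host/FUZZ")
big = FullResult("GET", "http://host/FUZZ")

# The slotted object has no __dict__ at all; the full object pays
# for an instance dict plus every attribute it carries
print(sys.getsizeof(small), sys.getsizeof(big) + sys.getsizeof(big.__dict__))
```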
Additionally, it should be traced which objects grow over long runtimes, excluding the factor of concurrently existing FuzzResults. 300MB after 1 million processed results is still strange, given that only the cache and some stats (mostly integer counters) should persist and grow.
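For that tracing, a simple approach is to count gc-tracked objects by type at two points in a run and diff the counts; whatever type keeps climbing is a candidate. A sketch using only the stdlib:

```python
import gc
from collections import Counter

def live_object_counts() -> Counter:
    """Count currently gc-tracked objects by type name. Diffing two
    of these snapshots over a long run shows which types keep growing."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

baseline = live_object_counts()

# Simulate one type accumulating over a long runtime
leak = [[i] for i in range(5_000)]

later = live_object_counts()
growth = later - baseline  # Counter subtraction keeps positive deltas only
print(growth.most_common(5))
```

In a real run the two snapshots would be taken, say, 100k processed results apart rather than around a synthetic leak.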
Lastly, the most convoluted point, postponed indefinitely: the observation that a memory spike (e.g. 100k FuzzResults), followed by removing those 100k FuzzResults and creating 50k new ones, leads to more occupied memory than the initial spike did. Debugging and fixing this seems to hold the least promise, but with more info in time this point may be tackled as well.
Note: Considering that Backfeed objects are simply deepcopied FuzzResults, it is plausible that memory usage climbs quickly when, hypothetically, 50k plugin-generated requests are derived from results that each contain 50kB of data.
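The cost difference is easy to demonstrate: a deepcopy duplicates the whole response body for every backfed request, while a selective copy that carries over only the fields the new request needs does not. A sketch with a stand-in class (field names are illustrative):

```python
import copy
import sys

class Result:
    """Stand-in for a FuzzResult carrying a large response body."""
    def __init__(self, url: str, content: bytearray):
        self.url = url
        self.content = content

# A result holding ~50kB of response data
original = Result("http://host/a", bytearray(50_000))

# Backfeed-style deepcopy: the entire 50kB body is duplicated per copy
backfeed = copy.deepcopy(original)
print(backfeed.content is original.content)  # False: memory doubled

# Selective copy: a follow-up request rarely needs the old body at all
followup = Result(original.url, bytearray(0))
print(sys.getsizeof(followup.content) < sys.getsizeof(backfeed.content))  # True
```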
The FuzzRequest/Request/FuzzResult/Response interactions should be refactored from the ground up. Request and Response should contain all information directly attributed to their HTTP counterparts, and the Fuzz- versions of them should track all meta-information unique to wenum (e.g. how many retries have been made, or what the results of the plugins are) that cannot be parsed from the HTTP Request/Response information alone. Additionally, FuzzResult should become FuzzResponse. A FuzzResponse should only exist after the request has been sent out. All of this should also help fix the RAM issue.
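One possible shape for that split, as a sketch only (the field names beyond the class names mentioned above are assumptions, not wenum's actual attributes): plain HTTP facts live in Request/Response, and FuzzResponse wraps them with the wenum-only metadata.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Only facts directly attributable to the HTTP request."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)

@dataclass
class Response:
    """Only facts directly attributable to the HTTP response."""
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""

@dataclass
class FuzzResponse:
    """wenum-specific metadata that cannot be parsed from the HTTP
    messages alone; only created after the request was sent out."""
    request: Request
    response: Response
    retries: int = 0
    plugin_results: list = field(default_factory=list)

req = Request("GET", "http://host/FUZZ")
resp = Response(200, body=b"hello")
result = FuzzResponse(req, resp)
print(result.response.status, result.retries)
```

Since a FuzzResponse only comes into existence once a Response exists, nothing before the HttpQueue ever carries the full attribute set.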
And it can't be dealt with in the same manner as the SeedQueue. Simply slowing down the PluginQueue when there are too many requests already queued won't fix it, because the requests are already buffered: even if they are executed more slowly, the buffered requests still pile up to the same amount before they are truly discarded at the end of everything. I can think of 2 approaches to fixing this: