Open percepo opened 8 months ago
This matter is more convoluted than first assumed. Some observations for future reference:
Large amounts of memory (~500MB with 100k concurrently existing FuzzResults) are occupied when many FuzzResults are created simultaneously
Observing and confirming that del is executed for all of these 100k FuzzResults still does not show a decrease in the OS-observed RAM occupied by the process
Manually calling gc.collect() after the observed del calls has no noticeable effect
The cache only seems to occupy a few MB of that
Creating ~100k concurrent FuzzResults, deleting them, and then creating 50k again occupies more process memory (around a 70MB increase) than the initial ~100k FuzzResults did
A large total number of FuzzResults created via the SeedQueue with a long wordlist, without them existing at the same time (in contrast to plugins that create many in one batch), is not suspicious: RAM was at slightly below 300MB after 1 million processed FuzzResults. While even here I think there is plenty of room for improvement, the general creation of FuzzResults should not be the culprit, but rather concurrently existing FuzzResults, coupled with the fact that once memory is allocated, the process seems to never release it.
What could cause this behavior? What other objects could occupy the memory in this way?
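One way to answer this empirically is Python's stdlib tracemalloc, which can diff two allocation snapshots grouped by source line and show which allocations survive the del calls. A minimal sketch, with a stand-in payload instead of real FuzzResults:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for creating many FuzzResult-like objects at once
objs = [{"url": f"http://host/{i}", "data": "x" * 512} for i in range(10_000)]

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, "lineno")

# The top entries show which source lines hold the most new memory
for stat in stats[:5]:
    print(stat)
```

Running the same comparison after the del calls would reveal which lines still hold memory from Python's point of view, independent of what the OS reports.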
This issue will probably be split up into two solutions at this point.
Research has shown that Python generally does not like releasing allocated memory back to the OS. This means that spikes of allocated memory will usually simply remain in use by the process.
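This is observable with the CPython-specific sys.getallocatedblocks(): after a spike is deleted and collected, the interpreter's internal block count drops back down, even though the OS-visible RSS typically does not shrink, because pymalloc holds on to its arenas for reuse. A small demonstration:

```python
import gc
import sys

base = sys.getallocatedblocks()

# Simulate a spike of many allocations
spike = [bytearray(1024) for _ in range(50_000)]
peak = sys.getallocatedblocks()

del spike
gc.collect()
settled = sys.getallocatedblocks()

# Internally the blocks are freed again, but the process usually
# keeps the (now empty) arenas, so RSS stays near its peak
print(peak - base, settled - base)
```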
To deal with this, plugins should not immediately create FuzzResult objects. There will probably be a bigger update refactoring the structure so that objects stay as small as possible before actually being transformed into a FuzzResult with all its attributes in the HttpQueue. This should help, as the bottleneck occurs on the lane from Plugin -> RedirectQueue -> HttpQueue. If only the objects after the HttpQueue are real FuzzResults, we shouldn't have memory allocation spikes as high as right now.
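A sketch of what such a minimal pre-HttpQueue carrier could look like (class and field names here are illustrative, not wenum's actual types): a __slots__ class avoids the per-instance __dict__, so 100k queued items cost a fraction of 100k full objects.

```python
import sys

class PendingRequest:
    """Hypothetical minimal carrier used before the HttpQueue.
    __slots__ removes the per-instance __dict__ overhead."""
    __slots__ = ("method", "url")

    def __init__(self, method: str, url: str):
        self.method = method
        self.url = url

class FullResult:
    """Stand-in for a fully populated FuzzResult-like object."""
    def __init__(self, method: str, url: str):
        self.method = method
        self.url = url
        self.headers = {}
        self.history = []
        self.plugin_results = []

small = PendingRequest("GET", "http://host/FUZZ")
big = FullResult("GET", "http://host/FUZZ")

# The slotted object has no __dict__ at all; the full object pays
# for an instance dict plus every attribute it carries
print(sys.getsizeof(small), sys.getsizeof(big) + sys.getsizeof(big.__dict__))
```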
Additionally, it should be traced which objects grow over long runtimes, excluding the factor of concurrently existing FuzzResults. 300MB after 1 million processed results is still strange, given that only the cache and some stats (mostly integer counters) should persist and grow.
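For that tracing, a simple approach is to count gc-tracked objects by type at two points in a run and diff the counts; whatever type keeps climbing is a candidate. A sketch using only the stdlib:

```python
import gc
from collections import Counter

def live_object_counts() -> Counter:
    """Count currently gc-tracked objects by type name. Diffing two
    of these snapshots over a long run shows which types keep growing."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

baseline = live_object_counts()

# Simulate one type accumulating over a long runtime
leak = [[i] for i in range(5_000)]

later = live_object_counts()
growth = later - baseline  # Counter subtraction keeps positive deltas only
print(growth.most_common(5))
```

In a real run the two snapshots would be taken, say, 100k processed results apart rather than around a synthetic leak.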
Lastly, the most convoluted point, postponed indefinitely: the observation that a memory spike (e.g. 100k FuzzResults), followed by removing those 100k FuzzResults and creating 50k new ones, leads to more occupied memory than the initial spike did. Debugging and fixing this seems to hold the least promise, but with more info in time this point may be tackled as well.
Note: Considering that Backfeed objects are simply deepcopied FuzzResults, it is plausible that memory usage climbs quickly when, hypothetically, 50k plugin-generated requests are derived from results that each contain 50kB of data.
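The cost difference is easy to demonstrate: a deepcopy duplicates the whole response body for every backfed request, while a selective copy that carries over only the fields the new request needs does not. A sketch with a stand-in class (field names are illustrative):

```python
import copy
import sys

class Result:
    """Stand-in for a FuzzResult carrying a large response body."""
    def __init__(self, url: str, content: bytearray):
        self.url = url
        self.content = content

# A result holding ~50kB of response data
original = Result("http://host/a", bytearray(50_000))

# Backfeed-style deepcopy: the entire 50kB body is duplicated per copy
backfeed = copy.deepcopy(original)
print(backfeed.content is original.content)  # False: memory doubled

# Selective copy: a follow-up request rarely needs the old body at all
followup = Result(original.url, bytearray(0))
print(sys.getsizeof(followup.content) < sys.getsizeof(backfeed.content))  # True
```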
The FuzzRequest/Request/FuzzResult/Response interactions should be refactored from the ground up. Request and Response should contain all information directly attributed to their HTTP counterparts, and the Fuzz- versions of them should track all meta-information unique to wenum (e.g. how many retries have been made, or what the results of the plugins are) that cannot be parsed from the HTTP Request/Response information alone. Additionally, FuzzResult should become FuzzResponse. A FuzzResponse should only exist after the request has been sent out. All of this should also help fix the RAM issue.
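One possible shape for that split, as a sketch only (the field names beyond the class names mentioned above are assumptions, not wenum's actual attributes): plain HTTP facts live in Request/Response, and FuzzResponse wraps them with the wenum-only metadata.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Only facts directly attributable to the HTTP request."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)

@dataclass
class Response:
    """Only facts directly attributable to the HTTP response."""
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""

@dataclass
class FuzzResponse:
    """wenum-specific metadata that cannot be parsed from the HTTP
    messages alone; only created after the request was sent out."""
    request: Request
    response: Response
    retries: int = 0
    plugin_results: list = field(default_factory=list)

req = Request("GET", "http://host/FUZZ")
resp = Response(200, body=b"hello")
result = FuzzResponse(req, resp)
print(result.response.status, result.retries)
```

Since a FuzzResponse only comes into existence once a Response exists, nothing before the HttpQueue ever carries the full attribute set.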
And it can't be dealt with in the same manner as the SeedQueue. Simply slowing down the PluginQueue when there are too many requests already queued won't fix it, because the requests are already buffered: even if they are executed more slowly, the buffered requests still pile up to the same amount before they are truly discarded at the end of everything. I can think of 2 approaches to fixing this: