Caching generators that contain generators store empty lists

GoogleCodeExporter commented 8 years ago

I found this whilst examining memdump, which returns a generated list of (pid, 
task, pages) where pages is a generator (get_available_pages).

What happens is that the decorator iterates over the outer generator.  Each 
result is appended to the payload and then yielded to the caller.  Only at the 
end is the payload dumped.  The problem is that, because of the yield, the 
caller may traverse (and so exhaust) any sub-generators before the dump 
happens.  By the time they come to be dumped, they return no results, which is 
why the cache ends up storing empty lists.

I've attached a proposed patch, which I'll apply if scudette's happy with it?  
The idea is simply to always flatten all available generators.  This will have 
side effects on functions that terminate the generator early (that is, ones 
that don't iterate through it completely).  I've kept flatten_generators as a 
method of the node so that blocking nodes can pass on the generators without 
interruption.
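The flattening approach can be sketched as follows. This is an illustration under assumptions: `flatten_generators` here is a hypothetical free function (the patch describes it as a method of the node), and `flattening_cache` is a simplified stand-in for the caching decorator, not the patch itself:

```python
import types

def flatten_generators(obj):
    """Recursively replace every generator in obj (including ones nested in
    tuples and lists) with a list of its fully consumed results."""
    if isinstance(obj, (types.GeneratorType, list)):
        return [flatten_generators(x) for x in obj]
    if isinstance(obj, tuple):
        return tuple(flatten_generators(x) for x in obj)
    return obj

def flattening_cache(store, key):
    """Hypothetical decorator: flatten each result *before* yielding it."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            payload = []
            for item in func(*args, **kwargs):
                item = flatten_generators(item)  # drains inner generators now
                payload.append(item)
                yield item
            store[key] = payload
        return wrapper
    return decorator

cache = {}

def pages(pid):
    yield from range(pid, pid + 3)

@flattening_cache(cache, "memdump")
def memdump():
    for pid in (1, 10):
        yield (pid, "task%d" % pid, pages(pid))

for pid, task, page_list in memdump():
    list(page_list)  # iterating a list does not empty it

print(cache["memdump"])  # [(1, 'task1', [1, 2, 3]), (10, 'task10', [10, 11, 12])]
```

Because each row is flattened before it is yielded, the caller receives lists, and iterating them does not drain the cached copy. The cost is the one noted above: every inner generator is consumed in full, even if the caller stops early.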

The only other solution I can think of is replacing each generator with a 
caching generator wrapper such that when it generates a value, the value is 
cached (and in the right place).  That doesn't strike me as a trivial piece of 
engineering, but without it certain caching operations may be much slower than 
simply running the plugin without any caching.
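The caching generator wrapper could look roughly like this. It is a sketch under assumptions: `RecordingGenerator` and `wrap_subgenerators` are hypothetical names, and only generators sitting directly inside a result tuple are wrapped:

```python
import types

class RecordingGenerator:
    """Hypothetical wrapper: passes values through to the caller and appends
    each one to `self.recorded`, the list that sits in the cache payload."""
    def __init__(self, gen):
        self._gen = gen
        self.recorded = []

    def __iter__(self):
        for value in self._gen:
            self.recorded.append(value)  # cached in the right place
            yield value

def wrap_subgenerators(item):
    """Return (item_for_caller, item_for_cache) for one result tuple."""
    if not isinstance(item, tuple):
        return item, item
    for_caller, for_cache = [], []
    for x in item:
        if isinstance(x, types.GeneratorType):
            rec = RecordingGenerator(x)
            for_caller.append(rec)
            for_cache.append(rec.recorded)  # fills in as the caller iterates
        else:
            for_caller.append(x)
            for_cache.append(x)
    return tuple(for_caller), tuple(for_cache)

def pages(pid):
    yield from range(pid, pid + 3)

payload = []
for_caller, for_cache = wrap_subgenerators((1, "task1", pages(1)))
payload.append(for_cache)
print(payload)              # [(1, 'task1', [])]
print(list(for_caller[2]))  # [1, 2, 3]
print(payload)              # [(1, 'task1', [1, 2, 3])]
```

The cached list fills in only as the caller iterates, so a sub-generator that is never iterated (or is abandoned part-way) leaves an empty or partial record.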

Original issue reported on code.google.com by mike.auty@gmail.com on 23 Aug 2010 at 12:48

Attachments:

volatility-cache-subgenerators.patch

GoogleCodeExporter commented 8 years ago

Flattening the generators during caching will block the interactive nature of 
the plugins, which is the reason we use generators in the first place.  It 
means that no output is emitted until all results have been generated and 
cached.  This is what is currently done for the unit tests, but I don't think 
it is what we want to do during regular runs.

We have two options:
1) Implement the type of generator caching you mentioned - it is possible to 
do, but won't be very simple, as you point out.

2) Not cache those particular plugins which return complex generators - this 
can be done by not decorating their methods.

I am happy to go with 2 until we get 1 sorted out, maybe for 1.5.  This simply 
avoids caching those complex plugins, which brings us back to where we were 
before we had caching anyway.

Original comment by scude...@gmail.com on 23 Aug 2010 at 5:15

GoogleCodeExporter commented 8 years ago

Ok, well, it seems impossible to wrap a function that can return delayed 
results, such that all the results are cached without generating them during 
the wrapping of the function.  We'll either need a global 
dump-all-cached-results that runs after everything interactive has happened, or 
we'll have to go through everything before returning from the cached function.

If we don't do that, then generators that are never iterated won't be cached, 
and if something else later tries to use the cache and asks for the generator, 
it'll get no results back.
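The global dump-all-cached-results option might be sketched like this. `PENDING`, `Recorder`, and `flush_all` are hypothetical; `flush_all` could in principle be registered with `atexit.register`, but that wiring is an assumption and not something from the patches in this thread:

```python
PENDING = []  # hypothetical registry of (store, key, payload, recorders)

class Recorder:
    """Wraps a sub-generator and records every value it produces."""
    def __init__(self, gen):
        self._gen = gen
        self.recorded = []

    def __iter__(self):
        for value in self._gen:
            self.recorded.append(value)
            yield value

    def drain(self):
        """Consume whatever the caller left behind so the record is complete."""
        for _ in self:
            pass

def flush_all():
    """Run once, after all interactive output: finish every sub-generator,
    then write the completed payloads to their stores."""
    while PENDING:
        store, key, payload, recorders = PENDING.pop()
        for rec in recorders:
            rec.drain()
        store[key] = payload

def pages(pid):
    yield from range(pid, pid + 3)

cache = {}
rec = Recorder(pages(1))
payload = [(1, "task1", rec.recorded)]
PENDING.append((cache, "memdump", payload, [rec]))

it = iter(rec)
print(next(it))          # 1 (the caller stops early)
flush_all()
print(cache["memdump"])  # [(1, 'task1', [1, 2, 3])]
```

Draining at flush time completes the records for generators the caller abandoned, at the price of doing that work after the plugin's output has already been shown.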

I've attached another patch that should detect when a subgenerator is present 
in the results and dump an empty cache object, so that complex functions are 
automatically ignored.  This still lets us cache functions that only optionally 
contain a generator, in the runs where they don't produce one.  I dunno when 
that happens, but at least you don't have to think/worry/find out whether it 
does or not.
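A sketch of the detection approach, assuming result rows are tuples or lists. `ContainsGenerator`, `check_item`, and `guarded_cache` are hypothetical names, not the code in volatility-dont-cache-generators.patch:

```python
import types

class ContainsGenerator(Exception):
    """Hypothetical signal that a result holds a generator."""

def check_item(item):
    """Raise if any element of a result row is a generator."""
    if isinstance(item, (tuple, list)):
        for x in item:
            if isinstance(x, types.GeneratorType):
                raise ContainsGenerator(x)

def guarded_cache(store, key):
    def decorator(func):
        def wrapper(*args, **kwargs):
            payload = []
            for item in func(*args, **kwargs):
                if payload is not None:
                    try:
                        check_item(item)
                        payload.append(item)
                    except ContainsGenerator:
                        payload = None  # stop caching, but keep yielding
                yield item
            # Dump an empty cache object when a generator was seen.
            store[key] = payload if payload is not None else []
        return wrapper
    return decorator

cache = {}

def pages(pid):
    yield from range(pid, pid + 3)

@guarded_cache(cache, "memdump")
def memdump():
    yield (1, "task1", pages(1))

@guarded_cache(cache, "pslist")
def pslist():
    yield (1, "task1")
    yield (10, "task10")

rows = [(pid, task, list(p)) for pid, task, p in memdump()]
print(rows)              # [(1, 'task1', [1, 2, 3])]
print(cache["memdump"])  # []
print(list(pslist()))    # [(1, 'task1'), (10, 'task10')]
```

Rows are still yielded interactively either way; only the cached copy is affected. `pslist` is cached normally, while `memdump` leaves an empty payload that a reader would ignore.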

It might also be worth not writing the cache storage if the payload is false, 
since we do a check on returned payloads anyway before we use them, so I've 
thrown that in there too.
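The skip-on-false optimization amounts to a guard before the write. A minimal sketch with hypothetical `dump_payload` and `load_payload` helpers over a plain dict standing in for cache storage:

```python
def dump_payload(store, key, payload):
    """Write only truthy payloads; readers ignore empty/None ones anyway."""
    if not payload:
        return False  # nothing written to storage
    store[key] = payload
    return True

def load_payload(store, key):
    """Return a cached payload, or None if it is absent or empty."""
    payload = store.get(key)
    return payload if payload else None

store = {}
ok_empty = dump_payload(store, "memdump", [])
ok_rows = dump_payload(store, "pslist", [(1, "task1")])
print(ok_empty, ok_rows, sorted(store))  # False True ['pslist']
```

Since an empty payload is treated as a cache miss on read, never writing it gives the same results while saving the storage round trip.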

This seems to work ok from my testing, and speeds up memdump considerably from 
the second run onwards, because the kpcrscan/pslist is still cached (it 
would've been without the decorator, but as I say, this way we don't need to 
care).  If you could review this and let me know if it's ok, I'll apply it...  
5:)

Original comment by mike.auty@gmail.com on 23 Aug 2010 at 11:17

Attachments:

volatility-dont-cache-generators.patch

GoogleCodeExporter commented 8 years ago

Grrr, so there's a slight bug, in that if the payload happens to be a generator 
itself (although it never should be at the moment), it might get cached.  Given 
it's empty, that shouldn't affect the results because an empty payload is 
ignored, however, since the fix is so simple (move the raise outside of the 
loop, so it checks item rather than x), I'll do that if I commit this.
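The patch's loop isn't shown here, so this is only a hedged reconstruction of the idea: `contains_generator` is a hypothetical helper that checks the result itself as well as its elements:

```python
import types

def contains_generator(item):
    """Return True if the result itself, or any element of it, is a generator."""
    # Check the item first: a bare generator is not a tuple or list, so a
    # check that only looks at elements would let it through to the cache.
    if isinstance(item, types.GeneratorType):
        return True
    if isinstance(item, (tuple, list)):
        return any(isinstance(x, types.GeneratorType) for x in item)
    return False

print(contains_generator(x for x in range(3)))  # True
print(contains_generator((1, "task1")))         # False
```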

Original comment by mike.auty@gmail.com on 23 Aug 2010 at 11:27

GoogleCodeExporter commented 8 years ago

Ok, so here's a patch which doesn't cache objects that contain generators, in 
case they are traversed before the dumping happens.  It also ensures that 
the testsuite never uses cached results, and because the results of calculate 
functions don't get used, it can fully flatten all generators without fear of 
throwing off the results.  This flattening now happens in the Testable mixin.

This also features the optimization that cache storage isn't used for payloads 
that are false/None, since these aren't used when read back anyway.  There 
should be no other gotchas with this (and I've run some test against normal 
plugins and testsuite, and they all seem to do the right thing).

Original comment by mike.auty@gmail.com on 23 Aug 2010 at 4:19

Attachments:

volatility-caching-updates.patch

GoogleCodeExporter commented 8 years ago

Ok, I committed this.  There's an interesting issue with trying to set NO_CACHE 
after the command line has been parsed, but with the current setup it all 
appears to work fine.  Marking as Fixed.  Everyone should feel free to reopen 
if they spot a problem.  5:)

Original comment by mike.auty@gmail.com on 25 Aug 2010 at 7:12

Changed state: Fixed
