TimsterMon / volatility

Automatically exported from code.google.com/p/volatility
GNU General Public License v2.0

Potential performance increase when tasks.get_kdbg caches its results #405

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
modules.lsmod and tasks.pslist both (indirectly) call tasks.get_kdbg.

In code that utilises both modules.lsmod and tasks.pslist (e.g. threads), we end up duplicating effort here.

We could gain a performance increase here by getting tasks.get_kdbg to cache 
its results.

To demonstrate what I mean, I've attached an SVG graph generated using cProfile/gprof2dot to profile method usage in the threads plugin (I'm using the Bob.vmem image here and no plugin-specific options). Notice how both modules.lsmod and tasks.pslist feed into tasks.get_kdbg (the green node, with 56.79% of the total time spent in it and its callees).

Would a simple module-level global variable be sufficient here for caching tasks.get_kdbg's results?

Original issue reported on code.google.com by carl.pulley on 8 Apr 2013 at 11:08

GoogleCodeExporter commented 9 years ago
Hiya,

Thanks for the suggestions.  Caching's a very tricky area to get right (which is why our caching framework is disabled by default).  Some things to consider here are the various use cases of volatility.  Creating a module-level global variable as suggested means that you can only carry out one run of volatility at a time (otherwise, how do you know when to reset the cache?).  It's difficult to use volatility as a framework at the moment, but almost everything is passed through parameters to avoid the problems that anything global causes (the primary thing still left over is the configuration system).

Also, some of volatility's address spaces are "live", i.e. the data is constantly changing because it's a running machine.  In that case, you might get away with caching the KDBG location because it's fairly long-lived, but caching other data (such as the contents of the KDBG structure) would be bad.

As such, I'd recommend a far more cautious use of caching.  Whilst it may 
provide an immediate speed-up in this one circumstance, there will be corner 
cases where it will end up causing severe headaches as people try to figure out 
exactly how the data flows through the various functions to end up at the value 
it has.  I'd suggest instead looking into a patch for any functions that 
require a kdbg location so that if it's already known, it can be provided (thus 
avoiding the repeated get_kdbg call).
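
To sketch what that might look like (illustrative only; the kdbg keyword argument here is the proposed addition, not the existing trunk signature):

import volatility.win32.tasks as tasks

def pslist(addr_space, kdbg = None):
    """Yield _EPROCESS objects, reusing a caller-supplied KDBG if one is given."""
    if kdbg is None:
        kdbg = tasks.get_kdbg(addr_space)
    for proc in kdbg.processes():
        yield proc

# A plugin such as threads could then resolve KDBG once and pass it to both
# its process-listing and module-listing helpers, avoiding the repeated scan.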

Original comment by mike.auty@gmail.com on 8 Apr 2013 at 11:17

GoogleCodeExporter commented 9 years ago
I've attached another profile graph that shows what happens to the performance 
when we replace tasks.get_kdbg with the following code (we shave about 25% off 
the time spent in this method when using threads):

import volatility.obj as obj

# Module-level cache, keyed by address space
kdbg_cache = {}
def get_kdbg(addr_space):
    """A function designed to return the KDBG structure from
    an address space. First we try scanning for KDBG and if
    that fails, we try scanning for KPCR and bouncing back to
    KDBG from there.

    Also note, both the primary and backup methods rely on the
    4-byte KDBG.Header.OwnerTag. If someone overwrites this
    value, then neither method will succeed. The same is true
    even if a user specifies --kdbg, because we check for the
    OwnerTag even in that case.
    """
    global kdbg_cache
    if addr_space in kdbg_cache:
        return kdbg_cache[addr_space]

    kdbgo = obj.VolMagic(addr_space).KDBG.v()

    kdbg = obj.Object("_KDDEBUGGER_DATA64", offset = kdbgo, vm = addr_space)

    if kdbg.is_valid():
        kdbg_cache[addr_space] = kdbg
        return kdbg

    # Fall back to finding it via the KPCR. We cannot
    # accept the first/best suggestion, because only
    # the KPCR for the first CPU allows us to find KDBG.
    for kpcr_off in obj.VolMagic(addr_space).KPCR.generate_suggestions():

        kpcr = obj.Object("_KPCR", offset = kpcr_off, vm = addr_space)

        kdbg = kpcr.get_kdbg()

        if kdbg.is_valid():
            kdbg_cache[addr_space] = kdbg
            return kdbg

    kdbg_cache[addr_space] = obj.NoneObject("KDDEBUGGER structure not found using either KDBG signature or KPCR pointer")
    return kdbg_cache[addr_space]

Original comment by carl.pulley on 8 Apr 2013 at 11:21

GoogleCodeExporter commented 9 years ago
Ah, I thought there had to be a good reason for this not being done before!

I'll follow up on your suggestion regarding looking at the calling code's usage of get_kdbg.

Original comment by carl.pulley on 8 Apr 2013 at 11:28

GoogleCodeExporter commented 9 years ago
A better approach then might be to memoise the function calls (e.g. using decorators).

This would avoid issues related to global variable usage and promote a more side-effect-free coding style.

With static memory images, there should be plenty of potential for speedup gains using memoisation. With live images, I can see that there'd be all sorts of issues if memoisation couldn't be switched off. A decoration library could certainly be switched on and off on demand.
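
As a rough sketch of what I mean (illustrative only, not Volatility code; the flag name is made up), a switchable memoising decorator could be as simple as:

import functools

MEMOISE_ENABLED = True   # could be driven by a command-line option

def memoise(func):
    """Cache results keyed on the positional arguments, unless caching is disabled.

    The arguments must be hashable for this to work.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if not MEMOISE_ENABLED:
            return func(*args)
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

get_kdbg could then simply be decorated with @memoise, and a live address space could disable the whole thing via the flag.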

How does that sound as a generic approach to dealing with such performance 
issues?

Original comment by carl.pulley on 8 Apr 2013 at 12:03

GoogleCodeExporter commented 9 years ago
For information purposes: using functools32.lru_cache (see 
https://pypi.python.org/pypi/functools32) to decorate tasks.get_kdbg, I can 
verify that a similar performance boost is obtained.
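
Concretely, the decoration amounts to something like this (a sketch; it assumes functools32 is installed alongside Volatility, wraps the function at import time rather than editing tasks.py itself, and the maxsize value is arbitrary):

from functools32 import lru_cache   # Python 2 backport of functools.lru_cache
import volatility.win32.tasks as tasks

# Equivalent to placing @lru_cache(maxsize = 32) above the def get_kdbg(...) line
tasks.get_kdbg = lru_cache(maxsize = 32)(tasks.get_kdbg)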

Original comment by carl.pulley on 8 Apr 2013 at 12:17

GoogleCodeExporter commented 9 years ago
With a little playing around and thinking about using memoisation more generally, I can see that (for example) iterators need to be handled carefully. Naively applying functools32.lru_cache across the Volatility code base (e.g. on the _EPROCESS get_load_modules method) can cause incorrect results to be generated (as the old, but depleted, iterator is returned).
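
The failure mode is easy to reproduce in isolation with a toy generator (illustrative only, nothing Volatility-specific):

from functools32 import lru_cache

@lru_cache(maxsize = 32)
def numbers(n):
    # Returns a generator object, not a concrete list
    return (i for i in range(n))

print list(numbers(3))   # [0, 1, 2]
print list(numbers(3))   # [] -- the cached generator has already been exhausted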

Applying memoisation to get_kdbg (for static images) works since it returns a concrete object.

Looking at threads and how it implicitly uses get_kdbg (via taskmods.dlllist, tasks.pslist and modules.lsmod), the other changes don't look as natural as simply decorating get_kdbg with something akin to lru_cache.

Original comment by carl.pulley on 8 Apr 2013 at 7:13

GoogleCodeExporter commented 9 years ago
Carl, the volatility caching system is a memoisation framework which attempts to be generic, with special support for iterators etc. However, as Mike Auty mentioned, it turned out to be quite a bit trickier than it at first appears, mainly because of the persistence of the cache and how to invalidate it.

The approach I took in the tech preview branch was to have a session object which is created when the interactive shell starts up. Things can then be cached in the session object (e.g. the kdbg location) and reused by other plugins. This works very well because the interactive session is long-lived between executing different plugins. This approach does not work well in the current trunk model, where each plugin runs as a single-shot binary which terminates (necessitating a persistent cache and leading to cache invalidation issues).
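
In spirit it's just attribute-level caching on an object whose lifetime matches the shell. A minimal sketch (not the actual tech preview API; the names are illustrative):

import volatility.win32.tasks as tasks

class Session(object):
    """Lives for the whole interactive session, so anything cached here
    persists across plugin runs and dies with the process, leaving no
    separate cache invalidation step to get wrong."""

    def __init__(self, addr_space):
        self.addr_space = addr_space
        self._kdbg = None

    @property
    def kdbg(self):
        if self._kdbg is None:
            self._kdbg = tasks.get_kdbg(self.addr_space)
        return self._kdbg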

Original comment by scude...@gmail.com on 11 May 2013 at 8:01

GoogleCodeExporter commented 9 years ago

Original comment by mike.auty@gmail.com on 18 Feb 2015 at 6:54