Granary / granary

Dynamic binary translation framework for instrumenting the Linux kernel and its modules
Other
76 stars 6 forks source link

[QEUSTION] About leak detector #24

Open renzhengeek opened 10 years ago

renzhengeek commented 10 years ago

Hi all, I want to make leak_detector work again, which is currently removed from master.

Could someone show me the high idea of leak_dector? like: (1) what type of memory leak can be detected? (2) why do we introduce thread into leak_detector?

Thanks! :)

renzhengeek commented 10 years ago

Hi all, there is a small and detailed proble to ask you :)

I know the decripotr_type is a common interface for various watchpoints. But, I don't know the ALLOC_AND_INIT and REINIT_WATCHED_POINTERS trick. when do we should make them true or false?

    /// Specify the descriptor type to the generic watchpoint framework.
    template <typename>
    struct descriptor_type {
        typedef leak_detector_descriptor type;
        enum {
            ALLOC_AND_INIT = false,
            REINIT_WATCHED_POINTERS = false
        };  
    };  

here is the link:https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/descriptor.h#L187

pgoodman commented 10 years ago

I'm hoping that Akshay will explain the leak_detector tool to you when he has free time.

I will admit in advance that the configuration enumeration constants for descriptors are poorly designed and unusually complex. Future versions of watchpoints in the next Granary will be better though :-D

ALLOC_AND_INIT specifies whether or not the descriptor allocation (of the counter index) and initialization (of the descriptor associated with the counter index) should be a single operation (true) or two separate operations (false).

If the value is true, then the static method allocate_and_init on the descriptor type (in this case leak_detector_descriptor) is invoked to allocate and initialize the watchpoint descriptor (https://github.com/Granary/granary/blob/master/clients/watchpoints/instrument.h#L478).

If the value is false, then first a counter index is allocated (https://github.com/Granary/granary/blob/master/clients/watchpoints/instrument.h#L510) and then the descriptor associated with the index is initialized (https://github.com/Granary/granary/blob/master/clients/watchpoints/instrument.h#L521).

REINIT_WATCHED_POINTERS tells us what to do when we try to add a watchpoint to an already watched address. For example, if you had the following:

void *ptr = ...;
add_watchpoint(ptr, ...);
add_watchpoint(ptr, ...);

Then the first add_watchpoint would add a watchpoint to ptr. If REINIT_WATCHED_POINTERS is true then the second call to add_watchpoint would ignore that ptr is already watched and add another watchpoint to ptr, effectively overwriting the first (you could have two separate descriptors, but only one index can ever be stored in a pointer). If REINIT_WATCHED_POINTERS is false then the second add_watchpoint is treated as a no-op (does nothing) as ptr is already watched.

renzhengeek commented 10 years ago

Hi pgoodman, Thanks.

May I ask, have you started working on next granary? If so, why is there no updates on github?
Or, you plan to keep next granary a private project? :)

pgoodman commented 10 years ago

There are updates; however, they are in a private repo for the time being. I will be happy to share when it's closer to being in a state where it actually does something. At the moment I am working on a virtual register system, which allows one to avoid having to manually write code that saves/restores registers, the flags, etc. That work will probably take another two or more weeks. After that I will need to implement the actual stuff that makes control-flow work. Beyond that there are several other challenges, including code cache flushing / migration.

renzhengeek commented 10 years ago

yeah, it's good news.

When/if neccessary, I would like to help doing something like test, document etc. cheer up. ;)

All best for you.

kumarak commented 10 years ago

Hello Ren, The leak_detector detects memory leak in kernel modules. I will suggest you going through the LWN article on kmemleak (https://lwn.net/Articles/187979/). The current implementation uses same algorithm to detect memory leaks in kernel modules. It also performs reachability on memory block allocated in module context from a set of rootsets.

However, there are some challenges when implementing leak detector for the kernel modules. The modules generally has limited view of the kernel and its activities. The allocated memory blocks are also shared between kernel and modules. This makes it challenging to design a leak detector for kernel modules which detects leak with low or reasonable false positive.

Behavioral watchpoints are quite helpful in designing such system. This is because they take the form of non-canonical addresses and can be easily identified. However, the challenge here is to add watchpoints with all the memory allocated in module context (allocated by the module or kernel code on behalf of module) and also identify the rootsets correctly.

The current implementation of leak detector patches all memory allocators in the kernel and based on the interaction of the kernel/modules it decides when it has to add the watchpoints. The policy used to add watchpoints are lenient because adding more watchpoints will increase your scan space but it will also reduce the false positive. Implementation also uses two different scanning policy to perform reachability analysis: light scanning and deep scanning. Light scanning only scans the rootsets collected during the program run and it gets scheduled periodically. The initial rootset includes the machine context, thread stack and all global pointers. The deep scanning is infrequent and it gets scheduled only when the detected memory leaks reaches a threshold value. During deep scanning, it scans entire heap space looking for the watchpoints. The design of deep scanning is very critical because it introduces false negative and you can only detect certain kind of leaks, but the leak detected during deep scanning are true leaks. A decision about a leak is made after both light and deep scanning.

This is a general overview of leak detector. I am not sure if it answers all your queries. Let me know if you have any other question. We can discuss it further.

renzhengeek commented 10 years ago

Hello kumarak, Thanks a lot.

Sorry for my poor context knowleges. There is still some I don't understand:

  1. What's rootsets?
  2. Is reachability the same with addressablity?
  3. why do we introduce thread into leak_detector? are they taking light scanning and deep scanning?

thanks ;)

renzhengeek commented 10 years ago

perhapse, you can help me fix this compiling error, here #25.

pgoodman commented 10 years ago

I suggest Googling (or Baidu-ing) for info on the "mark and sweep" garbage collection algorithm. That will answer some of your questions and give you a bit more intuition.

Sans intuition answer:

  1. The rootsets are the set of kernel stacks, and the sets of registers associated with the threads that own those stacks.
  2. An object is reachable if and only if it can be reached by following zero or more pointer dereferences from the rootset. That is, if the object is stack allocated, then it's rechable. If it's pointed to by something in the stack, then it's reachable. If it's pointed to by something pointed to by a register / an address on the stack, then it's reachable.
  3. The concept of thread (or task) is important for any garbage collector system, specifically because the stacks and registers of every thread form the rootset.

Light scanning is based on object staleness. I am not sure if this is how Akshay actually implemented it, but the way we have previously discussed it, staleness tracking works as follows. For every object, keep track of a counter. The value of the counter is 0 if the object has been accessed within the current epoch. A value of N > 0 says that the object has not been accessed for at least N epochs. Suppose that a value of N >= M means that an object is stale. Stale could mean that it's long-live and rarely accessed. Stale could also mean that the object has leaked. The intuition is thus: assume that staleness is associated with leaks, and so use staleness as a signal for doing more heavyweight mark & sweep analysis.

The way the staleness is implemented is that every access to a watched object moves 0 into its counter. Periodically, a thread is scheduled (the periodicity defines the length of an epoch), and that thread iterates over every watchpoint descriptor and increments the value of the counter. This thread also determines the number of descriptors whose counters have values N where N >= M. If the number of "stale" objects exceeds some threshold then the epoch thread (light scanning) schedules a full mark & sweep pass that scans the entire kernel heap (deep scanning).

renzhengeek commented 10 years ago

Thanks. I get a high insight about leak detector. But there are many details to be understood.For example,about thread_state(https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/thread.cc#L69):

   leak_detector_thread_state *get_thread_private_info(void){
        thread_state_handle thread = safe_cpu_access_zone();
        leak_detector_thread_state *thread_state(thread->state);
        return thread_state;
    } 
  1. why should we know and handle the thread state? for a concrete case.
  2. safe_cpu_access_zone is a blank struct(https://github.com/Granary/granary/blob/release/granary/state_handle.h#L24), how can we do assignment from it to thread_state_handle structure?

All the best for you.

renzhengeek commented 10 years ago

Look at this code(https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/state.h#L15):

        enum thread_private_state{
            NONE = 0x0ULL,
            MODULE_RUNNING,
            MODULE_EXIT,
        };  

does MODULE here refer to target mudule, granary, or something else?

renzhengeek commented 10 years ago

About leak policy(https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.h#L90):

   /// Policy that applies to all internal app code, i.e. invoked from either
    /// an entry block or a potential exit block.
    struct leak_policy_continue : public watchpoint_leak_policy {
            granary::instrumentation_policy visit_app_instructions(
            granary::cpu_state_handle cpu,
            granary::basic_block_state &bb,
            granary::instruction_list &ls 
        ) throw();
    };  
  1. does internal app code means uninstrumented app code?
  2. what's entry block?does it means block within module init function?
  3. what's exit block?does it means block within module exit function?

All the best for you.

pgoodman commented 10 years ago

Think of safe_cpu_access_zone as documenting that the surround code executes with interrupts disabled, i.e. in a zone/region where it is safe to access CPU-private information. In Granary's implementation of thread-local storage, thread_states are stored in the kernel's struct task_struct, and to get the current task struct, we need to access CPU-private data (see implementation of kernel's current() macro).

I think that MODULE refers to an arbitrary Linux kernel module. That is, if the current thread is executing module code, then presumably it is in the MODULE_RUNNING state. So lets break this down:

The initial policy is the leak_policy_enter (https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.h#L15). The comment here (https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.h#L66) says that this policy applies only to the first basic block of module code invoked from kernel code. If we look at the implementation of this policy, it does the following:

  1. It instruments the code according to the leak_policy_exit policy. The purpose of this apparently unusual choice is so that if some module function is entirely contained in a single basic block, then it will be instrumented according to both the leak_policy_enter and leak_policy_exit policies.
  2. It injects a function call at the beginning of the basic block: https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.cc#L208, which depends on https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.cc#L165, which uses the "clean callable" function https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.cc#L147, which is defined here: https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/thread.cc#L105. In this function (leak_notify_thread_enter_module), we see that it accesses the thread_state and sets the module's status to MODULE_RUNNING.
  3. It returns policy_for<leak_policy_exit>. This is interesting because it says that, unless otherwise specified, instrument the targets of every non-ret control-flow instruction using the leak_policy_exit policy. This is what's called a "policy switch". If we look into leak_policy_exit::visit_app_instructions, then we see that it returns the policy for itself, but that it changes the policy of call instructions (https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/instrument.cc#L223) to instrument the targets of those calls using the leak_policy_continue policy. Similarly, if it finds a ret instruction, then it will inject a function call that will lead to leak_notify_exit_module (https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/thread.cc#L122) being invoked.

So far, a summary of the execution behavior is that if the kernel invokes some module code, then the first thing that happens is leak_notify_enter_module is invoked. Just before the last return from instrumented module code to kernel code happens, we invoke the leak_notify_exit_module function. This results in the thread_state for the module taking on the value MODULE_EXIT.

Okay, so why all of this fuss? Recall that this tool is attempting to detect leaks of all module-allocated memory. What Akshay discovered why working on this system is that things are not so simple as just function-wrapping kmalloc, because that would only cover cases where instrumented module code invokes kmalloc. However, sometimes the kernel allocates memory on behalf of the module, then does some non-trivial initialization of that memory, and later returns that memory to the module. This case is not covered by our simplistic function wrappers as Granary will not have visibility on memory being allocated by native kernel code.

In this case, what was needed was an understanding of whether or not an allocation was executing in the context of module code. In the simplest term, an allocation is in the context of a module in one of two cases:

  1. The (instrumented) module is directly invoking the allocator.
  2. The (instrumented) module is invoking some (uninstrumented) kernel code, which is directly or indirectly invoking an allocator.

Case 1 is easy to handle with a FUNCTION_WRAPPER; however, case 2 cannot be solved with a function wrapper. Therefore, the solution to case 2 is two-fold:

  1. Tracking whether or not some code is executing within the context of a module. This is precisely what is stored in thread_state. If the value of thread_state is MODULE_RUNNING then we are in the context of a module. If the value is MODULE_EXIT, then the thread has previously executed in the context of a module, but is currently not executing module code. Finally, if the value is NONE then the thread has never executed instrumented module code, and so can be seen as a "fully native" thread, at least as far as we can tell.
  2. Interposing on all invocations of kmalloc, regardless of whether or not its instrumented or native code invoking kmalloc. To achieve this, we use "probe-based" instrumentation, which takes the form of PATCH_WRAPPERs in Granary. A PATCH_WRAPPER is like a FUNCTION_WRAPPER, but allows you to fully replace an entire kernel function, while still having the ability to invoke the original. You can see an example here: https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/kernel/linux/allocator_patch.h#L22

What is meant by "internal app code" is that it is module code that is not within the entry function into the module. Suppose we have the following call chain: K1 calls M1 calls M2, where K1 is a kernel function, and M1 and M2 are instrumented module functions.

Then the first basic block of M1 is instrumented with leak_policy_enter, and is treated as non-internal app code. All other blocks of M1 are instrumented with leak_policy_exit. Similarly, these are non-internal. Finally, all of M2 and any other module code executed by M2 is instrumented by leak_policy_continue, and is treated as "internal". The distinction exists solely for the purpose of classifying code such that we can recognize when module code is entered and exited.

It's not clear to me that, as implemented, the method of tracking module entry/exit is sufficient. That is, I think a counter should also track the number of nestings, e.g. K1 -> M1 -> K2 -> M2 -> etc.

An alternative approach that would not rely on PATH_WRAPPERS is the AUTO_INSTRUMENT_HOST config option within the policy class declaration. If set to true, then all kernel (host) code invoked by instrumented module (app) code should also be instrumented. Then kmalloc et al can be function wrapped.

renzhengeek commented 10 years ago

Thanks.clearly explanations.

The following code(https://github.com/Granary/granary/blob/release/clients/watchpoints/clients/leak_detector/thread.cc#L107) I cannot understand:

void leak_notify_thread_enter_module(void) throw() {
        thread_state_handle thread = safe_cpu_access_zone();
        leak_detector_thread_state *thread_state(thread->state);
        if(!thread_state){
...
        }   
    }   

what make me confuse are:

  1. thread is a thread_state_handle (https://github.com/Granary/granary/blob/master/granary/state_handle.h#L63) struct, but safe_cpu_access_zone(https://github.com/Granary/granary/blob/master/granary/state_handle.h#L24) is another blank struct, why can be assiged from safe_cpu_access_zone to thread_state_handle without complier error? 2.Obviously, thread is local varible, thread->state canbe a any value,why do use it to initialize thread_state?

There maybe some tricks I don't know, can you show me? Thanks;)

pgoodman commented 10 years ago

So, thread_state_handle wraps a granary::thread_state, which extends client::thread_state. When you do thread->foo, it gives you a granary::thread_state::foo. The implementation of thread_state_handle is a bit hard to follow, but depending on how big task_struct::granary is and how big granary::thread_state is, it will either inline the thread state into the task_struct, or it will heap-allocate some extra state, and insert a pointer to this state into the kernel's task_struct::granary field.

It looks like the only think Akshay stored in the client::thread_state was a pointer to a client::wp::leak_detector_thread_state, which he manually allocates. To be honest, I have no idea why this extra level of indirection was used, as client::wp::leak_detector_thread_state is no longer than a pointer. A good point for refactoring would be to remove client::wp::leak_detector_thread_state, and move its fields into client::thread_state, and then adjust all uses accordingly.