On optimizing get_code_configs

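Hi, I noticed a few places in the code below that could probably be improved:

1. `affect_funcs` keeps only the part of the hunk header before "(". If the function is also declared near the top of the file, `lines[i].startswith(func_probe)` matches that declaration first and the located position is wrong; see the declarations below, e.g. the one guarded by CONFIG_PROC_FS.
2. `lines[i].strip() in ["}", "};"]` also matches closing braces inside the function body, so everything after the first such brace is ignored.
3. Any `#ifdef` inside the function is collected, even though the lines changed by the patch may have nothing to do with that config.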

```c
static int newseg(struct ipc_namespace *, struct ipc_params *);
static void shm_open(struct vm_area_struct *vma);
static void shm_close(struct vm_area_struct *vma);
static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp);
#ifdef CONFIG_PROC_FS
static int sysvipc_shm_proc_show(struct seq_file *s, void *it);
#endif
```

```python
import re


def find_affected_funcs_in_patch(patch):
    """
    Find all the functions affected by a patch.
    """
    func_changes = re.findall(r"@@ -\d+,\d+ \+\d+,\d+ @@ (.*?)\n", patch)
    affect_funcs = set()
    for change in func_changes:
        # keep only the text before "(", e.g. "static void shm_open" (point 1)
        affect_funcs.add(change.split("(")[0])
    return affect_funcs


def get_func_configs(kernel_dir, file_rel_path, func_probe):
    """
    Get kernel configs before and within a single function.
    """
    res = set()
    try:
        with open(f"{kernel_dir}/{file_rel_path}", "r") as f:
            lines = f.readlines()
    except FileNotFoundError:
        return res
    # find the line number of the function tail '}'
    func_tail_line = None
    probe_line = None
    for i in range(len(lines)):
        # may match a forward declaration instead of the definition (point 1)
        if lines[i].startswith(func_probe):
            probe_line = i
        # may match a '}' that closes an inner block (point 2)
        if probe_line is not None and lines[i].strip() in ["}", "};"]:
            func_tail_line = i
            break
    if probe_line is None or func_tail_line is None:
        return res
    # search for CONFIG_XXX before func
    ifdef_stack = list()
    for i in range(probe_line):
        line = lines[i].strip()
        # deal with ifdef
        if line.startswith("#ifdef"):
            ifdef_stack.append(line.split()[1])
            continue
        if line.startswith("#endif"):
            if ifdef_stack:
                ifdef_stack.pop()
            continue
    if ifdef_stack:
        res.update(ifdef_stack)
    # search for CONFIG_XXX within func: every #ifdef counts (point 3)
    for i in range(probe_line, func_tail_line + 1):
        line = lines[i].strip()
        if line.startswith("#ifdef"):
            res.add(line.split()[1])
            continue
    return res
```

Thanks for these suggestions.

Regarding 1: yes, that can happen, and there may be other cases as well, for example within header files.

Regarding 2: agreed. A better approach is to use a stack to match braces from the function's first "{" to its closing "}".
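The stack approach can be sketched as follows. Since only one kind of bracket is matched, the stack reduces to a depth counter. `find_function_end` is a hypothetical helper; it assumes no braces appear in comments or string/char literals, and that `#ifdef`/`#else` branches keep braces balanced, which real kernel code does not always guarantee.

```python
def find_function_end(lines, def_line):
    """Return the index of the line holding the '}' that closes the body
    opened by the first '{' at or after `def_line`, or None if unbalanced."""
    depth = 0          # number of currently open '{'
    opened = False     # becomes True once the function body has started
    for i in range(def_line, len(lines)):
        for ch in lines[i]:
            if ch == "{":
                depth += 1
                opened = True
            elif ch == "}":
                if not opened:
                    return None  # stray '}' before the body opened
                depth -= 1
                if depth == 0:
                    return i     # matching close of the outermost '{'
    return None
```

Unlike `lines[i].strip() in ["}", "};"]`, this does not stop at the `}` that closes an inner `if` block, and it also handles one-line bodies such as `static int f(void) { return 0; }`.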

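Regarding 3: I've thought about it, and this case is probably fairly complicated. Even if the patched code lies outside the range where the config takes effect (`#ifdef ... #endif` followed by the patch, or the patch followed by `#ifdef ... #endif`), the config can still affect whether the patch is effective if it changes the function's control flow. Consider this case: building with CONFIG enabled and with CONFIG disabled produces two versions of the target function, and only the CONFIG-enabled version is vulnerable. The patch, however, is optimization-style: with CONFIG enabled it fixes the vulnerability; with CONFIG disabled it has no effect or merely improves performance. In such a case, developers may not add a CONFIG constraint when writing the patch.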

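In fact, much of the current KernJC implementation is based on over-approximation: it is acceptable to enable more CONFIGs than strictly required. Our earlier experiments showed that CONFIGs around the vulnerability rarely have a negative effect on its validity (e.g. mutually exclusive configs, or configs that prevent the kernel from booting), and the extra environment-build time caused by a few additional CONFIGs is negligible compared with the time a human would need to configure the kernel by hand.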
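To make the trade-off in point 3 concrete, the sketch below computes two config sets for one function: the guards that enclose the changed lines (precise) and every `#ifdef` around or inside the function (over-approximate, roughly what `get_func_configs` returns today). `active_ifdefs`, `configs_for_patch`, and the `changed_lines` argument (0-based indices of changed lines in the patched file, which would have to be derived from the hunk headers) are hypothetical. As in the original, only `#ifdef NAME` counts as a config guard; other `#if`/`#ifndef` forms are pushed as placeholders only so that each `#endif` closes the right opener.

```python
def active_ifdefs(lines, target):
    """Return the #ifdef names still open just before line index `target`."""
    stack = []
    for i in range(target):
        line = lines[i].strip()
        if line.startswith("#ifdef"):
            stack.append(line.split()[1])
        elif line.startswith("#if"):
            stack.append(None)  # #if / #ifndef: tracked only for pairing
        elif line.startswith("#endif") and stack:
            stack.pop()
    return {name for name in stack if name is not None}


def configs_for_patch(lines, func_start, func_end, changed_lines):
    """Return (precise, over_approx) config sets for one function."""
    # Precise: only guards that actually enclose a changed line.
    precise = set()
    for n in changed_lines:
        precise |= active_ifdefs(lines, n)
    # Over-approximate: guards around the function plus every #ifdef inside it.
    over_approx = active_ifdefs(lines, func_start)
    for i in range(func_start, func_end + 1):
        line = lines[i].strip()
        if line.startswith("#ifdef"):
            over_approx.add(line.split()[1])
    return precise, over_approx
```

As the control-flow scenario in point 3 shows, the precise set can miss configs that still matter, which is why keeping the over-approximate set is the safer default under KernJC's design.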

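Point 3 also raises another question: how to determine whether all of the code in a patch is actually related to the vulnerability itself. That is an open problem too.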

I'll look into how to improve this based on your suggestions.

NUS-Curiosity / KernJC

On optimizing get_code_configs #4