weekly report: 2017.11.08-2017.11.14

周报

2017.11.08 ~ 2017.11.14

已经一个月不做汇报，因此这周的汇报先对上次汇报的内容做一个回顾和总结，然后给出这周的工作做一个整理 : )。

回顾和总结

上次汇报已经提到，之前的代码已经完成了整个项目的搭建和基本功能代码的编写，但有几个问题仍需解决：

1. SDG

我们只在代码中对每个方法进行了 PDG 的构建，但 SDG 并未进行。

2. unitIteratorOfPDGNode

Set<Unit> runBackwardSlicingFor(Callsite callsite) 依赖于方法 Iterator<Unit> unitIteratorOfPDGNode(PDGNode n)，该方法做了如下假设：

假设 PDGNode 的类型仅有 CFGNODE 和 REGION 两种，因为 Soot 的 javaDOC 中有这么几句：

In essence, the PDG nodes represent (within them) either CFG nodes or Region nodes.

3. canHandleIssue

boolean canHandleIssue(ApiContext model, Unit unit) 目前实现过于简单，上面也提到过，是个难点。

本周工作整理

根据上周的进展和问题，本周做了如下工作：

1. 对上次完成的代码进行初步测试，并在测试过程中基本确定了上面提到的 `unitIteratorOfPDGNode` 问题

上次的工作已经在 github 仓库的 master 分支下，直接利用 CSipSimple 对这份简单实现版本的 FicFinder 进行第一次测试，测试结果很显然。没有报告任何 Issue，对于程序崩溃和乱报错误来说，这还算一个可以接受的结果。但显然，结果是错误的(CSipSimple 里有显而易见的 FIC 问题)，因此对代码打 log 进行了追踪。

在代码追踪的过程中，发现每次 unitIteratorOfPDGNode 都能返回预期的结果，说明上面的假设是成立的。

2. 对测试结果的分析

代码的追踪发现了以下几个问题：

callsite 寻找不充分。追踪发现，在寻找 callsite 时几乎所有 model 都没有任何 callsite 找到，主要问题出在 callgraph.edgesInto 这个方法上，由于 Call Graph 强依赖于程序入口，因此可能是 flowdroid 生成的 dummyMain 有问题，但得出这个结论的前提是其他代码没有问题。
slicing 生成不充分。目前 slicing 的生成仅仅依赖 PDG，使用的核心方法是 PDGNode.getBackDependets。因为目前还未涉及到 SDG，所以导致这个问题的原因主要是：a. 对核心方法的使用不当以及核心方法返回的不充分性。 b. SDG 尚未构建，导致 slicing 偏小。

3. 根据测试进行的改进

发现上述问题后，咨询了下 Lili，帮了大忙。对上述问题目前有了以下的讨论和解决：

首先对问题进行化简，先不讨论安卓应用，仅讨论一份简单的 java 代码，因此在上述仓库下创建了 java 分支，先来保证 FicFinder 可以对普通的 java 进行类似的分析。这次测试的代码使用了项目 test 目录下的 DozeChecker.java，我们在里面利用类 os 及其内部类 Build 模拟了一个 FIC 问题。

a. callsite 寻找不充分

咨询 Lili 后了解到，Lili 寻找对方法 M 的 callsite 的方法是通过遍历所有的语句来寻找，并未使用 callsite.edgeInto 这个方法。因此这个问题的原因究竟是不是 dummyMain 的问题现在还不能确定，但由于这次我们先讨论一份简单的 java 代码，不存在 dummyMain 的问题，因此我们暂时先使用 callsite.edgesInto，如果最终结果符合预期，我们便可以断定 dummyMain 有问题，否则将不是 dummyMain 的问题。

b. slicing 生成不充分

上面提到，这个问题的出现有两个原因。

首先针对第一个原因，利用 DozeChecker 对 FicFinder 进行追踪发现几乎每个 PDGNode 的 getBackDependets 大小都是 0，而 pdg.getDependents 却不是，因为暂时还并未查到 Soot 中所指的 backDependents 和 dependents 的区别，因此本着扩大 slicing 的想法暂时使用了后者。

然后针对第二个问题，我们说之所以需要 SDG 的部分信息是因为大多数开发者会通过 DozeChecker 中使用的方式（即提取一个公共方法出来，而并不是在每个需要监测的地方都重写判断逻辑）来进行兼容性解决，因此 PDG 或者说 intra-procedure analysis 并不能解决这种问题。目前的解决方案是，寻找 callsite 前面距离 callsite 最近(之后或许应该改成 k 邻近)的 IfStmt，对该 IfStmt 中使用到的变量(use value)进行反向追踪，寻找其定义(define value)的 AssignStmt 语句，并将其加入 slicing 中，如果这条 AssignStmt 进行了方法调用，则该方法的所有(之后应该进行更精确的寻找)语句都加入 slicing 中。

通过上述分析，目前修改 runBackwardSlicingFor 如下：

/**
 * Backward slicing algorithm.
 *
 * we find the backward slicing of a unit by:
 *  1. get the corresponding pdg, which describes the unit's method
 *  2. find the dependents of callerUnit
 *  3. find the nearest IfStmt of the caller unit
 *  4. find the dependents of ifUnit(which defines the variable used in ifUnit)
 *
 */
private Set<Unit> runBackwardSlicingFor(Callsite callsite) {
  Set<Unit> slice = new HashSet<>(128);
  Map<String, ProgramDependenceGraph> pdgMapping = Env.v().getPdgMapping();

  // 1. get the corresponding pdg, which describes the unit's method
  SootMethod caller = callsite.getMethod();
  ProgramDependenceGraph pdg = pdgMapping.get(caller.getSignature());
  Unit callerUnit = callsite.getUnit();

  // 2. find the dependents of callerUnit
  slice.addAll(findDependentOf(pdg, callerUnit));

  // 3. find the nearest IfStmt of the caller unit
  Unit ifUnit = null; int ifIdx = -1;
  List<Unit> unitsOfCaller = new ArrayList<>(caller.getActiveBody().getUnits());

  for (int i = 0; i < unitsOfCaller.size(); i ++) {
    Unit u = unitsOfCaller.get(i);
    if (!u.equals(callerUnit)) { if (u instanceof IfStmt) { ifUnit = u; ifIdx = i; } }
    else { break; }
  }

  // 4.  find the dependents of ifUnit(which defines the variable used in ifUnit)
  if (ifUnit != null) {
    IfStmt ifStmt = (IfStmt) ifUnit;
    // get use boxed, e.g. [$v1, $v2] of `if $v1 > $v2 goto label 0`
    Set<Value> useValuesOfIfUnit = new HashSet<>(2);
    for (ValueBox vb : ifStmt.getCondition().getUseBoxes()) {
      Value v = vb.getValue();
      // we only save Locals
      if (v instanceof Local) {
        useValuesOfIfUnit.add(vb.getValue());
      }
    }
    for (int i = ifIdx - 1; i >= 0; i --) {
      Unit u = unitsOfCaller.get(i);
      // $vx = ...
      if (u instanceof AssignStmt) {
        AssignStmt assignStmt = (AssignStmt) u;
        List<ValueBox> defBoxesOfAssignUnit = assignStmt.getDefBoxes();
        // check whether $vx in useBoxesOfIfUnit, if yes, then add it to slice
        for (ValueBox vb : defBoxesOfAssignUnit) {
          if (useValuesOfIfUnit.contains(vb.getValue())) {
            slice.add(assignStmt);
            // $vx = virtualinvoke method, then we need to add all stmt in method
            if (assignStmt.containsInvokeExpr()) {
              SootMethod method = assignStmt.getInvokeExpr().getMethod();
              slice.addAll(assignStmt.getInvokeExpr().getMethod().getActiveBody().getUnits());
            }
          }
        }
      }
    }
  }

  return slice;
}

4. 新的测试结果

因为目前我们简化了问题，目前的版本对 DozeChecker 的测试已经能够成功报告我们模拟的问题，但有一处误报，误报的原因会在下面提到：

API <com.example.DozeChecker: void go()> should be used within the context: 
     android API level: [5, 26]
     android system version: [1.7976931348623157E308, ~]

API <com.example.DozeChecker: void go()> should be used within the context: 
     android API level: [5, 26]
     android system version: [1.7976931348623157E308, ~]

4. 发现的新问题

Soot 的同步问题：Call Graph 的构建是并行的，追踪时有几次中间运行结果不一致。解决方案，放到回调里。
Soot 在将 java 代码转换成 Jimple 的时候可能会做优化，在 DozeChecker 的检测里，对方法 isCompatible 转换后的 Jimple 代码如下所示：（可以看到并没有对 os.Build 的引用，这也就是上面提到的会误报的原因）

// java source code
public boolean isCompatible() {
  return os.Bulid.SDK_VERSION > 26;
}

// converted jimple target code
public boolean isCompatible() {
  com.example.DozeChecker this;
  this := @this: com.example.DozeChecker;
  goto label1;
label1:
  goto label2;
label2:
  return 0;
}

5. 接下来的任务

解决上面提到的两个问题，尤其需要考虑第二个问题的解决方案。
对 slicing 过程进行进一步调优，如上面提到的 k 邻近，以及更精确到查找。
对 canHandleIssue 进行更精确到判断，需要涉及对判断语义的理解，从而降低误报率。
上面三个问题解决后转到安卓。
按以前计划进行下一步。

心声

相比前几周仅仅是写代码，这周看到的结果令人欣慰，虽然目前的结果精确性不高，误报率很大，但至少已经有了一个好的开头，接下来的调试和优化将是一个令人兴奋的过程。

connglli / ELEGANT