facebook / infer

A static analyzer for Java, C, C++, and Objective-C
http://fbinfer.com/
MIT License
14.97k stars 2.02k forks source link

[Java] Taint tracking and virtual calls #1440

Open yuske opened 3 years ago

yuske commented 3 years ago

Hi Infer devs!

I tried to use Infer for implementation custom taint analysis and faced with some issues of virtual call handling. Could you clarify the current status of the taint tracking features and give some advice on how I can improve them? I wrote the followed example and run Topl, Quandary and Biabduction analyses and marked source() and sink(String) functions as source and sink respectively:

interface ITaintTest {
    String GetObj();
    void CallSink(String obj);
}

class UnsafeTest implements ITaintTest {
    @Override
    public String GetObj() {
        return null;
    }

    @Override
    public void CallSink(String obj) {
        Main.sink(obj);
    }
}

class SafeTest implements ITaintTest {
    @Override
    public String GetObj() {
        return "Test";
    }

    @Override
    public void CallSink(String obj) { }
}

public class Main {
    public static void main(String[] args) {
        var x = source();
        Test(new SafeTest(), x);    // [!] FALSE POSITIVE
        Test(new UnsafeTest(), x);  // TRUE POSITIVE
    }

    private static void Test(ITaintTest obj, String t) {
        obj.CallSink(t);
        String s = obj.GetObj();
        s.length();
    }

    private static String source() { 
        return "TAINTED";
    }

    public static void sink(String obj) { }
}
  1. Topl: infer -g --topl-only --topl-properties taint.topl -- javac Main.java The analysis doesn’t detect the sink call via a virtual call. I get the message in debug logs:

    No spec found for void ITaintTest.CallSink(String)
    skipping unknown procedure

    It seems that the virtual call dispatching has not implemented yet for Pulse. Have you any plans to implement it soon? Or do you have a description of the algorithm/approach that should be used here?

  2. Quandary: infer -g --quandary-only -- javac Main.java No issues found. The summary of Main.Test is empty Quandary: { }, but the UnsafeTest.CallSink summary contains the sink call:

    Quandary: { @val$1 -> (Footprint({ @val$1* }) ~> { Other(void Main.sink(String) at line 19) }, *) }

    It looks like the Quandary skips applying summaries for implementation(s) of virtual methods. Could it be changed by settings or any way?

  3. Biabduction: infer -g --biabduction-only -- javac Main.java This analysis detects Null Dereference correctly (only for the call Test(new UnsafeTest(), x)). I see that the Infer resolves virtual call by corresponding call site type information. Could the taint analysis be implemented based on the Biabduction? Or maybe possible re-using virtual call resolving algorithm for another kind of taint analysis?

Also, I found the note implement resolve_virtual to not skip virtual calls in TODO.org file. As I understand, it talks about the method resolve_virtual_pname in the Biabduction and it means that the current approach has some limitations. Can you give some info about such limitations if they exist?

jvillard commented 3 years ago

Hi @yuske, as you have observed only biabduction has a way to resolve dynamic dispatch at the moment. We plan to add this capability to Pulse as well but haven't started yet. The goal is to have something similar to what biabduction currently does: when a call to a method goo(AbstractClass a) is made which an argument c whose dynamic type is resolved to be of type Concrete, we want to generate a copy of goo that is specialised to Concrete: goo_Concrete(Concrete c) and pretend that we are calling that instead. This makes dynamic dispatch work nicely across chains of method calls, so that an eventual interface call a.virtual() can be resolved as a.Concrete.virtual(). Neither of these two things happens in Pulse at the moment. Pulse does track dynamic type information for every value though, so this is the natural next step. I can say more if you are interested.

The TODO.org file you mention belongs to the SLEdge project, which is distinct from infer (though we do share this repository).

rgrig commented 3 years ago

Thanks for looking at this.

There was an old Topl implementation that worked on both Biabduction and Pulse, based on instrumenting the intermediate representation. It was removed because it was much slower than the current Pulse-specific implementation. A Biabduction-specific version of Topl seems difficult to do. And Pulse needs to know about virtual calls anyway.