SunghoLee / HybriDroid

Static Analysis Framework for Android Hybrid Applications
24 stars 12 forks source link

Bug Report #1

Closed 131220065 closed 5 years ago

131220065 commented 7 years ago

Hello, Mr Lee, I'm a Chinese student in Nanjing University and I'm doing my graduation project with HybriDroid. Then, I found some bugs in HybriDroid, they're below:

1. com.ibm.wala.cast.js/source/com/ibm/wala/cast/js/html/DomLessSourceExtractor.java

After you modified com.ibm.wala.cast.js/source/com/ibm/wala/cast/js/html/DomLessSourceExtractor.java, I found that the js file linked to the html file never be added to the analysis scope. So I modified it back, then it returned to normal. And below is my code:

private void getScriptFromUrl(String urlAsString, ITag scriptTag) throws IOException, MalformedURLException {
      URL scriptSrc = new URL(entrypointUrl, urlAsString);
      Reader scriptInputStream;
      try {
        BOMInputStream bs = new BOMInputStream(scriptSrc.openConnection().getInputStream(), false,
            ByteOrderMark.UTF_8, //note
            ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE,
            ByteOrderMark.UTF_32LE, ByteOrderMark.UTF_32BE);
        if (bs.hasBOM()) {
          System.err.println("removing BOM " + bs.getBOM());
          scriptInputStream = new InputStreamReader(bs, bs.getBOMCharsetName()); //note
        } else {
          scriptInputStream = new InputStreamReader(bs); //note
        }

//        scriptInputStream = new InputStreamReader(scriptSrc.openConnection().getInputStream());
      } catch (Exception e) {
        //it looks like this happens when we can't resolve the url?
        if (DEBUG) {
          System.err.println("Error reading script: " + scriptSrc);
          System.err.println(e);
          e.printStackTrace(System.err);
        }
        return;
      }

      BufferedReader scriptReader = null;
      try {
        String line;
        scriptReader = new BufferedReader(scriptInputStream);
        StringBuffer x = new StringBuffer();
        while ((line = scriptReader.readLine()) != null) {
          x.append(line).append("\n");
        }
        scriptRegion.println(x.toString(), scriptTag.getElementPosition(), scriptSrc, false);
      } finally {
        if (scriptReader != null) {
          scriptReader.close();
        }
      }
    }

2. Having no regard for sub classes of WebView for getting loadUrl method's parameter in String Analysis

In AndroidStringAnalysis.java, the analyse method, the hotspots is only the WebView.loadUrl hotspot, Then in findHotspots method, calling the isHotspot method only judged the instruction's class and method descriptor equals WebView and loadUrl, having no regard for the sub classes of WebView. My method to solve the problem is getting the sub classes of WebView and add to the hotspots:

                        cha = ClassHierarchy.make(scope);
            Collection<IClass> webViewSubClasses = cha
                    .computeSubClasses(TypeReference.find(ClassLoaderReference.Primordial, "Landroid/webkit/WebView"));
            for (IClass webViewSubClass : webViewSubClasses) {// sub classes of WebView(contains WebView self)
                TypeName tn = webViewSubClass.getName();
                String cDescriptor = tn.toString().substring(1);
                if (!cDescriptor.contains("/")) {// add, some times there is no '/' in cDescriptor
                    cDescriptor += '/';
                }
                hotspots.add(new ArgumentHotspot(ClassLoaderReference.Application, cDescriptor,
                        "loadUrl(Ljava/lang/String;)V", 0));
            }

3. Multi Dex

Some big Android Apps have more than one dex file. Also in AndroidStringAnalysis.java, the method addAnalysisScope only add the apk file to the analysis scope. However, it only can add the first dex file "classes.dex" to the analysis scope. What I do is unzip the all dex files to a folder, and add the all dex files to the analysis scope:

    @Override
    public void addAnalysisScope(String path) {
        if (path.endsWith(".apk")) {
            List<String> dexFilePaths = AndroidStringAnalysisPatch.unzipDexFiles(path, ara.getDir());

            try {
                for (String dexFilePath : dexFilePaths) {
                    scope.addToScope(ClassLoaderReference.Application, DexFileModule.make(new File(dexFilePath)));
                }
            } catch (IllegalArgumentException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        } else {
            throw new InternalError("Support only apk format as target file");
        }
    }

Also, in AndroidResourceAnalysis.extractResources, the smali dir maybe multi, the apktool can extract multi dex to multi smali dirs. The extractResources method only considered one smali dir.

4. Bridge Object's Dealing

public class MyJSBridge {

      private Context pageContext = null;
      private PrivateDataGetter privateDataGetter;

      public MyJSBridge(Context pageContextPara) {
            this.pageContext = pageContextPara;
            privateDataGetter = new PrivateDataGetter(pageContext);
      }

      @JavascriptInterface
      public String getContactsInfo() {
            List<Pair<String, String>> phoneContacts = privateDataGetter.getPhoneContacts();
            List<Pair<String, String>> simContacts = privateDataGetter.getSIMContacts();
            String message = "";
            message += "Phone Contacts:\n";
            for(Pair<String, String> oneContact : phoneContacts) {
                  message += oneContact.first + " : " + oneContact.second + "\n";
            }
            message += "SIM Card Contacts:\n";
            for(Pair<String, String> oneContact : simContacts) {
                  message += oneContact.first + " : " + oneContact.second + "\n";
            }
            return message;
      }
}

this.webView.addJavascriptInterface(new MyJSBridge(WebViewActivity.this), "bridge");

Above is my hybrid android example, there is a field privateDataGetter in my bridge class MyJSBridge, and it is initialized in MyJSBridge's constructor method, and my bridge method getContactsInfo() calls the field privateDataGetter's method getPhoneContacts and getSIMContacts. Then, addJavascriptInterface to insert the object to JS environment as "bridge". The bridge method is called in JS environment. However, the CallGraph that HybriDroid generates doesn't contains the two nodes getPhoneContacts and getSIMContacts. Then I found that in AndroidHybridCallGraphBuilder.java, the HybridJavaConstraintVisitor.visitInvoke method deals with the bridge classes. However, it creates a new InstanceKey for the bridge classes, having no regard for the PointerAnalysis result about the bridge classes, so maybe the WALA don't know where is the field privateDataGetter initialized. These are what I do to solve the bug:

                                                // hzq add start
                        InstanceKey bridgeIkTemp = null;
                        IClass bridgeClass = null;
                        PointerKey pk = getPointerAnalysis().getHeapModel().getPointerKeyForLocal(caller, objUse);// hzq:
                                                                                                                    // test
                        OrdinalSet<InstanceKey> os = getPointerAnalysis().getPointsToSet(pk);
                        if (os.size() == 1) {
                            bridgeIkTemp = os.iterator().next();
                            bridgeClass = bridgeIkTemp.getConcreteType();
                        } else {
                            Assertions.UNREACHABLE();// test
                        }

                        InterfaceClass wClass = wrappingClass(bridgeClass);
                        cha.addClass(wClass);

                        InstanceKey bridgeIk = bridgeIkTemp; 
                        ((ConcreteTypeKey) bridgeIk).changeTypeTo(wClass);

                        AndroidHybridAppModel.addJSInterface(name, bridgeIk);
                        // hzq add end

I use the PointerAnalysis to get the bridge class's InstanceKey, then change the IClass binded to the InstanceKey (modify the ConcreteTypeKey class to add the changeTypeTo method, though that is ugly, it works). After that, do the next things as the same as the original. Also, I think there is some needless code in the visitInvoke method, such as the BridgeInfo bi's use, and the ConcreteTypeKey objKeys[] array (I think the visitInvoke method can determine one bridge at a time), So I change the method to below:

                 @Override
                 public void visitInvoke(SSAInvokeInstruction instruction) {
            // TODO Auto-generated method stub
            CGNode caller = node;
            CallSiteReference site = instruction.getCallSite();
            PropagationSystem system = builder.getPropagationSystem();
            IClass receiver = cha.lookupClass(instruction.getDeclaredTarget().getDeclaringClass());
            Selector targetMethod = instruction.getDeclaredTarget().getSelector();

            // checking invoked method is 'addJavascriptInterface' of WebView
            // class

            if (DEBUG) {
                if (addjsSelector.equals(targetMethod)) {
                    System.out.println("----");
                    System.out.println("#AddJavascriptInterface");
                    System.out.println("#receiver: " + receiver);
                    System.out.println("#caller: " + caller);
                    System.out.println("#callsite: " + site);
                    System.out.println("----");
                }
            }
            if ((receiver != null && (wvClass.equals(receiver) || cha.isSubclassOf(receiver, wvClass)))
                    && addjsSelector.equals(targetMethod)) {

                if (DEBUG)
                    System.out.println("Invoked addJavascriptInterface.");

                IR ir = caller.getIR();
                // SSAAbstractInvokeInstruction[] invokes = ir.getCalls(site);
                SymbolTable symTab = ir.getSymbolTable();

                // addJavascriptInterface(this, object, name);
                // for (SSAAbstractInvokeInstruction invoke : invokes) {
                SSAInvokeInstruction invoke = instruction;// hzq: testadd
                if (invoke.getNumberOfUses() == 3) {
                    int objUse = invoke.getUse(1);
                    int nameUse = invoke.getUse(2);

                    // to find the object creation site. only support intra,
                    // not inter-method.
                    // only support constant name, not composition name.
                    if (symTab.isConstant(nameUse)) {// addJavaScriptInterface字符串常量参数
                        String name = (String) symTab.getConstantValue(nameUse);

                        /*
                         * Set<BridgeDescription> bdSet =
                         * bi.getDescriptionsOfBridge(node, objUse);
                         * 
                         * if(bdSet.isEmpty()){ String err =
                         * "Cannot find the bridge object instance: \n"; err +=
                         * "# caller: " + node; err += "# instruction: " +
                         * invoke; Assertions.UNREACHABLE(err); }
                         */

                        // hzq add start
                        InstanceKey bridgeIkTemp = null;
                        IClass bridgeClass = null;
                        PointerKey pk = getPointerAnalysis().getHeapModel().getPointerKeyForLocal(caller, objUse);

                        OrdinalSet<InstanceKey> os = getPointerAnalysis().getPointsToSet(pk);
                        if (os.size() == 1) {
                            bridgeIkTemp = os.iterator().next();
                            bridgeClass = bridgeIkTemp.getConcreteType();
                        } else {
                            Assertions.UNREACHABLE();// test
                        }

                        InterfaceClass wClass = wrappingClass(bridgeClass);
                        boolean isSuc = cha.addClass(wClass);
                        HzqStub.stubPrint("addClass = " + isSuc);

                        InstanceKey bridgeIk = bridgeIkTemp; 
                        ((ConcreteTypeKey) bridgeIk).changeTypeTo(wClass);

                        AndroidHybridAppModel.addJSInterface(name, bridgeIk);
                        // hzq add end

                        /*
                         * make mock-up object for Android Java methods of the
                         * interface object. this process is needed, because a
                         * function is an object in JavaScript, but not in Java.
                         * this mock-up is used in MethodTargetSelector, to find
                         * the target Java method at a JavaScript call site.
                         */
                        TypeReference jsString = TypeReference.find(JavaScriptTypes.jsLoader, "LString");
                        IClass jsStringClass = cha.lookupClass(jsString);

                        AbstractFieldPointerKey typePK = ReflectedFieldPointerKey
                                .mapped(new ConcreteTypeKey(jsStringClass), bridgeIk);

                        Collection<IMethod> methods = wClass.getAllMethods();
                        Set<Pair<String, Integer>> overloadChecker = new HashSet<Pair<String, Integer>>();

                        for (IMethod method : methods) {
                            if (hasJavascriptInterfaceAnnotation(method)) {
                                wClass.addMethodAsField(method);
                                String mname = method.getName().toString();
                                int params = method.getNumberOfParameters();
                                Pair<String, Integer> p = Pair.make(mname, params);
                                if (overloadChecker.contains(p)) {
                                    mismatchW.add("[Error] the method is overloaded by type: " + wClass.getName() + ": "
                                            + mname + "(" + params + ")");
                                }
                                overloadChecker.add(p);

                                IField f = wClass.getField(Atom.findOrCreateAsciiAtom(mname));

                                PointerKey constantPK = builder.getPointerKeyForInstanceField(bridgeIk, f);
                                InstanceKey ik = makeMockupInstanceKey(method);
                                if (DEBUG)
                                    System.out.println("\t\tBridgeMethod: " + constantPK + " -> " + ik);

                                system.findOrCreateIndexForInstanceKey(ik);

                                system.newConstraint(constantPK, ik);
                                system.newConstraint(typePK, ik);
                            }
                        }

                        /*
                         * there could be multiple global objects, because a
                         * webview has a global ojbect repectively. the global
                         * objects are seperated by javascript file name.
                         */
                        Set<HotspotDescriptor> descSet = asa.getAllDescriptors();
                        Set<GlobalObjectKey> globalObjs = new HashSet<GlobalObjectKey>();
                        Set<String> htmls = null;
                        for (HotspotDescriptor desc : descSet) {
                            if (desc.getReceiverAlias().contains(Pointing.make(node, instruction.getUse(0)))) {
                                htmls = desc.getValues();

                                if (htmls.size() == 1 && htmls.iterator().next().equals("about:blank")) { // no-op.
                                    return;
                                } else if (!htmls.isEmpty()) {
                                    boolean isOnlineConnection = false;
                                    for (String html : htmls) {
                                        // if((isOnlineConnection |=
                                        // html.startsWith("http")) == true)
                                        // if(DEBUG)
                                        // System.out.println("Load online page:
                                        // " + html +", we do not deal with
                                        // it.");

                                        Atom js = getJSOfHtml(html);

                                        if (js != null) {
                                            globalObjs.add(getGlobalObject(JavaScriptTypes.jsName, js));
                                            if (DEBUG) {
                                                String objs = "";
                                                // for(InstanceKey ik : objKeys)
                                                objs += bridgeIk + ", ";
                                                objs = objs.substring(0, objs.lastIndexOf(","));

                                                System.out.println("HtmlLoaded: " + html + " [" + objs + "]");
                                            }
                                        }
                                    }

                                    if (isOnlineConnection && globalObjs.isEmpty()) { // no-op;
                                                                                        // for
                                                                                        // online
                                                                                        // connection.
                                        return;
                                    }
                                }
                            } else {

                            }
                        }

                        if (globalObjs.isEmpty()) {
                            if (ALL_HTML_FOR_UNKNOWN) {
                                System.err.println("Loaded html is unknown.\n Every local html files may be loaded.");
                                System.err.println("\t" + getHtmls());
                                globalObjs.addAll(getGlobalObjects(JavaScriptTypes.jsName));

                            } else {
                                Assertions.UNREACHABLE("Unknown html page: " + htmls);
                            }
                        }

                        /*
                         * attach the Android Java interface object to
                         * JavaScript global object. the attached object is used
                         * when finding the receiver at a JavaScript call site.
                         */
                        // InstanceKey globalObj =
                        // ((AstSSAPropagationCallGraphBuilder) builder)
                        // .getGlobalObject(JavaScriptTypes.jsName);
                        // system.findOrCreateIndexForInstanceKey(globalObj);

                        // Collection<GlobalObjectKey> globalObjs =
                        // getGlobalObjects(JavaScriptTypes.jsName);

                        FieldReference field = FieldReference.findOrCreate(JavaScriptTypes.jsLoader, "LRoot",
                                "global " + name, "LRoot");

                        String nonGlobalFieldName = field.getName().toString().substring(7);
                        field = FieldReference.findOrCreate(JavaScriptTypes.Root,
                                Atom.findOrCreateUnicodeAtom(nonGlobalFieldName), JavaScriptTypes.Root);

                        IField f = getClassHierarchy().resolveField(field);

                        // PointerKey fieldPtr = builder
                        // .getPointerKeyForInstanceField(globalObj, f);
                        //
                        // system.newConstraint(fieldPtr, objKey);

                        for (InstanceKey globalObj : globalObjs) {
                            PointerKey fieldPtr = builder.getPointerKeyForInstanceField(globalObj, f);
                            System.out.println("hzq: fieldPtr = " + fieldPtr);
                            system.newConstraint(fieldPtr, bridgeIk);
                            // for(int i=0; i<objKeys.length; i++)
                            // system.newConstraint(fieldPtr, objKeys[i]);
                        }

                    } else {
                        // TODO: Support non-constant string value
                        System.err.println("-------");
                        System.err.println("# caller: " + node);
                        System.err.println("# method: addJavascriptInterface");
                        System.err.println("# name paramter: " + nameUse);
                        System.err.println("\tis constant? " + symTab.isConstant(nameUse));
                        System.err.println("-------");
                        if (!NO_STOP)
                            Assertions.UNREACHABLE(
                                    "now, only support constant value of 'addJavascriptInterface' name parameter.");
                    }
                }
            }
                        super.visitInvoke(instruction);
        }

Then, the CallGraph HybriDroid generated contains the two nodes getPhoneContacts and getSIMContacts.

SunghoLee commented 7 years ago

Thank you for your bug report. We will confirm this bug in several weeks.

SunghoLee commented 7 years ago

Sorry for late response.

We confirmed 1, 3, 4 bugs in HybriDroid, but we already treat the sub class problem in the "isHotspot" method, which you described in the second bug case. And, in the fourth bug case, HybriDroid can treat multiple objects as a bridge object when one variable points to the multiple objects.

We will fix the bugs as soon as possible, referring to your fix.

Again, thank you for your kind bug report, and please feel free to contact us when you have any questions or problems.

Thank you!

131220065 commented 7 years ago

That's all right. And thank you for your response, which is a affirmation of my work.

In the fourth bug case, I didn't consider the case of multiple bridge objects indeed.

And in the third bug case, I find that Not only the AndroidStringAnalysis.java need add the MultiDex support, but also the AndroidHybridAnalysisScope.java. Because HybriDroid make twice CallGraph.

And the two CallGraphBuilder can use the same AnalysisCache, which may save the time and space.

Actually, I run a test for the newest Cordova's example App, and it seems that WALA can't deal with Cordova's JS files, and after 12 hours, the result doesn't come out.

And the Taint Analyzer of HybriDroid seems easy to be out of memory and need too much time, is that true? My eclispe's run configuration arguments setting is "-XX:-UseGCOverheadLimit -Xmx4096m -Xms4096m -Xmn2g", and I ran some tests for real-world Apps.

SunghoLee commented 7 years ago

Yes, HybriDroid constructs call graph twice for string and main analysis. We do not reuse AnalysisCache to detach the string analysis module from others, but you can reuse it for performance benefit.

HybriDroid cannot analyze Cordova apps precisely, even though they are implemented in hybrid apps. The main reason is that Cordova uses String based commands to access device resources from JavaScript code. The Cordova structure makes HybriDroid to take long time for analysis and make much false positives via spurious control flows. We think that additional models or approaches are needed to handle that.

Did you use TaintAnalysisForHybrid module for taint analysis? We recommend PrivateLeakageDetector instead of it. TaintAnalysisForHybrid is not tested and we will revise or eliminate the module when it does not have differences from PrivateLeakageDetector.

131220065 commented 7 years ago

I used PrivateLeakageDetector, and did a test for this apk. And below is the code:

        ModeledCallGraphForTaint mcg = new ModeledCallGraphForTaint(p.fst);

        PrivateLeakageDetector pld = new PrivateLeakageDetector(mcg, p.snd);
        pld.analyze();

        System.out.println("#taint anlysis time: " + (System.currentTimeMillis() - endTime) / 1000 + "s");

        for(LeakWarning w : pld.getWarnings()){
            System.out.println("=========");
            System.out.println(w);
            System.out.println("=========");
            w.printPathFlow("leak.dot");
        }
SunghoLee commented 7 years ago

Usually, ModeledCallGraphForTaint is not necessary to track taint data, because the module adds additional edges via Inter-component implicit flows. The module finds the flows using CHA style analysis and spends much memory, so you should use an original call graph instead of ModeledCallGraphForTaint.

Also, the apk does not have any local html files. In that case, HybriDroid does not analyze it fully, when the StringAnalysis module can not find any target html addresses. You can use another string analysis module, like JSA, to analyze target html addresses more precisely.

131220065 commented 7 years ago

OK, after using the original call graph, the program can finish quickly. Thanks a lot. But I don't know what does the "JSA" mean, what's the "JSA"?

SunghoLee commented 7 years ago

JSA is a string analysis tool based-on Soot. We firstly tried to use the module instead of our own module, but its forward analysis is too heavy for us. But in some cases, it may produce more adequate than our module.

131220065 commented 7 years ago

Thank you very much again!

SunghoLee commented 5 years ago

Sorry for the quite late update! Finally, we have fixed the bugs you found out! It is quite late because of my personal works related to this project. Thank you.