MagpieBridge / IRConverter

An IRConverter from WALA IR to Soot IR (Jimple)
Eclipse Public License 2.0
5 stars 2 forks source link

Better Wala error message #1

Open martinschaef opened 4 years ago

martinschaef commented 4 years ago

I'm running some tests from command line. When I create a WalaToSootIRConverter for a source path that only contains simple files (i.e., they don't import non-jdk stuff) everything works fine. But when I add something more complicated and provide the classpath, I always end up with this exception:

[main] ERROR dragonglass.lspintegration.Main - Soot threw an unexpected exception.
java.lang.RuntimeException: java.lang.RuntimeException: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at magpiebridge.converter.WalaToSootIRConverter.iterateWalaJavaSourceClasses(WalaToSootIRConverter.java:183)
        at magpiebridge.converter.WalaToSootIRConverter.convert(WalaToSootIRConverter.java:191)
        at 
...
Caused by: java.lang.RuntimeException: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at magpiebridge.converter.WalaToSootIRConverter.buildClassHierachy(WalaToSootIRConverter.java:174)
        at magpiebridge.converter.WalaToSootIRConverter.iterateWalaJavaSourceClasses(WalaToSootIRConverter.java:181)
        ... 4 more
Caused by: com.ibm.wala.ipa.cha.ClassHierarchyException: factory.getLoader failed
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:291)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:203)
        at com.ibm.wala.ipa.cha.ClassHierarchyFactory.make(ClassHierarchyFactory.java:81)
        at com.ibm.wala.ipa.cha.ClassHierarchyFactory.make(ClassHierarchyFactory.java:67)
        at magpiebridge.converter.WalaToSootIRConverter.buildClassHierachy(WalaToSootIRConverter.java:171)
        ... 5 more
Caused by: java.lang.NullPointerException
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.createClassDeclaration(JDTJava2CAstTranslator.java:509)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.visitTypeDecl(JDTJava2CAstTranslator.java:434)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.visit(JDTJava2CAstTranslator.java:3731)
        at com.ibm.wala.cast.java.translator.jdt.JDTJava2CAstTranslator.translateToCAst(JDTJava2CAstTranslator.java:273)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJSourceModuleTranslator$ECJAstToIR.acceptAST(ECJSourceModuleTranslator.java:95)
        at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:1068)
        at org.eclipse.jdt.core.dom.CompilationUnitResolver.resolve(CompilationUnitResolver.java:662)
        at org.eclipse.jdt.core.dom.ASTParser.createASTs(ASTParser.java:1013)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJSourceModuleTranslator.loadAllSources(ECJSourceModuleTranslator.java:195)
        at com.ibm.wala.cast.java.loader.JavaSourceLoaderImpl.loadAllSources(JavaSourceLoaderImpl.java:599)
        at com.ibm.wala.classLoader.ClassLoaderImpl.init(ClassLoaderImpl.java:506)
        at com.ibm.wala.cast.java.loader.JavaSourceLoaderImpl.init(JavaSourceLoaderImpl.java:608)
        at com.ibm.wala.cast.java.translator.jdt.ecj.ECJClassLoaderFactory.makeNewClassLoader(ECJClassLoaderFactory.java:31)
        at com.ibm.wala.classLoader.ClassLoaderFactoryImpl.getLoader(ClassLoaderFactoryImpl.java:61)
        at com.ibm.wala.ipa.cha.ClassHierarchy.<init>(ClassHierarchy.java:270)
        ... 9 more

Is there a way to get a better error message that would tell me what it was looking for?

linghuiluo commented 4 years ago

@martinschaef Hi Martin, thanks for trying it out. Can you provide me the program which caused the problem? I can try to fix it if I can reproduce the error. The WalaToSootIRConverter.java class is where WALA is used, which contructor did you use to initialize it? for your case the class path to libraries should be passed to libPath in this constructor. public WalaToSootIRConverter(@Nonnull Set<String> sourcePath, @Nonnull Set<String> libPath) I am also ready to chat with you about this issue in Skype. You can find me in Skype with my e-mail address which you have.

martinschaef commented 4 years ago

Thanks for the quick reply :) I can't give you the exact code, but I'll try to produce a smaller example that I can share here. It looks like this is actually a Wala problem (see here).

Btw., is it, at least in theory, possible to tell Wala to create phantom classes if it cannot resolve something? Basically, an equivalent to –allow-phantom-refs in Soot. I have one use case where it would be very expensive for me to get the full classpath and it would be great if I could just tell Wala to use phantoms to make it work.

linghuiluo commented 4 years ago

@martinschaef It will be great if you can produce a smaller example. There is a way tell to wala to create phantom classes: When you look at the following method in WalaToSootIRConverter.java

 /** Use WALA's JAVA source code front-end to build class hierarchy. */
  private void buildClassHierachy() {
    try {
      this.classHierarchy = ClassHierarchyFactory.make(scope, factory);
      Warnings.clear();
    } catch (ClassHierarchyException e) {
      throw new RuntimeException(e);
    }
  }

instead of using the make method, try

ClassHierarchyFactory.makeWithPhantom(scope, factory);

should do the work I believe. I can try to add this support tomorrow. Now it is too late in Germany. Also if you wanna exclude classes from your analysis you could just define them in an exclusion file and pass it to this constructor public WalaToSootIRConverter( @Nonnull Set<String> sourcePath, @Nonnull Set<String> libPath, String exclusionFilePath)

msridhar commented 4 years ago

I would recommend ClassHierarchy.makeWithRoot() over makeWithPhantom(). There are known bugs involving makeWithPhantom(), see https://github.com/wala/WALA/pull/377 and https://github.com/wala/WALA/pull/335. I will update the Javadoc to reflect this.

martinschaef commented 4 years ago

Excellent, thx. I tried to subtype WalaToSootIRConverter, but magpiebridge.converter.ClassConverter#convertClass is protected, so I can't call it from my subclass. I'll play with it tomorrow.

linghuiluo commented 4 years ago

@martinschaef Hi Martin, after playing with an example I constructed, I believe the error message you got means that you didn't give the IRConverter library classes which are directly referenced by the source code. Consider the following source code:

class ApplicationClass
{
 void foo(){
    LibraryClass  c = new LibraryClass();
 }
}

This LibraryClass must be in the library Jar file in the library path you gave to the converter. The WALA source front end uses the front end of Eclipse JDT, when it sees LibraryClass in the source code, it will try to resolve the type LibraryClass from the source and library path. If it can not find it, then it will bring up such error message.

So to fix this issue, you should make sure all library classes which are directly referenced by the source code can be found in the library path.

There is currently no configuration option in WALA to ignore such missing types in the source code front end.

The exclusionFilePath option is not relevant here, since WALA will only ignore these classes when analysis in WALA is used.

After talking with Julian, currently there is no better warning message to tell which class is missed. To figure out which library classes must be included, you can open the source code project in Eclipse, Eclipse will give you warnings about what classes it can not resolve. These classes are then required to be on the library classes.

I just released a new version of the IRConverter which allows you have access to ClassConverter https://github.com/MagpieBridge/IRConverter/packages/96202

martinschaef commented 4 years ago

Excellent, thx. I'll try to work with that. Like I said, I have one use case where it's hard to get any library class because we can't really run the build script to get the compile/runtime classpath.

Is it theoretically possible to extend Java to create a phantom class in these cases, or is there a conceptual problem that makes this hard/impossible? I'm aware that we'd loose soundness because of missing exceptional flow, but that's the same if we use phantoms in bytecode.

msridhar commented 4 years ago

From my understanding the issue is that the Java source front end is now based on the Eclipse Java Compiler (ECJ), and ECJ APIs start doing unexpected things when referenced classes are missing (like returning null unexpectedly). WALA has not been engineered to try to be robust to all of these behaviors. I think it's theoretically possible to be more robust here, but it would be difficult (many of the unexpected behaviors are undocumented). For Java bytecode things are easier since WALA controls all the code for reading bytecode, resolving types and supertypes, etc.

juliandolby commented 4 years ago

Manu is right. The key is that, unlike for bytecode, unknown code can influence the IR that the source front needs to generate.

For instance, imagine seeing a call like foo() in a Java source file, and not having any obvious target. Is it, perhaps, in a unknown super class, or interface with a default implementation? Or perhaps it would be found in a static import of an unknown class? Each of these cases would result in different invoke IR.

This could, potentially, be handled in some cases. The front end could look at the file and see what possibilities there are. Maybe the type hierarchy is complete, which would rule out super calls, and there is exactly one static import; in that case, we could generate a static invoke to foo in whatever class is imported.

But that seems like a rat hole of special cases that anyway would not be able to work in general.

(Note that, in bytecode analysis, the correct bytecode must have been generated by someone already, and they must have been able to compile the code.)

martinschaef commented 4 years ago

Yeah, I was more thinking along the lines of snippet parsing. I remember that ASTParser in JDT has an option to set statement recovery and bindings recovery where it just puts some stuff in the AST to make it well-formed again. It's definitely not in scope for this issue, but I was wondering if the Wala folks ever thought about that. I remember this Darpa Muse project where folks did a lot of snippet parsing.

Did you ever look into snippet parsing?

msridhar commented 4 years ago

There is a long thread on this kind of issue here:

https://github.com/wala/WALA/issues/99

I think someone has tried something like snippet parsing before and it didn't quite work:

https://github.com/wala/WALA/issues/99#issuecomment-222787134

That said, with newer versions of Eclipe JDT / ECJ maybe the snippet parsing would work better.

martinschaef commented 4 years ago

I got a little bit further with a custom WalaToSootIRConverter (see https://gist.github.com/martinschaef/56e070cd7a9c5ecd4a2957d72fd2b92f), but I still get exceptions for anything that's a little more complex. E.g.,

[main] INFO dragonglass.lspintegration.CliCommand - Found 1 classpath entries
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at magpiebridge.converter.InstructionConverter.convertInvokeInstruction(InstructionConverter.java:752)
        at magpiebridge.converter.InstructionConverter.convertInstruction(InstructionConverter.java:157)
        at magpiebridge.converter.ClassConverter.createBody(ClassConverter.java:261)
        at magpiebridge.converter.ClassConverter.convertMethod(ClassConverter.java:194)
        at magpiebridge.converter.ClassConverter.convertClass(ClassConverter.java:129)
        at dragonglass.lspintegration.analysis.CustomWalaConverter.convert(CustomWalaConverter.java:50)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:35)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:72)
        at dragonglass.lspintegration.Main.main(Main.java:57)
[main] ERROR dragonglass.lspintegration.analysis.CustomWalaConverter - Index: 1, Size: 1
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at magpiebridge.converter.InstructionConverter.convertInvokeInstruction(InstructionConverter.java:752)
        at magpiebridge.converter.InstructionConverter.convertInstruction(InstructionConverter.java:157)
        at magpiebridge.converter.ClassConverter.createBody(ClassConverter.java:261)
        at magpiebridge.converter.ClassConverter.convertMethod(ClassConverter.java:194)
        at magpiebridge.converter.ClassConverter.convertClass(ClassConverter.java:129)
        at dragonglass.lspintegration.analysis.CustomWalaConverter.convert(CustomWalaConverter.java:50)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:35)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:72)
        at dragonglass.lspintegration.Main.main(Main.java:57)

And, if I remove a little bit of code, I get an exception from Soot:

[main] ERROR dragonglass.lspintegration.Main - Soot threw an unexpected exception.
java.lang.RuntimeException
        at soot.jimple.toolkits.callgraph.CallGraph$TargetsOfUnitIterator.<init>(CallGraph.java:213)
        at soot.jimple.toolkits.callgraph.CallGraph.edgesOutOf(CallGraph.java:203)
        at soot.jimple.toolkits.callgraph.ClinitElimAnalysis.flowThrough(ClinitElimAnalysis.java:72)
        at soot.toolkits.scalar.FlowAnalysis.flowThrough(FlowAnalysis.java:592)
        at soot.toolkits.scalar.FlowAnalysis.doAnalysis(FlowAnalysis.java:556)
        at soot.toolkits.scalar.ForwardFlowAnalysis.doAnalysis(ForwardFlowAnalysis.java:46)
        at soot.jimple.toolkits.callgraph.ClinitElimAnalysis.<init>(ClinitElimAnalysis.java:42)
        at soot.jimple.toolkits.callgraph.ClinitElimTransformer.internalTransform(ClinitElimTransformer.java:38)
        at soot.BodyTransformer.transform(BodyTransformer.java:55)
        at soot.BodyTransformer.transform(BodyTransformer.java:59)
        at soot.jimple.toolkits.callgraph.CallGraphPack.internalApply(CallGraphPack.java:65)
        at soot.Pack.apply(Pack.java:117)
        at dragonglass.lspintegration.util.SootSetup.runAnalysis(SootSetup.java:76)
        at dragonglass.lspintegration.analysis.DragonglassRunner.analyze(DragonglassRunner.java:49)
        at dragonglass.lspintegration.CliCommand.run(CliCommand.java:73)
        at dragonglass.lspintegration.Main.main(Main.java:57)

Does either of these look familiar?

linghuiluo commented 4 years ago

@martinschaef The first one exception could be an error in the InstructionConverter caused by an invoke statement. Can you paste the code you removed, especially the invoke statements? The second exception is thrown when a unit in the call edge is null.

martinschaef commented 4 years ago

Unfortunately, I can't really past any of that code. I'll find some time later in the week to work on examples that can be shared.

In the meantime, did you try it on projects that use fancy things, like Lombok, Guava, or AspectJ? Are there any known limitations that these things don't work?

linghuiluo commented 4 years ago

@martinschaef For the exception you pasted I do need code that I can use to debug the issue. I didn't use the IRConverter on incomplete code. I tried it on some web applications using Spring framework. As far as I know, WALA's source code front end supports only up to Java 7 features. So if you have source code written with some Java 8 features, it may crash.

martinschaef commented 4 years ago

Ugh ... that is a big limitation. Our coding guideline says to use Streams everywhere. @msridhar @juliandolby is there a roadmap for java 8+? If not, we might have to change plans a little

msridhar commented 4 years ago

I personally do not have any plans to work on the Java source front end at the moment. I know @mattkindy-praetorian may have poked around at support for lambdas, but not sure if he can share more

linghuiluo commented 4 years ago

@martinschaef To solve this problem, have you thought of using desugaring of Java 8 language features? I know Google does this for Android (D8). Currently, our group is building a new soot and we also uses WALA's source code front end, so supporting Java 8+ features is planned.

msridhar commented 4 years ago

I know Google does this for Android (D8).

D8 desugars bytecodes, not source code. If analysis of bytecodes is in scope, then WALA's bytecode frontend should work fine with Java 8 features.

linghuiluo commented 4 years ago

@msridhar True, but I meant the idea. I can image an automated tool which rewrites the source code before passing it to WALA's source code front end.

juliandolby commented 4 years ago

@martinschaef It seems you would like two things in the WALA Java source front end: support for Java 8 and some ability to parse incomplete code. Incomplete code is hard to do, as I noted in a previous message, but would Java 8 support be useful to you by itself?

On Thu, Apr 16, 2020 at 10:47 AM Linghui Luo notifications@github.com wrote:

@msridhar https://github.com/msridhar True, but I meant the idea. I can image an automated tool which rewrites the source code before passing to WALA's source code front end.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MagpieBridge/IRConverter/issues/1#issuecomment-614699475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMTO7VQIFDNS2ZDUZIFLWLRM4K7HANCNFSM4MEA2B6A .

martinschaef commented 4 years ago

Yes, I think Java 8 is actually the bigger problem than snippets. If we have full Java 8 support, we can still cover a lot of use cases (including using MagpieBridge). Snippet parsing is a nice-to-have, but we can work around it. Should we have a phone call to discuss the options?

mattkindy commented 4 years ago

Unsure about whether I can contribute code, but I was able to get Java 8 support in place by introducing a LambdaEntity(extends JavaProcedureEntity) as part of the translation process (similar for MethodReferenceEntity). These entities handled a lot of the desugaring themslves. For example:

    /**
     * Desugar: create a method on the class with name `lambda${enclosingTypeName}${lambdaIndex}`
     *  Should have same parameters as declared for lambda unless we are encapsulating values from
     *  outside lambda scope, in which case those will be prepended
     */
    public MethodEntity getDesugaredMethod() {
      final MethodDeclaration methodDeclaration = ast.newMethodDeclaration();
      final ASTNode body = fDecl.getBody();

      if (body instanceof Block) {
        methodDeclaration.setBody((Block) ASTNode.copySubtree(ast, body));

      } else if (body instanceof Expression) {
        final Block block = ast.newBlock();

        final ReturnStatement returnStatement = ast.newReturnStatement();
        final Expression clonedBody = (Expression) ASTNode.copySubtree(ast, body);

        returnStatement.setExpression(clonedBody);

        ((List<Statement>) block.statements()).add(returnStatement);

        methodDeclaration.setBody(block);
      }

      if (!isNull(fReturnType)) {
        methodDeclaration.setReturnType2(fReturnType.isNullType()
          ? parseTypeFromString("Object", ast, null) // todo we should do better about figuring out the return type
          : createType(fReturnType, ast, null));
      }

      final List<CAstNode> syntheticOuterLocalsNodes = getSyntheticOuterLocalsNodes();
      final List<SingleVariableDeclaration> capturedVariables = syntheticOuterLocalsNodes
        .stream()
        .map(node -> {
          final SingleVariableDeclaration paramDeclaration = ast.newSingleVariableDeclaration();

          final CAstType cAstType = (CAstType) node.getChild(1).getValue();
          final String paramTypeName = constructTypeName(cAstType);

          final Type type = parseTypeFromString(paramTypeName, ast, null);

          paramDeclaration.setType(type);

          final String name = ((String) node.getChild(0).getValue())
            .replace("val$", "");

          paramDeclaration.setName(ast.newSimpleName(name));

          return paramDeclaration;
        })
        .collect(toList());

      final List<SingleVariableDeclaration> formalLambdaParameters = ((List<VariableDeclaration>) fDecl.parameters())
        .stream()
        .map(param -> {
          if (param instanceof SingleVariableDeclaration) {
            return (SingleVariableDeclaration) ASTNode.copySubtree(ast, param);
          }

          final SingleVariableDeclaration paramDeclaration = ast.newSingleVariableDeclaration();

          final ITypeBinding resolvedType = resolveVariableBinding(param).getType();
          final String paramTypeName = resolvedType.isCapture()
            ? resolvedType.getWildcard().getName()
            : resolvedType.getName();

          final Type type = parseTypeFromString(paramTypeName, ast, null);

          paramDeclaration.setType(type);
          paramDeclaration.setName((SimpleName) ASTNode.copySubtree(ast, param.getName()));

          return paramDeclaration;
        })
        .collect(toList());

      ((List<SingleVariableDeclaration>) methodDeclaration.parameters())
        .addAll(capturedVariables);

      ((List<SingleVariableDeclaration>) methodDeclaration.parameters())
        .addAll(formalLambdaParameters);

      if (!isBound()) {
        ((List<IExtendedModifier>) methodDeclaration.modifiers())
          .add(ast.newModifier(STATIC_KEYWORD));

      } else {
        methodDeclaration.setReceiverType(ast.newSimpleType(ast.newName(getEnclosingClassName())));
      }

      methodDeclaration.setName(ast.newSimpleName(getDesugaredMethodName()));
      final Map<CAstNode, CAstEntity> unravelledEntities = fEntities
        .entrySet()
        .stream()
        .collect(toMap(Map.Entry::getKey, entry -> entry.getValue().stream().collect(onlyElement())));

      final ArrayList<CAstType> parameterTypes = new ArrayList<>();

      parameterTypes
        .addAll(syntheticOuterLocalsNodes
          .stream()
          .map(node -> (CAstType) node.getChild(1).getValue())
          .collect(toImmutableList()));

      parameterTypes.addAll(fParameterTypes);

      return new MethodEntity(
        fAst,
        methodDeclaration,
        getEnclosingTypeBinding(),
        unravelledEntities,
        fNewContext,
        parameterTypes,
        fReturnType,
        annotations);
    }

Then I had to modify some bootstrap functionality to support AST-based lambdas, so that involved also changing LambdaSummaryClass See https://github.com/wala/WALA/issues/549 for some discussion

mattkindy commented 4 years ago

It can perhaps be useful to use something like the below to get a partial desugaring as well? I used this for testing at various points (assuming the code compiles)

http://keweishang.blogspot.com/2016/04/how-compiler-understands-your-java-class.html

juliandolby commented 4 years ago

@mattkindy-praetorian Your code looks very promising. You can certainly contribute code; a pull request complete with a few tests would be great. If you wrote your code yourself, then contributing should be quite easy and would be very much appreciated.