TNG / ArchUnit

A Java architecture test library, to specify and assert architecture rules in plain Java
http://archunit.org
Apache License 2.0

getCallsOfSelf does not return calls made through interface reference #1225

Open jow290764 opened 8 months ago

jow290764 commented 8 months ago

Consider the following code base:

// 1. Interface
public interface MyInterface {
    void doSomething();
}
// 2. Two implementations of the interface
public class MyClassA implements MyInterface {

    @Override
    public void doSomething() {
        // arbitrary code
    }
}
public class MyClassB implements MyInterface {

    @Override
    public void doSomething() {
        // arbitrary code
    }
}
// Finally some other code calling doSomething through the interface
public class MyClassCaller {
    protected void callDoSomething() {

        MyInterface myInterface = new MyClassA();
        myInterface.doSomething();
    }
}

Calling getCallsOfSelf on the JavaMethod object for MyClassA.doSomething then finds nothing, whereas I would expect it to find the call from MyClassCaller.callDoSomething.

Changing the type of myInterface in callDoSomething from MyInterface to MyClassA makes getCallsOfSelf find callDoSomething.

I was using the following ArchUnit dependency:

        <dependency>
            <groupId>com.tngtech.archunit</groupId>
            <artifactId>archunit-junit5-engine</artifactId>
            <version>1.2.1</version>
        </dependency>

hankem commented 8 months ago

This is actually expected behavior: ArchUnit does not (and in general cannot) track the type of all objects at runtime, and

    void callDoSomething() {
        MyInterface myInterface = new MyClassA();
        myInterface.doSomething();
    }

is compiled to the following byte code:

  void callDoSomething();
    Code:
       0: new           #7                  // class MyClassA
       3: dup
       4: invokespecial #9                  // Method MyClassA."<init>":()V
       7: astore_1
       8: aload_1
       9: invokeinterface #10,  1           // InterfaceMethod MyInterface.doSomething:()V
      14: return

– which doesn't know anything about a call to MyClassA.doSomething, which will only be resolved via runtime polymorphism.
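The gap between the type named at the call site and the class that actually runs can be illustrated with plain reflection. This is a minimal sketch that does not use ArchUnit; the class `StaticVsRuntimeTarget` and the helper `ownerOfDoSomething` are hypothetical names introduced only for this illustration.

```java
import java.lang.reflect.Method;

final class StaticVsRuntimeTarget {

    interface MyInterface {
        void doSomething();
    }

    static class MyClassA implements MyInterface {
        @Override
        public void doSomething() {
            // arbitrary code
        }
    }

    /** Simple name of the class that declares doSomething() when looked up on the given type. */
    static String ownerOfDoSomething(Class<?> type) {
        try {
            Method method = type.getMethod("doSomething");
            return method.getDeclaringClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(type + " has no public doSomething()", e);
        }
    }

    public static void main(String[] args) {
        MyInterface ref = new MyClassA();
        // Looked up on the declared type of the variable, the owner is the interface,
        // which matches what the invokeinterface instruction records.
        System.out.println(ownerOfDoSomething(MyInterface.class)); // prints "MyInterface"
        // Only the runtime class of the object reveals the implementation that executes.
        System.out.println(ownerOfDoSomething(ref.getClass()));    // prints "MyClassA"
    }
}
```

A tool that reads only the class file sees the first answer; the second requires knowing which object is stored in the variable at that point.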

jow290764 commented 8 months ago

That is unfortunate because then it is not reliably possible to determine whether a specific method is directly or transitively invoked by certain other methods.

I am not familiar with Java bytecode, and I accept your answer, but I have heard that there are analysis tools such as ASM, Byte Buddy, or BCEL that can extract information about the implementation of interfaces from bytecode. So, theoretically, it seems possible. Perhaps there could be discussions about whether it might be made possible in the future.

hankem commented 8 months ago

The information about implementation of interfaces is [available in ArchUnit](https://javadoc.io/doc/com.tngtech.archunit/archunit/latest/com/tngtech/archunit/core/domain/JavaClass.html#getAllSubclasses()), but your question is about resolving runtime/dynamic polymorphism.

While it may seem obvious that

        MyInterface myInterface = new MyClassA();
        myInterface.doSomething();

calls MyClassA's doSomething() method, would you also expect that

    void callDoSomething() {
        callDoSomething(new MyClassA());
    }

    void callDoSomething(MyInterface myInterface) {
        myInterface.doSomething();
    }

is recognized? You can see that this can become arbitrarily complicated for a static code analysis tool.

To me, this seems impossible (in general), but I'm of course open to suggestions. We can also discuss whether a limited scope (e.g. recognize your scenario, but not mine) might be reasonable.

jow290764 commented 8 months ago

I agree that it would be a formidable task to track a program's control flow precisely enough to prove that, at a particular call site, only objects of types T1 to Tn implementing MyInterface can occur, and never objects of types Tn+1 to Tn+m, which also implement MyInterface.

However, should one implement a mode positing that, when leveraging interfaces, any implementation of the interface could potentially manifest wherever the interface is employed, then it would become comparatively straightforward to compute all theoretically feasible transitive chains.

Or, could there be an aspect I've inadvertently overlooked?
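
The conservative mode proposed above resembles what is commonly called class hierarchy analysis: every call through a supertype method is assumed to possibly reach every known override. Below is a minimal sketch of that idea over a toy, string-based model. Nothing in it is ArchUnit API; `ConservativeCallGraph`, its methods, and the `Other.*` method names are hypothetical, and the sketch assumes each override is registered against every supertype method it overrides.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of an "any implementation may be called" call graph (not ArchUnit API). */
final class ConservativeCallGraph {
    // Supertype method -> methods overriding it, e.g. "MyInterface.doSomething" -> {"MyClassA.doSomething"}.
    private final Map<String, Set<String>> overriders = new HashMap<>();
    // Caller -> targets as named at the call site (what the bytecode records).
    private final Map<String, Set<String>> staticCalls = new HashMap<>();

    void addOverride(String declaredMethod, String overridingMethod) {
        overriders.computeIfAbsent(declaredMethod, k -> new TreeSet<>()).add(overridingMethod);
    }

    void addCall(String caller, String staticTarget) {
        staticCalls.computeIfAbsent(caller, k -> new TreeSet<>()).add(staticTarget);
    }

    /** The static target plus every registered override of it. */
    Set<String> possibleTargets(String staticTarget) {
        Set<String> result = new TreeSet<>();
        result.add(staticTarget);
        result.addAll(overriders.getOrDefault(staticTarget, Collections.<String>emptySet()));
        return result;
    }

    /** Every method that may invoke the target directly or transitively under the conservative assumption. */
    Set<String> transitiveCallersOf(String target) {
        Set<String> callers = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(target);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (Map.Entry<String, Set<String>> entry : staticCalls.entrySet()) {
                boolean mayCall = entry.getValue().stream()
                        .anyMatch(t -> possibleTargets(t).contains(current));
                // callers.add returns false for already visited methods, so cycles terminate.
                if (mayCall && callers.add(entry.getKey())) {
                    work.push(entry.getKey());
                }
            }
        }
        return callers;
    }

    public static void main(String[] args) {
        ConservativeCallGraph graph = new ConservativeCallGraph();
        graph.addOverride("MyInterface.doSomething", "MyClassA.doSomething");
        graph.addOverride("MyInterface.doSomething", "MyClassB.doSomething");
        graph.addCall("MyClassCaller.callDoSomething", "MyInterface.doSomething");
        // Indirect scenario: the interface call sits in a helper that receives the object.
        graph.addCall("Other.entry", "Other.helper");
        graph.addCall("Other.helper", "MyInterface.doSomething");

        System.out.println(graph.transitiveCallersOf("MyClassA.doSomething"));
        System.out.println(graph.transitiveCallersOf("MyClassB.doSomething"));
    }
}
```

In this model both queries return the same three callers, even though MyClassB is never instantiated in the example. That over-approximation is the trade-off of such a mode: it does not miss the callers from either scenario in this thread, but it also reports chains that cannot occur at runtime.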