aaschmid / gradle-cpd-plugin

Gradle plugin to find duplicate code using PMDs copy/paste detection (= CPD).
Apache License 2.0
95 stars 12 forks source link

CPD finds duplications in test fixtures despite source limitation #50

Open C-Otto opened 4 years ago

C-Otto commented 4 years ago

I configured CPD for my multi-project Gradle 6.5.1 project. Some of these sub-projects make use of the java-test-fixtures plugin, which allows me to put Java files into src/testFixtures/java. Despite limiting the source as shown below, I get a CPD issue for code inside src/testFixtures/java (both files which are part of the issue are inside the same directory).

With --debug I see the tokenize debug message for the files in source/main/java, followed by those in source/testFixtures/java:

2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/main/java/com/aaa/Logs.java
2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/testFixtures/java/com/bbb/DatabaseLockingTestJob.java

However, when I debug the source property in my configuration, everything is fine (i.e. I only see src/main/java files):

files(source).each { File file ->
    println(file)
}

This happens with:

plugins {
    id 'de.aaschmid.cpd' version '3.1' apply false
}

configure(subprojects.findAll { it.name != 'xxx-platform' }) {
    apply plugin: 'java'
    apply plugin: 'de.aaschmid.cpd'

    check.dependsOn(cpdCheck)

    cpdCheck {
        source = files('src/main/java') // do not run for test code
    }
}
aaschmid commented 4 years ago

Hm, that sounds indeed very strange @C-Otto. Do you have a small reproducer? Is it urgent (just for my prioritization of things)?

C-Otto commented 4 years ago

We won't be able to use CPD once we switched to Gradle, but that won't happen for a few weeks or so. Besides, we can live without CPD, although it's nice to have. So... not that urgent. I'll have a look at a reproducible case tomorrow.

C-Otto commented 4 years ago

https://github.com/C-Otto/cpdtestfixtures

If CPD is configured before the integration tests are introduced, for reasons I don't understand CPD also scans the integration test sources: https://github.com/C-Otto/cpdtestfixtures/commit/fbf644bc9fd9f90e8f2f0e6a1f73557f2cd3b582

If you enable the java-test-fixtures plugin in aaa/build.gradle CPD starts complaining about duplications in the test fixtures code: https://github.com/C-Otto/cpdtestfixtures/commit/52db52e821f88ac83f5059efdfb998fbffe4656d

This might be related to https://github.com/gradle/gradle/issues/13781 and the linked issues.

aaschmid commented 4 years ago

Sounds like a lazy initialization stuff. Thanks for reproducer, I will have a look at it.

aaschmid commented 4 years ago

Very, very strange ... but finally could find the problem: It appears if and only if the cpdChecks source is set before further sourceSets are lazily configured by other applied plugins...

Here is the full story:

  1. Could reproduce it using your reproducer
  2. Wrote an acceptance test, see https://github.com/aaschmid/gradle-cpd-plugin/blob/%2350.text.fixture/src/integTest/groovy/de/aaschmid/gradle/plugins/cpd/test/CpdAcceptanceTest.groovy#L662-L705. Unfortunately, it was working :-( and playing around hadn't helped me understand it
  3. Added some printlns to your reproducer for the subproject:
    println(files(sourceSets.main.allJava).files)
    println(files(sourceSets.test.allJava).files)
    println(files(sourceSets.testFixtures.allJava).files)
    println("----------------")
    println(cpdCheck.source.files)

    They produced the "incorrect" result:

    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
  4. I restructured the project by moving the application of java-test-fixture to rootProjects build.gradle => suprisingly, now it was working:
    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    []
  5. Another possibility to fix this is to apply and configure the cpd plugin in the aaas (= sub project) build.gradle => the problem arises if the java-test-fixture plugin is not applied and configured together with the cpd plugin
  6. So this is indeed a problem with the lazy source initialization of a cpdCheck task :-( (see Code)
  7. => The described problem appears if and only if the cpdChecks source is set before further sourceSets configuring plugins are applied because Cpd tasks lazy source initialization still listens to further applied plugins and their sourceSets.

Example: if cpdCheck is configured on in rootProjects build.gradle for subprojects and later another subprojects closure applies java-test-fixture or java plugin, the cpdChecks source is enhanced by their sourceSets.allJava files. subprojects build.gradle files are also applied separately and later...

Good, that I have found the root cause of this issue. Bad, that I currently have only a quite ugly (gut feeling) hack to solve this. Concrete I check if the source was explicitly set and prevent further changes.

@C-Otto does that information help you already such that you can fix the issue by restructuring your build? To be honest, I at least think about documenting this behavior and leave it as is.

Another question to you @C-Otto: Also I prefer to apply the cpd' plugin on therootProject` and configure inclusions / exclusions there. Would this be an alternative or would you like to have duplicates checked only project by project? SonarQube e.g. does check for duplicates also over the whole code base if I remember correctly.

C-Otto commented 4 years ago

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue). I'm now able to configure CPD inside something like a subprojects block in the root project, and configure the sources correctly as each subproject's plugins were added before this root configuration happens (the subprojects' configuration tasks run before, so the "new" sourcesets are already known when I configure CPD in the root project's configuration task). Of course I'd appreciate a long-term (non-hacked) solution :) Thanks for the effort!

Running CPD on the whole project would be possible. This doesn't match the existing Maven configuration I'm using as a base, but I think that might be a valid step. However, the root project does not have any source, and I don't know how to configure CPD. If you have any ideas, I'd be happy to experiment further

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources? I just got this far:

WARNING: Due to the absence of 'LifecycleBasePlugin' on root project 'xxx-parent' the task ':cpdCheck' could not be added to task graph. Therefore CPD will not be executed. To prevent this, manually add a task dependency of ':cpdCheck' to a 'check' task of a subproject.
1) Directly to project ':xxx-subproject':
    check.dependsOn(':cpdCheck')
2) Indirectly, e.g. via root project 'xxx-parent':
    project(':xxx-subproject') {
        plugins.withType(LifecycleBasePlugin) { // <- just required if 'java' plugin is applied within subproject
            check.dependsOn(cpdCheck)
        }
    }
aaschmid commented 4 years ago

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me. Yes this was for my test that neither test nor testFixtures are included into source and therefore check for duplicates.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue). evaluationDependsOnChildren seems like a fix, right ;-)

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources? That is a know limitation if the rootProject has no lifecycle plugin applied (normally implicit via other plugin). The solution is document on https://github.com/aaschmid/gradle-cpd-plugin#multi-module-project.

Depending on your needs, now java sourcesets of all projects will be automatically checked for duplicates (see Code. If you want to override this behaviour, you have a lot of options but note that the lazy evaluation of plugins applied to later configured subprojects adds further sourceSets again unless you use evaluationDependsOnChildren:

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs
}

or

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs + subprojects*.sourceSets*.test*.java*.srcDirs
}

or

cpdCheck {
    source = []
    subprojects.forEach { project -> it.source(project.sourceSets.main.java) } // only if all subprojects have corresponding `sourceSet`
}

or

cpdCheck {
    source = []
    allprojects.forEach { project ->
        project.plugins.withType(JavaBasePlugin) { plugin ->
            project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
        }
    }
}

Maybe I should document these as well ;-)

Currently best idea for a proper solutions to this: I could provide an additional property with an allowlist or denylist for sourceSet names, e.g. allowedSourceSets = [ "main", "test" ] or deniedSourceSets = [ "testFixture"] which should (not) automatically be added - defaults to all as currently.

C-Otto commented 4 years ago

Thanks for the snippets, I'll try them out tomorrow. I agree, a bit more documentation wouldn't hurt, especially related to the warning I posted earlier (I've seen it several times when I tried to get the plugin to work with subprojects).

The additional options also seem useful!

C-Otto commented 4 years ago

I'm unable to create a configuration that combines the scan for all subprojects, as described in your response above. I'm using this snippet for all variants described below:

evaluationDependsOnChildren()
apply plugin: 'java' // required so that the check task exists?
apply from: 'etc/cpd.gradle'

With this I get a single cpdCheck task, but it finds duplicates in subproject/src/integrationTest:

source = subprojects*.sourceSets*.main*.java*.srcDirs

With this I get the same (duplication in integrationTest):

source = []
subprojects.forEach { project -> it.source(project.sourceSets.main.java) }

Finally, I get

* What went wrong:
A problem occurred evaluating root project 'xxx-parent'.
> Failed to apply plugin class 'org.gradle.api.plugins.JavaBasePlugin'.
   > Could not get unknown property 'main' for SourceSet container of type org.gradle.api.internal.tasks.DefaultSourceSetContainer.

for

source = []
allprojects.forEach { project ->
    project.plugins.withType(JavaBasePlugin) { plugin ->
        project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
    }
}
aaschmid commented 4 years ago
  1. the java plugin is not required for the CPD plugin to work but it prints a warning if you haven't manually add it to one of the childrens check task if you have no JavaBasePlugin on the root project, see e.g. https://github.com/TNG/junit-dataprovider/blob/master/build.gradle.kts#L357

  2. Hm ... I think the too many sourcesets are still caused by the lazy evalutation logic even if you are using the evalationDependsOnChildren(). Would you mind to adjust your reproducer above? I have not that much time at the moment but would try to fix it if it is urgent to you...

  3. Your exception is also odd as the plugins.withType (see here) should not cause errors if no plugin matches. JavaBasePlugin (see here) should have sourceSets already but maybe they changed that java plugin creates the main sourcesets, see here. And you apply java plugin ...

C-Otto commented 4 years ago

Sorry, I'm confused about the current state. Let's ignore the "global scan", i.e. using CPD to find duplicates between subprojects.

My understanding: This is a bug in your code, where my explicit configuration is overwritten due to lazy initalisation. The configuration order plays a crucial role here, which is why evaluationDependsOnChildren() works around this issue. Despite my assumptions, the java-test-fixtures plugin does not do anything that contributes to this bug (especially this bug: https://github.com/gradle/gradle/issues/13781).

Is this correct?

aaschmid commented 3 years ago

@C-Otto: sorry for the inactivity on this issue but my spare is quite limited at the moment and I honestly higher prioritize other things.

I deeply need to think about the lazy initialization here but still not sure if this is really a bug...

C-Otto commented 3 years ago

As long as CPD does not stick to the configuration, I'd like to think it's a bug. This could be a bug in the code that reads and applies the configuration, or something in the documentation. Either way, it does not work, and I have no way of making it work.

aaschmid commented 3 years ago

I will try to figure out a solution as soon as possible using your reproducer from above. However, you mentioned integrationTest source sets above. Is this use case already part of the reproducer or could you kindly add it?

C-Otto commented 3 years ago

I believe the issue is related to the existence of several source sets, and the name/purpose of these is not relevant. The reproducer contains two commits, where the initial commit exposes the bug with integration tests.