apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

jena-benchmark-jmh module defunct #2492

Closed Aklakan closed 4 months ago

Aklakan commented 4 months ago

Version

5.1.0-SNAPSHOT

What happened?

I encountered this issue while trying to set up additional benchmarks as part of my work on #2404.

The Java Microbenchmark Harness (JMH) relies on annotation processing to produce the BenchmarkList file. Without this file, JMH benchmarks cannot be run, and any attempt to run one of the JUnit tests in the module will bail out with the error message JMH Unable to find the resource: /META-INF/BenchmarkList.

# Under jena-benchmarks/jena-benchmarks-jmh
./target/test-classes/META-INF/BenchmarkList

However, annotation processing is disabled in jena's parent pom.

        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>${ver.plugin.compiler}</version>
          <configuration>
            <release>${java.version}</release>
            <compilerArgument>-proc:none</compilerArgument> <!-- Disables annotation processing -->
          </configuration>
        </plugin>

Also, there is a mismatch between the directory location and package declaration of JMHDefaultOptions.java which causes eclipse to complain.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Yes

Aklakan commented 4 months ago

It seems there is not much difference between having annotation processing globally enabled or disabled - so not sure what the rationale was for disabling it in the first place. In any case it took me quite a while to figure out why jmh was not generating the BenchmarkList file - the jmh plugin doesn't seem to report the generation of the file in the log messages.

For the following command, which focuses mostly on compilation, I get the following times depending on whether <compilerArgument>-proc:none</compilerArgument> is present or not:

mvn -Pdev -Drat.skip -DskipTests -Denforcer.skip clean compile -pl '!:jena-fuseki-webapp,!:jena-fuseki-ui'

Annotation processing...

...enabled:
Total time: 16.535 s
Total time: 17.811 s
Total time: 19.115 s

...disabled:
Total time: 15.467 s
Total time: 18.223 s
Total time: 18.693 s

afs commented 4 months ago

-proc:none is for building (cross-compiling) with Java 21. That issues warnings. It's only a partial solution.

What is more, there are some tools (e.g. mockito) that modify the loader classpath and haven't caught up. We'll have to see how they evolve.

Aklakan commented 4 months ago

Alternative fixes could be for the jena-benchmarks-jmh module specifically to override the compiler plugin config - or that module could place the compiler plugin override into a separate profile (with a note in that module's README).
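A minimal sketch of the first alternative, as it might look in the pom.xml of jena-benchmarks-jmh. It assumes the module inherits -proc:none through <compilerArgument> exactly as quoted from the parent pom. The combine.self="override" attribute is standard Maven configuration merging, the ${ver.jmh} property name is hypothetical, and the fragment has not been tested against the actual Jena build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Replace the inherited "-proc:none" with an empty value, for this module only -->
    <compilerArgument combine.self="override"></compilerArgument>
    <!-- Name the JMH annotation processor explicitly; ${ver.jmh} is a hypothetical property -->
    <annotationProcessorPaths>
      <path>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>${ver.jmh}</version>
      </path>
    </annotationProcessorPaths>
  </configuration>
</plugin>
```

For the profile variant, the same <plugin> element would sit under <profile><build><plugins> in that module's pom, so annotation processing is only re-enabled when the profile is activated.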

My main qualm here is that JMH was not working, and none of the Stack Overflow or ChatGPT solutions (which mainly reiterated the Stack Overflow ones) worked. At least I documented this possible reason for the JMH issue at https://stackoverflow.com/a/78528826/160790

arne-bdt commented 4 months ago

Since I was the one who introduced JMH into Jena last year, I feel somewhat responsible.

If I understand correctly, -proc:none was introduced in October 2023 to support Java 21. The current proposed solution is to remove -proc:none. This proposal has been approved by @kinow.

On my machine, the proposed solution does not seem to work properly. Initially, I get a lot of errors, but further down the line, the benchmarks seem to run successfully. Here's what the first lines look like:

[INFO] --- surefire:3.2.5:test (default-test) @ jena-benchmarks-jmh ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.057 s <<< FAILURE! -- in org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType
[ERROR] org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType.benchmark -- Time elapsed: 0.046 s <<< ERROR!
No benchmarks to run; check the include/exclude regexps.
    at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:257)
    at org.openjdk.jmh.runner.Runner.run(Runner.java:208)
    at org.apache.jena.mem.graph.TestGraphAdd.benchmark(TestGraphAdd.java:97)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)

[INFO] Running org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType_B1
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.007 s <<< FAILURE! -- in org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType_B1
[ERROR] org.apache.jena.mem.graph.jmh_generated.TestGraphAdd_jmhType_B1.benchmark -- Time elapsed: 0.006 s <<< ERROR!
No benchmarks to run; check the include/exclude regexps.
    at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:257)
    at org.openjdk.jmh.runner.Runner.run(Runner.java:208)
    at org.apache.jena.mem.graph.TestGraphAdd.benchmark(TestGraphAdd.java:97)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)

@Aklakan do you not have that problem?

arne-bdt commented 4 months ago

I am able to get rid of the errors, by excluding the JMH generated files:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <skipTests>${benchmark.skip}</skipTests>
       ...
        <excludes>
            <exclude>**/jmh_generated/*</exclude>
        </excludes>
    </configuration>
</plugin>

@Aklakan Would you add this exclude in your PR?

@afs, I am still under the impression that -proc:none is needed for building (cross-compiling) with Java 21. Is that correct?

Aklakan commented 4 months ago

I didn't see any errors on my setup (see output below).

I still need to test what happens on my machine when I add your change.

One issue - though not important right now - is that running the benchmarks with mvn (rather than the IDE) takes way too long because there seems to be a ~1 second wait between iterations - even though most iterations only take milliseconds. Right now I just want to be able to run selected JMH benchmarks via JUnit in Eclipse, and for this it's just important that the BenchmarkList file is somehow generated.

mvn -v
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /opt/maven/current
Java version: 17.0.10, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "6.5.0-35-generic", arch: "amd64", family: "unix"
jena$ mvn -Dbenchmark.skip=false -Pdev -Drat.skip clean install
...
[INFO] --- maven-surefire-plugin:3.2.5:test (default-test) @ jena-benchmarks-jmh ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.jena.mem.graph.TestGraphFindByMatchAndCount
# JMH version: 1.37
# VM version: JDK 17.0.10, OpenJDK 64-Bit Server VM, 17.0.10+7-Ubuntu-122.04.1
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# VM options: -Xmx12G
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, single-shot each
# Measurement: 15 iterations, single-shot each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.jena.mem.graph.TestGraphFindByMatchAndCount.graphFindSP_
# Parameters: (param0_GraphUri = ../testing/cheeses-0.1.ttl, param1_GraphImplementation = GraphMem (current), param2_sampleSize = 800)

# Run progress: NaN% complete, ETA 00:00:00
# Fork: 1 of 1
# Warmup Iteration   1: 0.013 s/op
# Warmup Iteration   2: 0.002 s/op
...

Aklakan commented 4 months ago

I should add that I did not wait for all benchmarks to complete; I aborted the benchmark build after about 30 min (running the benchmarks from the CLI shouldn't take that long in the first place). So if the errors only occurred after that time, I would not have seen them.

@arne-bdt What's the output of mvn -v on your environment?

arne-bdt commented 4 months ago

@Aklakan The output of mvn -v on my system is:

Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: C:\Program Files\JetBrains\IntelliJ IDEA 2023.2.4\plugins\maven\lib\maven3
Java version: 21.0.2, vendor: Eclipse Adoptium, runtime: C:\Program Files\Eclipse Adoptium\jdk-21.0.2.13-hotspot
Default locale: de_DE, platform encoding: UTF-8
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

Usually, I work with IntelliJ IDEA. There, the shaded library jena-benchmarks-shadedJena480 has to be manually added to the classpath to make the unit tests work, unlike the Maven build, which works fine without any modifications. I have not found another solution:

[Screenshot: IntelliJ IDEA "Project Structure" dialog]

So, I installed Eclipse today and was able to run the unit tests via JUnit by manually adding jena-benchmarks-shadedJena480-5.1.0-SNAPSHOT.jar to the dependencies:

[Screenshot: Eclipse workspace, jena-benchmarks-jmh/src/test/java/org/apache/jena/jmh/helper]

The benchmarks are intentionally disabled because they run for a very long time. Even setting warmupIterations to zero and measurementIterations to one in JMHDefaultOptions, the tests still run for more than 45 minutes. Therefore, I have no idea how to ensure that the JMH benchmarks never become defunct again.

To me, the JMH benchmarks in combination with the shaded Jena library remain the best way to compare the behavior and speed of the current Jena version with older versions.

Aklakan commented 4 months ago

In eclipse you must exclude the shaded 480 module from the maven import, then it should work directly, without having to manually fiddle with imports. Excluding the module with the shaded artifacts causes eclipse to download the jar from maven central. (Alternatively, one can manually run mvn install on jena-benchmarks-shadedJena480 to make it available via the local repository.)

As for the runtime, I think the reason for the long runtime is the delay between two iterations. Making JMH benchmarking work faster from the CLI was something I would have considered addressing in a future PR. The priority is to have it working at all. It's very useful that you already added this module; I just encountered this issue when trying to build upon it.

afs commented 4 months ago

An option is to make the benchmarks run separately and have a Jenkins job that runs the benchmarks.

Source code goes into a release just by being in the github repo (the source-release zip is a zip of the git repository).
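One way a separate benchmark run could be wired up for a CI job is a dedicated Maven profile. This is only a sketch: the profile id "benchmarks" is hypothetical, and it assumes the benchmark.skip property keeps driving the surefire <skipTests> setting as in the configuration quoted earlier in this thread.

```xml
<profiles>
  <profile>
    <!-- Hypothetical profile id; a CI job would activate it with: mvn -Pbenchmarks test -->
    <id>benchmarks</id>
    <properties>
      <!-- Property read by the surefire <skipTests> setting quoted earlier -->
      <benchmark.skip>false</benchmark.skip>
    </properties>
  </profile>
</profiles>
```

This has the same effect as passing -Dbenchmark.skip=false on the command line, but keeps the switch in one named place that a job definition can refer to.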