Integration tests for JenaSystem#init

arne-bdt commented 2 weeks ago

Version

5.3.0-SNAPSHOT

Feature

Writing a JUnit integration test for JenaSystem#init is straightforward, but the test is only meaningful when it runs by itself in a fresh JVM, for example as a single-method run. Only then can we effectively debug and test the execution of static initializers.

    @Test
    public void init() {
        assertDoesNotThrow(
                () -> JenaSystem.init());
    }

A regression test for apache/jena#2675 would look like this:

    @Test
    public void initRDFConnectionFuseki() {
        try (RDFConnection conn = RDFConnectionFuseki.service("http://localhost:3030/ds").build()) {
            // Nothing to assert: the test passes if building the connection does not throw.
            assertTrue(true);
        }
    }

For apache/jena#2787, the regression test is as follows:

    @Test
    public void initParallel() {
        var pool = Executors.newFixedThreadPool(2);
        try {
            var futures = IntStream.range(1, 3)
                    .mapToObj(i -> pool.submit(() -> {
                        if (i % 2 == 0) {
                            ModelFactory.createDefaultModel();
                        } else {
                            JenaSystem.init();
                        }

                        return i;
                    }))
                    .toList();
            var intSet = new HashSet<Integer>();
            assertTimeoutPreemptively(
                    Duration.of(4, ChronoUnit.SECONDS),
                    () -> {
                        for (var future : futures) {
                            intSet.add(future.get());
                        }
                    });
            assertEquals(2, intSet.size());
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

When running multiple tests within a suite or a single file, static initializers may be triggered unpredictably, as any test could initialize them, and they cannot be re-triggered within the same JVM instance (or class loader). I attempted various JUnit annotations and methods to make these tests work within a Maven build, but found no JUnit-based solution. However, I found a workaround using the JMH benchmark framework, which runs each benchmark in a freshly started JVM.
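To illustrate why a second test in the same JVM cannot re-trigger initialization, here is a minimal, Jena-independent sketch. `Subsystem` and `INIT_RUNS` are hypothetical stand-ins invented for this example, not Jena classes; the sketch only shows the general JVM rule that a class's static initializer runs once per class loader.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitOnce {
    // Counts how often the static initializer of Subsystem has run.
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    // Hypothetical stand-in for a class with one-time static setup.
    static class Subsystem {
        static {
            INIT_RUNS.incrementAndGet();
        }

        // No-op; the first call triggers the static block above.
        static void init() { }
    }

    // Simulates two "tests" touching the same class in one JVM.
    static int initTwice() {
        Subsystem.init(); // first use: static initializer runs
        Subsystem.init(); // second use: class is already initialized
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println("static initializer runs: " + initTwice());
        // prints "static initializer runs: 1"
    }
}
```

Whichever test touches the class first pays for (and observes) the initialization; every later test in that JVM sees an already initialized class, which is why the tests above need a fresh JVM each.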

JMH relies on code generation, which has to be re-run whenever the benchmark code changes. For IntelliJ IDEA, there is a plugin that runs the code generation automatically in the background. Unfortunately, in other IDEs such as Eclipse, it has to be triggered manually by running mvn clean install.

I would like to submit a PR to extend the integration tests. However, before proceeding further, I’d like to confirm whether introducing JMH into the jena-integration-tests module would be acceptable to the Jena developer community.

Are you interested in contributing a solution yourself?

Yes

OyvindLGjesdal commented 5 days ago

Would configuring the POM so that each integration test class runs in its own JVM process solve the clean-JVM problem?

According to the Surefire docs, it could slow down the builds if configured in the integration test POM (https://github.com/apache/jena/blob/main/jena-integration-tests/pom.xml#L199):

forkCount=1/reuseForks=false executes each test class in its own JVM process, one after another. It creates the highest level of separation for the test execution, but it would probably also give you the longest execution time of all the available options. Consider it as a last resort.

https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html#parallel-maven-surefire-plugin-execution-in-multi-module-maven-p

I don't know if it is possible, but a second Surefire test execution, with its own naming pattern that matches only the integration tests needing a fresh JVM, could also be an option if forking every test slows the integration tests down a lot or if updating all tests is a no-go.

arne-bdt commented 5 days ago

@OyvindLGjesdal Thank you for suggesting multiple Surefire test executions - it might be a great idea. However, there are a couple of minor issues with this approach:

I would need to define exclusive naming patterns to distinguish between regular tests and isolated tests.
Each isolated test would need its own test class containing only that one test method. Since we only have three such tests, this might be manageable.

I will give it a try.

At the moment, I prefer the JMH variant because it is straightforward and doesn't present the issues mentioned above.
