Clean up test suite to reduce build time

abrokenjester commented 3 years ago

The current full validation test cycle runs at close to an hour, 22 minutes of which are taken up by ShaclSail-related tests:

[INFO] Reactor Summary for Eclipse RDF4J 3.6.2-SNAPSHOT:
[INFO] 
[INFO] Eclipse RDF4J ...................................... SUCCESS [  1.318 s]
[INFO] RDF4J Assembly Descriptors ......................... SUCCESS [  0.960 s]
[INFO] RDF4J Core ......................................... SUCCESS [  0.533 s]
[INFO] RDF4J: Model API ................................... SUCCESS [  5.406 s]
[INFO] RDF4J: util ........................................ SUCCESS [  7.859 s]
[INFO] RDF4J: RDF Vocabularies ............................ SUCCESS [  2.246 s]
[INFO] RDF4J: Model ....................................... SUCCESS [  5.992 s]
[INFO] RDF4J: SparqlBuilder ............................... SUCCESS [  3.259 s]
[INFO] RDF4J: Rio ......................................... SUCCESS [  0.474 s]
[INFO] RDF4J: Rio - API ................................... SUCCESS [  3.328 s]
[INFO] RDF4J: Rio - Languages ............................. SUCCESS [  1.262 s]
[INFO] RDF4J: Rio - Datatypes ............................. SUCCESS [  1.960 s]
[INFO] RDF4J: Query ....................................... SUCCESS [  1.760 s]
[INFO] RDF4J: Rio - Binary ................................ SUCCESS [  3.095 s]
[INFO] RDF4J: Rio - N-Triples ............................. SUCCESS [  4.262 s]
[INFO] RDF4J: Rio - HDT ................................... SUCCESS [  2.421 s]
[INFO] RDF4J: Rio - JSON-LD ............................... SUCCESS [  4.545 s]
[INFO] RDF4J: Rio - Turtle ................................ SUCCESS [  7.678 s]
[INFO] RDF4J: Rio - N3 (writer-only) ...................... SUCCESS [  2.055 s]
[INFO] RDF4J: Rio - N-Quads ............................... SUCCESS [  4.135 s]
[INFO] RDF4J: Rio - RDF/JSON .............................. SUCCESS [  3.405 s]
[INFO] RDF4J: Rio - RDF/XML ............................... SUCCESS [  6.165 s]
[INFO] RDF4J: Rio - TriX .................................. SUCCESS [  2.409 s]
[INFO] RDF4J: Rio - TriG .................................. SUCCESS [  6.440 s]
[INFO] RDF4J: Query result IO ............................. SUCCESS [  0.460 s]
[INFO] RDF4J: Query result IO - API ....................... SUCCESS [  0.855 s]
[INFO] RDF4J : Test Suites ................................ SUCCESS [  0.464 s]
[INFO] RDF4J: QueryResultIO testsuite ..................... SUCCESS [  0.842 s]
[INFO] RDF4J: Query result IO - binary .................... SUCCESS [  1.692 s]
[INFO] RDF4J: Query result IO - SPARQL/JSON ............... SUCCESS [  1.809 s]
[INFO] RDF4J: Query result IO - SPARQL/XML ................ SUCCESS [  2.422 s]
[INFO] RDF4J: Query result IO - plain text booleans ....... SUCCESS [  2.100 s]
[INFO] RDF4J: Query algebra ............................... SUCCESS [  0.476 s]
[INFO] RDF4J: Query algebra - model ....................... SUCCESS [  3.916 s]
[INFO] RDF4J: Query parser ................................ SUCCESS [  0.506 s]
[INFO] RDF4J: Query parser - API .......................... SUCCESS [  1.409 s]
[INFO] RDF4J: Query parser - SeRQL ........................ SUCCESS [  3.005 s]
[INFO] RDF4J: Sail ........................................ SUCCESS [  0.481 s]
[INFO] RDF4J: Sail API .................................... SUCCESS [  4.380 s]
[INFO] RDF4J: Repository .................................. SUCCESS [  0.492 s]
[INFO] RDF4J: Repository - API ............................ SUCCESS [  2.798 s]
[INFO] RDF4J: HTTP ........................................ SUCCESS [  0.456 s]
[INFO] RDF4J: HTTP protocol ............................... SUCCESS [  1.631 s]
[INFO] RDF4J: HTTP client ................................. SUCCESS [  6.605 s]
[INFO] RDF4J: Query parser - SPARQL ....................... SUCCESS [ 10.101 s]
[INFO] RDF4J: SPARQL Repository ........................... SUCCESS [  3.582 s]
[INFO] RDF4J: Query algebra - evaluation .................. SUCCESS [ 10.978 s]
[INFO] RDF4J: Repository API testsuite .................... SUCCESS [  0.939 s]
[INFO] RDF4J: SailRepository .............................. SUCCESS [  3.588 s]
[INFO] RDF4J: Repository - event (wrapper) ................ SUCCESS [  2.576 s]
[INFO] RDF4J: HTTPRepository .............................. SUCCESS [  1.906 s]
[INFO] RDF4J: Repository manager .......................... SUCCESS [  4.199 s]
[INFO] RDF4J: Sail base implementations ................... SUCCESS [  3.570 s]
[INFO] RDF4J: Sail API testsuite .......................... SUCCESS [  0.833 s]
[INFO] RDF4J: MemoryStore ................................. SUCCESS [ 22.233 s]
[INFO] RDF4J: Query algebra - GeoSPARQL ................... SUCCESS [  3.450 s]
[INFO] RDF4J: Query Rendering ............................. SUCCESS [  3.201 s]
[INFO] RDF4J: DatasetRepository (wrapper) ................. SUCCESS [  1.651 s]
[INFO] RDF4J: Repository - context aware (wrapper) ........ SUCCESS [  2.498 s]
[INFO] RDF4J: Model API testsuite ......................... SUCCESS [  0.730 s]
[INFO] RDF4J: Sail Model .................................. SUCCESS [  1.740 s]
[INFO] RDF4J: NativeStore ................................. SUCCESS [01:41 min]
[INFO] RDF4J: Inferencer Sails ............................ SUCCESS [ 36.505 s]
[INFO] RDF4J: Federation SAIL ............................. SUCCESS [ 28.368 s]
[INFO] RDF4J: SPIN ........................................ SUCCESS [  4.575 s]
[INFO] RDF4J: SPIN SAIL ................................... SUCCESS [ 26.085 s]
[INFO] RDF4J: SHACL ....................................... SUCCESS [22:25 min]
[INFO] RDF4J Lucene Sail API .............................. SUCCESS [  3.611 s]
[INFO] RDF4J Lucene Sail Index ............................ SUCCESS [  3.462 s]
[INFO] RDF4J Lucene Sail Spin ............................. SUCCESS [  4.714 s]
[INFO] RDF4J Solr Sail Index .............................. SUCCESS [  2.355 s]
[INFO] RDF4J Elastic Search Sail Index .................... SUCCESS [  2.799 s]
[INFO] RDF4J Extensible Store ............................. SUCCESS [ 24.058 s]
[INFO] RDF4J Elasticsearch Store .......................... SUCCESS [07:03 min]
[INFO] RDF4J: Client Libraries ............................ SUCCESS [  0.790 s]
[INFO] RDF4J: Storage Libraries ........................... SUCCESS [  1.166 s]
[INFO] RDF4J Tools ........................................ SUCCESS [  0.431 s]
[INFO] RDF4J: application configuration ................... SUCCESS [  1.102 s]
[INFO] RDF4J: Console ..................................... SUCCESS [  6.500 s]
[INFO] RDF4J: HTTP server - core .......................... SUCCESS [  3.619 s]
[INFO] RDF4J SPARQL compliance test suite ................. SUCCESS [  1.271 s]
[INFO] RDF4J: Federation .................................. SUCCESS [01:15 min]
[INFO] RDF4J: HTTP server ................................. SUCCESS [ 18.843 s]
[INFO] RDF4J Workbench .................................... SUCCESS [  5.310 s]
[INFO] RDF4J: Runtime ..................................... SUCCESS [  1.193 s]
[INFO] RDF4J: Runtime - OSGi .............................. SUCCESS [  5.810 s]
[INFO] RDF4J Rio compliance test suite .................... SUCCESS [  3.174 s]
[INFO] RDF4J SeRQL test suite ............................. SUCCESS [  0.914 s]
[INFO] RDF4J SHACL compliance test suite .................. SUCCESS [  0.900 s]
[INFO] RDF4J Lucene Sail Tests ............................ SUCCESS [  1.061 s]
[INFO] RDF4J GeoSPARQL compliance test suite .............. SUCCESS [  0.951 s]
[INFO] RDF4J: benchmarks .................................. SUCCESS [  2.697 s]
[INFO] RDF4J Compliance tests ............................. SUCCESS [  0.482 s]
[INFO] RDF4J Repository compliance tests .................. SUCCESS [01:21 min]
[INFO] RDF4J Rio compliance tests ......................... SUCCESS [ 10.494 s]
[INFO] RDF4J Model compliance tests ....................... SUCCESS [  9.406 s]
[INFO] RDF4J SeRQL query parser compliance tests .......... SUCCESS [  7.731 s]
[INFO] RDF4J SPARQL query parser compliance tests ......... SUCCESS [01:25 min]
[INFO] RDF4J SHACL compliance tests ....................... SUCCESS [  1.530 s]
[INFO] RDF4J Lucene Sail Tests ............................ SUCCESS [  3.599 s]
[INFO] RDF4J Solr Sail Tests .............................. SUCCESS [ 13.576 s]
[INFO] RDF4J Elasticsearch Sail Tests ..................... SUCCESS [ 34.647 s]
[INFO] RDF4J GeoSPARQL compliance tests ................... SUCCESS [  2.580 s]
[INFO] RDF4J code examples ................................ SUCCESS [  1.288 s]
[INFO] RDF4J BOM .......................................... SUCCESS [  0.504 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  43:11 min
[INFO] Finished at: 2021-03-16T04:28:59Z
[INFO] ------------------------------------------------------------------------

This needs to be cleaned up and reduced:

- there are possibly redundant tests in the ShaclSail tests that can be removed
- there are several tests that are long-running, but too useful to just remove: we should disable these tests by default and only enable them again when testing/debugging a specific Shacl feature

Apart from the Shacl tests, there are other possible places where the test suite can be cleaned up, e.g. for Elasticsearch.

abrokenjester commented 3 years ago

A possible thing to look into once #2876 has been completed is JUnit 5's Conditional Execution mechanism (see https://junit.org/junit5/docs/current/user-guide/#extensions-conditions), which would give us more control. For example, by introducing our own condition extension, triggered via a custom annotation on slow tests, we could make enabling/disabling these tests easier to deal with than manually adding/removing @Ignore annotations everywhere.
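To make the idea a bit more concrete, here is a minimal sketch of the decision such a condition extension would make. It is plain Java without the JUnit dependency so that it can run standalone; in a real setup the same check would presumably sit inside an implementation of JUnit 5's ExecutionCondition interface. The @SlowTest annotation, the runSlowTests system property, and the two example classes are hypothetical names chosen for illustration, not existing RDF4J code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedElement;

// Hypothetical marker annotation for long-running tests.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD })
@interface SlowTest {
}

final class SlowTestPolicy {
    // Hypothetical system property name; "true" opts in to slow tests.
    static final String PROPERTY = "runSlowTests";

    private SlowTestPolicy() {
    }

    // Unannotated elements always run; @SlowTest elements run only
    // when the system property is set to "true".
    static boolean shouldRun(AnnotatedElement element) {
        if (!element.isAnnotationPresent(SlowTest.class)) {
            return true;
        }
        return Boolean.getBoolean(PROPERTY);
    }
}

// Hypothetical test classes, used only to exercise the policy.
class QuickModelTest {
}

@SlowTest
class ParallelTransactionFuzzTest {
}
```

Under this policy a normal build skips anything marked @SlowTest, and setting the runSlowTests system property to true for the test JVM opts back in, without touching any annotations in the test sources.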

abrokenjester commented 3 years ago

@hmottestad and @barthanssens I need some opinions from you both.

We clearly have a need to organize the test suite better, so that we can better distinguish between longer-running (integration/compliance) tests and quick/simple (unit) tests. The main need we have is that we can easily distinguish them as part of the build workflow, so that we do not have to run certain expensive/slow tests for every build.

Before we dive into the particulars of that need, a quick overview of how things are currently organized: we currently have a hard distinction between unit tests and compliance/integration tests.

- all unit tests are co-located in the same module as the code being tested (in src/test/java, standard Maven conventions).
- all integration/compliance tests are in separate compliance modules (in compliance/...).

The reason for the separate compliance modules is twofold:

- it allows creating tests that require dependencies on other parts of the framework which could not be included in the core module directly (because of cyclic dependencies).
- it introduces an easy way to distinguish which tests get run at which time - the tests in compliance are set up to only be run by the maven failsafe plugin, in the verify phase, while the unit tests are run by surefire, in the test phase.

However, in several modules, such as most of the sails, this model has been let go: all tests, both integration and unit tests, are co-located directly in the main module, instead of in the compliance module. This technically works because they need no downstream dependencies, but it breaks the assumption about using the failsafe plugin to run larger integration/compliance tests, in the integration-test phase.

One way we could address this is to start using file naming conventions to distinguish between integration and unit tests.

We could reconfigure the surefire and failsafe plugins so that they only pick up tests meant for them by means of file name patterns. The default for surefire could be *Test.java, and for failsafe *IT.java (this is actually the default pattern of the plugin, we override it in the project setup).

If we do this it will allow having integration tests alongside the unit tests in the same module, while still being able to execute them in different phases of the build cycle (and optionally skipping one or the other).
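As a rough illustration of the naming convention (not of the plugin configuration itself, which would live in the POM as include patterns), the sketch below sorts test source files by the runner that would pick them up. The class TestNaming and the example file names are made up for illustration; only the Test.java and IT.java suffixes come from the proposal above.

```java
import java.util.Optional;

final class TestNaming {
    // Which plugin would execute a given test source file.
    enum Runner { SUREFIRE, FAILSAFE }

    private TestNaming() {
    }

    // Classifies a source file by its name suffix. Files matching
    // neither suffix (helpers, fixtures) are not picked up at all.
    static Optional<Runner> runnerFor(String fileName) {
        if (fileName.endsWith("IT.java")) {
            return Optional.of(Runner.FAILSAFE); // integration test
        }
        if (fileName.endsWith("Test.java")) {
            return Optional.of(Runner.SUREFIRE); // unit test
        }
        return Optional.empty();
    }
}
```

For example, a hypothetical ShaclSailTest.java would be classified as a surefire unit test and a hypothetical MultithreadedIT.java as a failsafe integration test, even though both sit in the same src/test/java directory.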

Note that we would still have separate compliance modules for some parts of the framework, but really only for the purpose of avoiding circular dependencies.

This setup goes some way towards giving us more control over build/test times, but I think we need something on top of this, to be able to identify particular integration/compliance tests as "slow" or "long-running". The need we have here is that we can disable those specific tests in a normal build (independently of whether we run unit / integration tests or not), but still be able to easily choose to run them, either from the command line, or as part of a separate GitHub Action. This is possibly where those conditional executions from JUnit 5 might help.

Finally: I believe there is a lot of redundancy in testing, in particular in the SHACL module, and some of the tests in there conflate what is being tested. If I understand correctly, some of these tests are supposed to be checks on native store behavior. Clearly, such tests should not be in the shacl module, but in either the nativestore module itself, or part of a separate compliance module. This needs a cleanup. I'd be grateful, @hmottestad, if you could make an effort in creating some clarity, and perhaps simplifying or removing a few tests that are redundant. Surely, it is not necessary that every single build spends 22 minutes running just the ShaclSail tests.

As an aside: I have long dreamt of a setup where only the tests downstream from code changes are executed - I mean, it's ridiculous that I make a change in the Rio parser and all SPARQL compliance tests get executed. But as far as I'm aware there is just no easy way to set that up in combination with GitHub Actions. If anyone has ideas on how that could be done I'd love to hear them.
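One way to think about the selection step is as a walk over the reversed module dependency graph: start from the changed modules and collect everything that depends on them, directly or transitively. The sketch below shows that computation on a toy graph. The module names and edges are invented for illustration and do not reflect the real RDF4J dependency structure, and how the result would be wired into GitHub Actions is left open.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AffectedModules {
    private AffectedModules() {
    }

    // dependsOn maps each module to the modules it depends on directly.
    // Returns the changed modules plus every module that depends on
    // one of them, directly or transitively.
    static Set<String> affectedBy(Map<String, List<String>> dependsOn, Set<String> changed) {
        Set<String> affected = new TreeSet<>(changed);
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                // A module is affected if it depends on the current one;
                // add() returns false if it was already visited.
                if (entry.getValue().contains(current) && affected.add(entry.getKey())) {
                    queue.add(entry.getKey());
                }
            }
        }
        return affected;
    }

    // Toy graph with invented edges, NOT the real RDF4J module structure.
    static Map<String, List<String>> toyGraph() {
        return Map.of(
                "model", List.of(),
                "rio", List.of("model"),
                "query", List.of("model"),
                "sail", List.of("query"),
                "shacl", List.of("sail", "rio"));
    }
}
```

In this toy graph a change in rio would only select rio and shacl, while a change in model would select everything. Maven's own --projects and --also-make-dependents options perform a similar selection within the reactor; whether that fits our CI setup is something I have not tested.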

hmottestad commented 3 years ago

> We could reconfigure the surefire and failsafe plugins so that they only pick up tests meant for them by means of file name patterns. The default for surefire could be *Test.java, and for failsafe *IT.java (this is actually the default pattern of the plugin, we override it in the project setup).

Let's do that!

And configure GitHub CI so that there is a new PR action for failsafe tests. That action can be optional, so you can merge without it having finished.

I know the SHACL tests are slow. It is mainly the fuzzing tests that take time. Without those, the total test time is what, 5 min maybe? Can you point me to the fuzzing tests we have for the NativeStore?

barthanssens commented 3 years ago

> We clearly have a need to organize the test suite better, so that we can better distinguish between longer-running (integration/compliance) tests and quick/simple (unit) tests. The main need we have is that we can easily distinguish them as part of the build workflow, so that we do not have to run certain expensive/slow tests for every build.

Good idea indeed

> We could reconfigure the surefire and failsafe plugins so that they only pick up tests meant for them by means of file name patterns. The default for surefire could be *Test.java, and for failsafe *IT.java (this is actually the default pattern of the plugin, we override it in the project setup).

That's a nice way of handling it... and this distinction is IMHO also useful for developers who are new to (a part of the) codebase

hmottestad commented 3 years ago

> Finally: I believe there is a lot of redundancy in testing, in particular in the SHACL module, and some of the tests in there conflate what is being tested. If I understand correctly, some of these tests are supposed to be checks on native store behavior. Clearly, such tests should not be in the shacl module, but in either the nativestore module itself, or part of a separate compliance module. This needs a cleanup. I'd be grateful, @hmottestad, if you could make an effort in creating some clarity, and perhaps simplifying or removing a few tests that are redundant. Surely, it is not necessary that every single build spends 22 minutes running just the ShaclSail tests.

There isn't actually much redundancy. The slow tests that you point to in the SHACL module are pretty much integration tests. They check what happens when you bombard the ShaclSail with lots of parallel transactions. Unfortunately I've found it to be the case that for some bugs the MemoryStore backed tests will fail more rapidly and for others the NativeStore tests will fail. Most recently there was a deadlock issue in the ShaclSail against the RDFS reasoner which only failed when it was backed by the NativeStore, since the MemoryStore one was too "fast" to get stuck.

eclipse-rdf4j / rdf4j

Clean up test suite to reduce build time #2925