Drivetrain Automated Test Generation - How to Use and Ideas for Improvement

Commit 0e22bae allows Drivetrain to automatically generate Snapshot tests based on Update Specifications (formerly called Process Specifications, formerly called Processes, typed as turbo:TURBO_0010354) in a given Transformation Instruction Set. These tests provide a "Snapshot" of the expected inputs and outputs of a given Update Specification at a certain point in time, and should be re-generated when changes are made to the Update Specification or to relevant Connection Recipes.

Usage is subject to change in subsequent releases. For now, follow these steps to get it working:

Ensure that the Transformation Instruction Set containing the Update Specification to be tested is listed under the instructionSetFile property in turbo_properties.properties.
From the SBT console, run "run buildTest" followed by the URI of the Update Specification to be tested. If no URI is provided, all tests will be built.
A test file will be generated in src/test/scala/edu/upenn/turbo/AutoGenTests/. Its name is the concatenation of the non-prefix segments of your Instruction Set filename and of the Update Specification URI, followed by "SnapshotTest", for example carnival_instructionSet_HomoSapiensExpansionProcessSnapshotTest.scala. If the test file already exists, it will be overwritten.
To run the tests, from the SBT console run "test" (to run all tests, including legacy tests) or "testOnly" followed by the name of a generated test class (to run only that test).
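The file-naming rule from the steps above can be sketched as a small Scala helper. This is a hypothetical illustration, not the Test Builder's actual code: localName, baseName and snapshotTestFileName are made-up names, and the sketch assumes that the directory and extension are stripped from the Instruction Set filename and that an underscore joins the two segments, as in the example file name. The path and namespace in the usage line are placeholders.

```scala
// Hypothetical helpers illustrating the naming rule; not Drivetrain's actual code.

/** Local name of a URI or prefixed name: everything after the last '#', '/' or ':'. */
def localName(uri: String): String =
  uri.substring(uri.lastIndexWhere(c => c == '#' || c == '/' || c == ':') + 1)

/** Instruction Set filename without its directory or extension (assumption). */
def baseName(path: String): String = {
  val file = path.substring(path.lastIndexOf('/') + 1)
  val dot = file.lastIndexOf('.')
  if (dot > 0) file.substring(0, dot) else file
}

/** Assumed rule: file base name + "_" + URI local name + "SnapshotTest.scala". */
def snapshotTestFileName(instructionSetFile: String, updateSpecUri: String): String =
  s"${baseName(instructionSetFile)}_${localName(updateSpecUri)}SnapshotTest.scala"

// Placeholder path and namespace.
// Prints: carnival_instructionSet_HomoSapiensExpansionProcessSnapshotTest.scala
println(snapshotTestFileName(
  "ontologies/carnival_instructionSet.ttl",
  "http://example.org/ontology/HomoSapiensExpansionProcess"))
```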

Each generated test class will include 2 tests:

All fields test
Minimum fields test

The All fields test contains a synthetic set of triples for the given Update Specification based on all possible inputs, required and optional. The Minimum fields test contains a synthetic set of triples based on only the required inputs. In both cases, the triples are inserted into the testing repository listed under testingRepository in turbo_properties.properties, and the Update Specification is then executed against them. The test class retrieves the results and checks that they match the expected results hard-coded into the class, which were captured when the Snapshot test was generated.
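The difference between the two tests, and the snapshot comparison itself, can be sketched in a few lines of Scala. This is a minimal illustration under assumptions, not the generated code: Triple, InputPattern and the helper functions are hypothetical names, and the ex: triples are placeholders rather than real ontology terms.

```scala
// Hypothetical model of the two generated tests; names are illustrative only.
final case class Triple(s: String, p: String, o: String)

/** One synthetic input triple the Update Specification can consume. */
final case class InputPattern(triple: Triple, required: Boolean)

/** All fields test: every input, required and optional. */
def allFieldsInput(patterns: Seq[InputPattern]): Set[Triple] =
  patterns.map(_.triple).toSet

/** Minimum fields test: required inputs only. */
def minimumFieldsInput(patterns: Seq[InputPattern]): Set[Triple] =
  patterns.filter(_.required).map(_.triple).toSet

/** Snapshot check: (missing, unexpected); both empty means the results match the snapshot. */
def compareSnapshot(expected: Set[Triple], actual: Set[Triple]): (Set[Triple], Set[Triple]) =
  (expected -- actual, actual -- expected)

// Placeholder inputs: one required, one optional.
val patterns = Seq(
  InputPattern(Triple("ex:shortcut1", "rdf:type", "ex:ShortcutPerson"), required = true),
  InputPattern(Triple("ex:shortcut1", "ex:hasDateOfBirth", "\"1970-01-01\""), required = false)
)
```

Reporting missing and unexpected triples separately makes a failing snapshot easier to diagnose than a single boolean mismatch.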

Below are a few concerns I have with the current implementation, which can be improved upon in future releases.

- [x] Only output predicates are checked. Although this is in line with how the legacy tests operate, a more robust solution would check entire output triples. This requires modifying the Drivetrain code to accept a UUID key to hash for IRI creation, rather than creating a new one for each instantiation.
- [x] Create a test template rather than re-creating the structure in individual files. This would allow us to more easily modify and browse the generated inputs and outputs, and also to create our own handcrafted tests.
- [x] Contexts are not supported. The use of Acorn Contexts (which allow multiple instances of the same type to be referenced in a single Update Specification) would confuse the Test Builder. This was not an issue when generating tests for our current instruction sets, because neither of them uses contexts in its input.
- [x] Multiplicity is not fully tested. Regardless of the multiplicity declared in the Transformation Instruction Set, only one instance of each type is created in the synthetic input triples. A more comprehensive solution could create one subject instance and one object instance for a 1-1 multiplicity, two subject instances and one object instance for many-1, and one subject instance and two object instances for 1-many.
- [x] Create a debugging option for Instruction Set development. This would allow someone developing an instruction set to automatically generate input and browse the output in the GraphDB visualizer, to confirm it is as expected before finalizing and generating the test.
- [ ] Statements about processes are not tested. They are created in the output repository but are not read by the Snapshot test. This is low priority.
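For the first checklist item (checking entire triples instead of only predicates), the UUID-keyed IRI idea could look roughly like the Scala sketch below. It is an assumption-laden illustration, not Drivetrain's implementation: deterministicIri and the example.org namespace are hypothetical, and SHA-256 is simply one reasonable choice of hash. The point is that the same key and the same identifying parts always produce the same IRI, so a snapshot can hard-code full subject and object IRIs.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.UUID

// Hypothetical namespace; Drivetrain's real IRI scheme may differ.
val InstanceNamespace = "http://example.org/instance/"

/**
 * Builds an IRI by hashing a fixed UUID key together with identifying parts
 * (e.g. the instance's type). Same key + same parts => same IRI on every run.
 */
def deterministicIri(key: UUID, parts: String*): String = {
  val input = (key.toString +: parts).mkString("|")
  val hash = MessageDigest.getInstance("SHA-256").digest(input.getBytes(StandardCharsets.UTF_8))
  // Hex-encode the 32-byte digest (64 hex characters).
  InstanceNamespace + hash.map(b => "%02x".format(b & 0xff)).mkString
}
```

In a test run, the key would be fixed (for example, stored in the generated test), while production runs could keep generating a fresh key per instantiation.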
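For the second checklist item (a shared test template), one possible shape is sketched below in plain Scala. Everything here is hypothetical: SnapshotTestTemplate, ExampleSnapshotTest and the toy executor are made-up names, and the executor merely stands in for inserting triples into the testing repository and running the Update Specification. The design idea is that the checking logic lives in one place and each generated file shrinks to data.

```scala
// Hypothetical template: each generated test supplies only data; checking logic lives here once.
final case class Triple(s: String, p: String, o: String)

abstract class SnapshotTestTemplate {
  def updateSpecificationUri: String
  def allFieldsInput: Set[Triple]
  def allFieldsExpected: Set[Triple]
  def minimumFieldsInput: Set[Triple]
  def minimumFieldsExpected: Set[Triple]

  private def check(name: String, execute: (String, Set[Triple]) => Set[Triple],
                    input: Set[Triple], expected: Set[Triple]): Unit = {
    val actual = execute(updateSpecificationUri, input)
    assert(actual == expected,
      s"$name failed for $updateSpecificationUri: missing ${expected -- actual}, unexpected ${actual -- expected}")
  }

  /** Runs the All fields and Minimum fields tests with the supplied executor. */
  def runAll(execute: (String, Set[Triple]) => Set[Triple]): Unit = {
    check("All fields test", execute, allFieldsInput, allFieldsExpected)
    check("Minimum fields test", execute, minimumFieldsInput, minimumFieldsExpected)
  }
}

// What a generated file could shrink to: pure data (placeholder URI and triples).
object ExampleSnapshotTest extends SnapshotTestTemplate {
  val updateSpecificationUri = "http://example.org/ontology/ExampleProcess"
  val allFieldsInput = Set(Triple("ex:a", "ex:inputName", "ex:b"), Triple("ex:a", "ex:inputAge", "ex:c"))
  val allFieldsExpected = Set(Triple("ex:a", "ex:outputName", "ex:b"), Triple("ex:a", "ex:outputAge", "ex:c"))
  val minimumFieldsInput = Set(Triple("ex:a", "ex:inputName", "ex:b"))
  val minimumFieldsExpected = Set(Triple("ex:a", "ex:outputName", "ex:b"))
}

// Toy stand-in for Drivetrain: rewrites "input" predicates to "output" predicates.
val toyExecutor: (String, Set[Triple]) => Set[Triple] =
  (_, input) => input.map(t => t.copy(p = t.p.replace("input", "output")))

ExampleSnapshotTest.runAll(toyExecutor)
```

Handcrafted tests could extend the same template, which is one of the benefits mentioned in that item.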
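For the third checklist item (contexts), the sketch below shows why a builder could get confused, assuming it keys synthetic instances by type alone. This is a hypothetical illustration, not the Test Builder's code: NodeRequest and both functions are made-up names, and ex:contextA / ex:contextB are placeholder contexts. Keying by (type, context) keeps two same-typed inputs distinct.

```scala
// Hypothetical illustration of why contexts matter; not the Test Builder's code.

/** An input node an Update Specification refers to, optionally qualified by an Acorn context. */
final case class NodeRequest(rdfType: String, context: Option[String])

/** Keyed by type alone: requests sharing a type collapse into one synthetic instance. */
def instancesByType(reqs: Seq[NodeRequest]): Map[String, String] =
  reqs.map(_.rdfType).distinct.zipWithIndex.map { case (t, i) => t -> s"ex:instance$i" }.toMap

/** Keyed by (type, context): each context gets its own synthetic instance. */
def instancesByTypeAndContext(reqs: Seq[NodeRequest]): Map[NodeRequest, String] =
  reqs.distinct.zipWithIndex.map { case (r, i) => r -> s"ex:instance$i" }.toMap

val requests = Seq(
  NodeRequest("ex:Identifier", Some("ex:contextA")),
  NodeRequest("ex:Identifier", Some("ex:contextB"))
)
println(instancesByType(requests).size)           // 1: the two inputs are conflated
println(instancesByTypeAndContext(requests).size) // 2: one instance per context
```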
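For the fourth checklist item (multiplicity), the proposed instance counts can be sketched as follows. This assumes each relationship is generated as a cross product of the subject and object instances; Multiplicity, OneToOne, ManyToOne, OneToMany and syntheticTriples are hypothetical names, not the Instruction Set's vocabulary, and the ex: IRIs are placeholders.

```scala
// Hypothetical sketch of multiplicity-aware synthetic input; names are illustrative only.
sealed trait Multiplicity
case object OneToOne extends Multiplicity
case object ManyToOne extends Multiplicity
case object OneToMany extends Multiplicity

final case class Triple(s: String, p: String, o: String)

/** (subject instances, object instances), following the proposal in the checklist. */
def instanceCounts(m: Multiplicity): (Int, Int) = m match {
  case OneToOne  => (1, 1)
  case ManyToOne => (2, 1)
  case OneToMany => (1, 2)
}

/** Links every generated subject instance to every generated object instance. */
def syntheticTriples(subjectType: String, predicate: String, objectType: String,
                     m: Multiplicity): Seq[Triple] = {
  val (subjects, objects) = instanceCounts(m)
  for {
    i <- 1 to subjects
    j <- 1 to objects
  } yield Triple(s"ex:${subjectType}_$i", predicate, s"ex:${objectType}_$j")
}
```

For many-1 this yields two subjects pointing at the same object; for 1-many, one subject pointing at two objects.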

PennTURBO / semantic-engine, issue #74