diekhans opened this issue 4 years ago (state: Open)
Idea: add provenance records to output that detail steps executed and parameters used.
Right now, it is very difficult to understand exactly what happened during a cactus run. Records would be created detailing each command line executed, along with its arguments, and perhaps information about input and output files, such as sizes and genome regions. These records would be merged into a single provenance file that could then be examined and compared against expected results. Because job scheduling is nondeterministic, this comparison could not be a simple diff.
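To make the idea concrete, here is a minimal sketch of what a per-command provenance record might look like. The `record_call` helper and the record fields are hypothetical, not existing cactus API, and the `--querydepth` value is illustrative:

```python
import json
import os
import time

def record_call(provenance, cmd, inputs, outputs):
    """Append a hypothetical provenance record for one executed command."""
    provenance.append({
        "command": cmd[0],
        "args": cmd[1:],
        # File sizes give a cheap sanity check without storing the files.
        "inputs": {f: os.path.getsize(f) for f in inputs if os.path.exists(f)},
        "outputs": {f: os.path.getsize(f) for f in outputs if os.path.exists(f)},
        "timestamp": time.time(),
    })

# Usage sketch: records accumulate during the run, then get merged
# into a single provenance file that can be compared to a baseline.
provenance = []
record_call(provenance, ["cPecanLastz", "--querydepth=3"], [], [])
with open("provenance.json", "w") as fh:
    json.dump(provenance, fh, indent=2)
```

A structured format like this (rather than raw log text) is what would make comparison against expected results tractable despite nondeterministic scheduling: records can be sorted or counted rather than diffed line by line.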
On the second point, I activated --realTimeLogging to print out command lines as they are run by cactus_call(). Without that, I would never have been able to see, say, a difference in the cPecanLastz --querydepth parameter.
The log output from make evolver_test could be scanned to check the number of commands and the parameters used (after cleaning out filenames and date stamps) against a baseline log, to ensure that nothing has changed unexpectedly.
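The scan-against-a-baseline step could look roughly like this: normalize away the volatile parts of each log line (timestamps, directory paths), then compare the multiset of command lines. The regexes and sample log lines below are illustrative assumptions, not the actual cactus log format:

```python
import re
from collections import Counter

def normalize(line):
    """Strip volatile parts (date stamps, directory paths) so runs compare."""
    line = re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\S*", "<TIME>", line)
    line = re.sub(r"/\S*/", "<PATH>/", line)  # collapse directory prefixes
    return line.strip()

def command_profile(log_lines, prog_re=r"\bcPecanLastz\b|\bcactus_\w+"):
    """Count normalized command lines that mention known program names."""
    return Counter(normalize(l) for l in log_lines if re.search(prog_re, l))

# Hypothetical baseline vs. current run: same command, different parameter.
baseline = command_profile([
    "2020-01-02 03:04:05 running /tmp/x1/cPecanLastz --querydepth=3 seq.fa",
])
current = command_profile([
    "2021-06-07 08:09:10 running /tmp/y9/cPecanLastz --querydepth=5 seq.fa",
])
changed = baseline != current  # parameter drift would be flagged here
```

Comparing Counters rather than raw lines makes the check order-independent, which matters because scheduling reorders commands from run to run.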
Currently, cactus testing does a very poor job of validating whether anything has changed in the results; the tests generally just check "does it crash?". Various problems introduced by the py2-to-py3 conversion demonstrate this: issues such as --config not being passed through, and blast parameter changes going undetected, are symptomatic of this inability to catch unexpected changes.
The goal of this ticket is to collect ideas for ways to improve test validation.