ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Improve cactus testability #168

Open · diekhans opened this issue 4 years ago

diekhans commented 4 years ago

Currently, cactus testing does a very poor job of validating whether anything has changed in the results. The tests generally just check "does it crash?". Various problems introduced by the py2-to-py3 conversion demonstrate this: issues such as --config not being passed through, or blast parameters changing without being detected, are symptomatic of the inability to detect unexpected changes.

The goal of this ticket is to collect ideas for ways to improve test validation.

diekhans commented 4 years ago

Idea: add provenance records to the output that detail the steps executed and the parameters used.

Right now, it is very difficult to understand exactly what happened during a cactus run. Records would be created detailing the commands executed, along with their arguments and perhaps information about the input and output files, such as sizes, genome regions, etc. These would then be merged into a single provenance file that could be examined and compared against expected results. Because jobs are scheduled in a nondeterministic order, this comparison could not be a simple diff.
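A minimal sketch of what an order-insensitive comparison could look like, assuming each step emits one JSON record per line; the record fields, the normalization rules, and the file names here are all hypothetical, not existing cactus machinery:

```python
import json
import re
from collections import Counter

def normalize(record):
    """Strip run-specific noise (hypothetical rules) so records from
    different runs of the same pipeline compare equal."""
    cmd = record["command"]
    # Replace temporary paths with a placeholder before comparing.
    cmd = [re.sub(r"/tmp/\S+", "<TMPFILE>", arg) for arg in cmd]
    return (record["step"], tuple(cmd))

def load_provenance(path):
    """Load one JSON record per line: {"step": ..., "command": [...]}."""
    with open(path) as fh:
        return Counter(normalize(json.loads(line)) for line in fh)

def compare(expected_path, actual_path):
    """Jobs may run in any order, so compare multisets of normalized
    records rather than diffing the files line by line."""
    expected = load_provenance(expected_path)
    actual = load_provenance(actual_path)
    return expected - actual, actual - expected

if __name__ == "__main__":
    missing, unexpected = compare("expected.provenance.jsonl",
                                  "run.provenance.jsonl")
    for rec, n in missing.items():
        print(f"missing x{n}: {rec}")
    for rec, n in unexpected.items():
        print(f"unexpected x{n}: {rec}")
```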

glennhickey commented 4 years ago

On the second point, I activated --realTimeLogging to print out command lines as they are run by cactus_call(). Without that, I would never have been able to see, say, a difference in cPecanLastz --querydepth.

The log output from make evolver_test could be scanned to check the number of commands and the parameters used (after cleaning out filenames and date stamps) against a baseline log, to ensure that nothing changed unexpectedly. A rough sketch of such a scan follows.
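This is a sketch only: the log marker string, the noise-stripping regexes, and the baseline format are assumptions for illustration, not the actual format cactus_call() emits.

```python
import re
import sys
from collections import Counter

# Run-specific noise to strip before comparing (assumed formats;
# real cactus logs may differ).
NOISE = [
    (re.compile(r"/\S*/(tmp|work)\S*"), "<PATH>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<DATE>"),
]

def extract_commands(log_path, marker="Running the command"):
    """Collect a histogram of normalized command lines from a log.
    The marker string is an assumption about the log format."""
    counts = Counter()
    with open(log_path) as fh:
        for line in fh:
            if marker not in line:
                continue
            cmd = line.split(marker, 1)[1].strip()
            for pattern, replacement in NOISE:
                cmd = pattern.sub(replacement, cmd)
            counts[cmd] += 1
    return counts

def main(baseline_log, new_log):
    base = extract_commands(baseline_log)
    new = extract_commands(new_log)
    # Counter subtraction keeps only positive differences, so this
    # reports commands that appeared, disappeared, or changed count.
    changed = (base - new) + (new - base)
    for cmd, n in sorted(changed.items()):
        sign = "-" if base[cmd] > new[cmd] else "+"
        print(f"{sign} x{n} {cmd}")
    return 1 if changed else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Exiting nonzero when the command histogram drifts from the baseline would let a CI job fail on the kind of silent parameter change described above.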