BioLockJ-Dev-Team / sheepdog_testing_suite

Test suite for BioLockJ development team.

Validation option for comparing to an existing table #284

Open IvoryC opened 3 years ago

IvoryC commented 3 years ago

One option field in the expectation file might be matchTable, which would give a file path.
For this comparison, it would read the output table and the matchTable, and make each a map of rowName+columnName -> value. Each matching value between the two maps is a +1.
total = length(union(map keys))
score = matching / total
validation.matchScore = a value in [0,1], default: 1
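
The scoring described above could be sketched roughly as follows. This is only an illustration in Python, not code from the test suite: the function names table_to_map and match_score are hypothetical, and it assumes tab-delimited tables whose first row is a header and whose first column holds row names.

```python
import csv
import io


def table_to_map(text, delimiter="\t"):
    """Parse a delimited table into {(row_name, column_name): value}.

    Assumption: the first row is a header and the first column holds row names.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows[0][1:]
    cells = {}
    for row in rows[1:]:
        if not row:  # skip blank lines
            continue
        for column_name, value in zip(header, row[1:]):
            cells[(row[0], column_name)] = value
    return cells


def match_score(output_cells, expected_cells):
    """Return (matching, total, score) where total counts the union of keys."""
    all_keys = set(output_cells) | set(expected_cells)
    if not all_keys:
        # Two empty tables are treated as identical (an assumption).
        return 0, 0, 1.0
    matching = sum(
        1
        for key in all_keys
        if key in output_cells
        and key in expected_cells
        and output_cells[key] == expected_cells[key]
    )
    return matching, len(all_keys), matching / len(all_keys)


# Hypothetical example: one of four cells differs only in formatting.
expected_text = "ID\tWeight\tAge\nSample1\t10\t3\nSample2\t12\t4\n"
output_text = "ID\tWeight\tAge\nSample1\t10\t3\nSample2\t12.0\t4\n"
matching, total, score = match_score(
    table_to_map(output_text), table_to_map(expected_text)
)
print(matching, total, score)
```

Values are compared as strings here, so "12" and "12.0" count as a mismatch; a numeric tolerance would be a separate design choice. A key present in only one table adds to the total but never to the matches, so added or missing rows and columns lower the score.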

"This failed because 55 out of 7,648 table values were not identical."
This would also be an easy automated way to summarize differences that might be hard to find by eye:
"New table has added columns: Weight"
"New table is missing columns: Weight (lb)"
"New table is missing rows: Sample8, Sample24"
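
Messages like the ones quoted above could be generated from the two maps of (rowName, columnName) -> value. The sketch below is a hypothetical illustration in Python; summarize_differences is not an existing function in the suite, and the "added rows" message is an assumption added by symmetry with the others.

```python
def summarize_differences(output_cells, expected_cells):
    """Build human-readable messages from two {(row, column): value} maps."""
    out_rows = {row for row, _ in output_cells}
    exp_rows = {row for row, _ in expected_cells}
    out_cols = {col for _, col in output_cells}
    exp_cols = {col for _, col in expected_cells}

    # Count cells that are absent from one table or differ between them.
    all_keys = set(output_cells) | set(expected_cells)
    mismatched = sum(
        1
        for key in all_keys
        if key not in output_cells
        or key not in expected_cells
        or output_cells[key] != expected_cells[key]
    )

    messages = []
    if mismatched:
        messages.append(
            f"This failed because {mismatched:,} out of {len(all_keys):,} "
            "table values were not identical."
        )
    # Each label is paired with the names it should report, if any.
    checks = [
        ("has added columns", out_cols - exp_cols),
        ("is missing columns", exp_cols - out_cols),
        ("has added rows", out_rows - exp_rows),  # assumed by symmetry
        ("is missing rows", exp_rows - out_rows),
    ]
    for label, names in checks:
        if names:
            messages.append(f"New table {label}: {', '.join(sorted(names))}")
    return messages


# Hypothetical example: a renamed column and a dropped row.
expected = {("Sample1", "Weight (lb)"): "10", ("Sample8", "Weight (lb)"): "12"}
output = {("Sample1", "Weight"): "10"}
for message in summarize_differences(output, expected):
    print(message)
```

A renamed column shows up as one added and one missing column, which is usually enough to spot the cause by eye.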

The md5 method is a way to check for exactly the same results with minimal info stored. Passing is very informative, but failing is not informative at all. This new feature would be another soft validation option, soft enough to be used in reproducing research projects. Tables are usually small enough that storing the expected results table costs little, and the insight gained by seeing the difference is very informative. Is my table giving me very different answers? Or did something go wrong and it is truncated? Is the output a different type? Are the values actually similar but non-identical, maybe because rounding was handled differently (thus being the 'same' while failing md5)? Or maybe only a scattering of values differs, like what we see with the Rdp table output.

...and this would help a lot in testing...