Several comparisons between two blind test runs would be useful, based on the results recorded in their blind-out.csv files:
Overall # correct
Gains/losses by intent
Improved/regressed utterances
It would be useful to generate these metrics optionally, whenever a "previous run" is supplied. Ideally, this could be invoked as a standalone process that takes three inputs:
Location of current run
Location of previous run
Location for output file(s) containing the comparison metrics
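
As a rough illustration of the three-input process described above, here is a minimal Python sketch. It rests on several assumptions that are not confirmed by this issue: that each "location" is a path to a blind-out.csv file, that the file has a header row, and that it contains columns named "utterance", "golden intent", and "predicted intent" (hypothetical names; adjust them to the real header). The function names and the JSON output format are likewise illustrative only.

```python
import argparse
import csv
import json
from collections import Counter

# Hypothetical column names; change these to match the real blind-out.csv header.
UTTERANCE, GOLDEN, PREDICTED = "utterance", "golden intent", "predicted intent"


def load_run(path):
    """Map each utterance to (golden intent, whether the prediction was correct)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {
            row[UTTERANCE]: (row[GOLDEN], row[PREDICTED] == row[GOLDEN])
            for row in csv.DictReader(f)
        }


def compare_runs(current_path, previous_path):
    """Compute the three comparisons for utterances present in both runs."""
    current, previous = load_run(current_path), load_run(previous_path)
    shared = sorted(current.keys() & previous.keys())
    improved = [u for u in shared if current[u][1] and not previous[u][1]]
    regressed = [u for u in shared if previous[u][1] and not current[u][1]]

    # Net change per golden intent: +1 per improved, -1 per regressed utterance.
    by_intent = Counter()
    for u in improved:
        by_intent[current[u][0]] += 1
    for u in regressed:
        by_intent[current[u][0]] -= 1

    return {
        "current_correct": sum(ok for _, ok in current.values()),
        "previous_correct": sum(ok for _, ok in previous.values()),
        "improved": improved,
        "regressed": regressed,
        "net_change_by_intent": dict(by_intent),
    }


def main(argv=None):
    parser = argparse.ArgumentParser(description="Compare two blind test runs.")
    parser.add_argument("current", help="blind-out.csv of the current run")
    parser.add_argument("previous", help="blind-out.csv of the previous run")
    parser.add_argument("output", help="path of the JSON file to write")
    args = parser.parse_args(argv)
    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(compare_runs(args.current, args.previous), f, indent=2)
```

Calling `main()` with no arguments reads the three paths from the command line, which keeps the comparison optional and separate from the test run itself. Utterances that appear in only one run are ignored by the improved/regressed lists in this sketch; whether they should be reported separately is an open design question.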