eval style & counting of edges

alecristia commented 6 years ago

There is disagreement in the literature (and in our lab) as to whether utterance edges should be counted towards boundary performance. I think we currently count them as wrong, whereas in some cases the score should be undefined because the denominator is 0. Could we please add a series of boundary outcomes to represent this?

Example 1: seg=hello, gold=hello
boundary_all_precision=1
boundary_all_recall=1
boundary_all_fscore=1
boundary_noedge_precision=NA
boundary_noedge_recall=NA
boundary_noedge_fscore=NA

Example 2: seg=he llo, gold=hello
boundary_all_precision=2/3
boundary_all_recall=2/3
boundary_all_fscore=8/9
boundary_noedge_precision=0
boundary_noedge_recall=NA
boundary_noedge_fscore=NA

Example 3: seg=hell o, gold=h ello
boundary_all_precision=2/3
boundary_all_recall=2/3
boundary_all_fscore=8/9
boundary_noedge_precision=0
boundary_noedge_recall=0
boundary_noedge_fscore=0

While at it, could we please sort the output as follows:

token_precision 1
token_recall 1
token_fscore 1
type_precision 1
type_recall 1
type_fscore 1
boundary_all_precision=1
boundary_all_recall=1
boundary_all_fscore=1
boundary_noedge_precision=NA
boundary_noedge_recall=NA
boundary_noedge_fscore=NA
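A minimal sketch of that ordering, for illustration only. The names `SCORE_ORDER` and `format_scores` are hypothetical (they are not part of wordseg), and the tab separator and the `NA` rendering of undefined scores are assumptions:

```python
# Hypothetical helper, not wordseg's API: print scores in the requested order.
SCORE_ORDER = [
    f"{level}_{metric}"
    for level in ("token", "type", "boundary_all", "boundary_noedge")
    for metric in ("precision", "recall", "fscore")
]


def format_scores(scores):
    """Render a {name: value} dict, one score per line, in SCORE_ORDER.

    Missing or undefined (None) scores are rendered as NA (an assumption).
    """
    lines = []
    for name in SCORE_ORDER:
        value = scores.get(name)
        lines.append(f"{name}\t{'NA' if value is None else value}")
    return "\n".join(lines)
```

Calling `format_scores` on a dict with any subset of the twelve keys always yields twelve lines, token scores first and `boundary_noedge` scores last.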

mmmaat commented 6 years ago

OK, I'm not sure I understood that very well, but I'll ask you when I work on it...

mmmaat commented 6 years ago

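I'm implementing that. Here is what I have. The values that differ from yours are boundary_all_recall, boundary_all_fscore and boundary_noedge_fscore in Example 2, and boundary_all_fscore in Example 3.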

Example 2: seg=he llo, gold=hello
boundary_all_precision=2/3
boundary_all_recall=1.0
boundary_all_fscore=4/5
boundary_noedge_precision=0
boundary_noedge_recall=NA
boundary_noedge_fscore=0

Example 3: seg=hell o, gold=h ello
boundary_all_precision=2/3
boundary_all_recall=2/3
boundary_all_fscore=2/3
boundary_noedge_precision=0
boundary_noedge_recall=0
boundary_noedge_fscore=0
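For reference, a minimal Python sketch that reproduces these numbers. It is not the code from the commit: the function names are hypothetical, boundaries are taken as character offsets within a single utterance, and the F-score convention (NA only when both precision and recall are undefined, 0 when either is 0 or undefined) is an assumption inferred from the values in this thread.

```python
from fractions import Fraction


def boundaries(utterance, with_edges=True):
    """Return the set of boundary offsets of a space-segmented utterance."""
    positions = set()
    offset = 0
    for word in utterance.split():
        positions.add(offset)  # boundary before each word
        offset += len(word)
    positions.add(offset)  # final utterance edge
    if not with_edges:
        positions -= {0, offset}  # drop both utterance edges
    return positions


def ratio(hits, total):
    """hits/total as an exact fraction, or None (NA) when total is 0."""
    return Fraction(hits, total) if total else None


def boundary_scores(seg, gold, with_edges=True):
    """Return (precision, recall, fscore); None stands for NA."""
    seg_b = boundaries(seg, with_edges)
    gold_b = boundaries(gold, with_edges)
    hits = len(seg_b & gold_b)
    precision = ratio(hits, len(seg_b))
    recall = ratio(hits, len(gold_b))
    if precision is None and recall is None:
        fscore = None  # assumed: nothing to score on either side
    elif not precision or not recall:
        fscore = Fraction(0)  # assumed: 0 if either term is 0 or NA
    else:
        fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore
```

With `seg="he llo"` and `gold="hello"`, the edge-inclusive boundary sets are {0, 2, 5} and {0, 5}, which gives precision 2/3, recall 1 and F-score 4/5. Passing `with_edges=False` leaves {2} against the empty set, so precision is 0 and recall is undefined.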

mmmaat commented 6 years ago

Done in f2594049f9ef7ca8cfcd77ea74011a8e73bddeeb

bootphon / wordseg

eval style & counting of edges #21