Open azhe825 opened 6 years ago
Hall {'count': 1777, 'truepos': 97, 'falseneg': 5, 'unknownyes': 4, 'falsepos': 14, 'unique': 820}
Wahono {'count': 3583, 'truepos': 59, 'falseneg': 0, 'unknownyes': 3, 'falsepos': 36, 'unique': 1650}
Danijel {'count': 2389, 'truepos': 43, 'falseneg': 2, 'unknownyes': 3, 'falsepos': 35, 'unique': 1100}
Kitchenham {'count': 900, 'truepos': 33, 'falseneg': 1, 'unknownyes': 11, 'falsepos': 6, 'unique': 410}
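The dictionaries above can be reduced to a couple of summary ratios. This is a minimal sketch, assuming (the thread does not state this) that `truepos` counts relevant docs labeled correctly, `falseneg` relevant docs labeled as irrelevant, `unknownyes` relevant docs never reviewed, and `falsepos` irrelevant docs labeled as relevant. The `summarize` helper and its output keys are hypothetical names used only for illustration.

```python
import ast

def summarize(line):
    """Parse '<name> {dict}' and derive ratios under the assumptions above."""
    name, _, payload = line.partition(" ")
    stats = ast.literal_eval(payload)  # safe parsing of the dict literal
    # Assumption: every relevant doc is in exactly one of these three buckets.
    relevant = stats["truepos"] + stats["falseneg"] + stats["unknownyes"]
    # Assumption: docs finally coded 'yes' are true positives plus false positives.
    included = stats["truepos"] + stats["falsepos"]
    return {
        "name": name,
        "recall": stats["truepos"] / relevant,
        "precision": stats["truepos"] / included,
        "count": stats["count"],
    }

row = summarize(
    "Kitchenham {'count': 900, 'truepos': 33, 'falseneg': 1, "
    "'unknownyes': 11, 'falsepos': 6, 'unique': 410}"
)
print(row)
```

If the field meanings differ from these assumptions, only the two sums inside `summarize` would need to change.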
Dataset | No Correction | Three Reviewers | Human-machine Disagreements | No Error |
---|---|---|---|---|
Hall | 89/3460 | 97/1777 | 98/583 | 102/490 |
Wahono | 58/3740 | 59/3583 | 59/2388 | 59/1165 |
Danijel | 42/2100 | 43/2389 | 41/1189 | 45/760 |
Kitchenham | 26/450 | 33/900 | 31/590 | 38/460 |
Dataset | none | three | machine | No Error |
---|---|---|---|---|
Hall | 94 / 3100 / 10 / 281 | 99 / 1631 / 3 / 16 | 95 / 683 / 6 / 14 | 102/490/0/0 |
Wahono | 54 / 3440 / 7 / 341 | 57 / 3643 / 2 / 44 | 55 / 1691 / 4 / 20 | 59/1165/0/0 |
Danijel | 41 / 2570 / 4 / 241 | 44 / 2049 / 1 / 26 | 41 / 1060 / 4 / 14 | 45/760/0/0 |
K_all3 | 31 / 460 / 3 / 44 | 34 / 957 / 1 / 11 | 30 / 578 / 5 / 11 | 38/460/0/0 |
The falseneg counts are too high.
Dataset | none | three | machine | machine2 | machine3 | No Error |
---|---|---|---|---|---|---|
Hall | 98 / 1385 / 5 / 63 | 101 / 1128 / 1 / 4 | 99 / 635 / 2 / 3 | 100 / 725 / 1 / 3 | 99 / 679 / 2 / 1 | 102 / 490 / 0 / 0 |
Wahono | 57 / 1880 / 3 / 88 | 59 / 2913 / 0 / 9 | 58 / 1554 / 1 / 4 | 58 / 1651 / 1 / 6 | 58 / 1510 / 1 / 3 | 59 / 1165 / 0 / 0 |
Danijel | 43 / 1060 / 2 / 50 | 45 / 1755 / 0 / 5 | 43 / 983 / 2 / 2 | 44 / 1071 / 1 / 4 | 43 / 976 / 2 / 2 | 45 / 760 / 0 / 0 |
K_all3 | 33 / 430 / 1 / 20 | 35 / 997 / 0 / 2 | 34 / 606 / 3 / 1 | 34 / 588 / 1 / 3 | 34 / 602 / 2 / 1 | 38 / 460 / 0 / 0 |
ER=0.1
Dataset | none | three | machine | machine2 | machine3 | No Error |
---|---|---|---|---|---|---|
Hall | 93 / 3290 / 11 / 311 | 99 / 1608 / 3 / 18 | 96 / 645 / 5 / 11 | 98 / 823 / 3 / 16 | 95 / 776 / 6 / 6 | 102 / 490 / 0 / 0 |
Wahono | 54 / 3455 / 6 / 331 | 58 / 3744 / 1 / 42 | 56 / 1696 / 4 / 16 | 56 / 2161 / 3 / 32 | 55 / 1694 / 4 / 17 | 59 / 1165 / 0 / 0 |
Danijel | 41 / 2705 / 5 / 261 | 44 / 2183 / 1 / 28 | 41 / 1136 / 3 / 11 | 42 / 1248 / 2 / 17 | 41 / 1171 / 3 / 12 | 45 / 760 / 0 / 0 |
K_all3 | 32 / 485 / 4 / 45 | 34 / 961 / 1 / 10 | 29 / 593 / 5 / 6 | 31 / 636 / 3 / 9 | 31 / 651 / 5 / 6 | 38 / 460 / 0 / 0 |
Please repeat for err equals 1, 2, 4, 8
For the numbers we got from Manny (on the whiteboard), what is the overall cost?
I emailed Manny for more detailed error-rate information. From what was on the board, I assume there is a large difference between the two error rates:
ErrorRateA might be much larger than ErrorRateB.
Error

- Random error: for each labeling task, the human reviewer has an ER=0.05 chance of labeling incorrectly.

Error Correction:

- Run an error check after every CR=50 docs reviewed:
  - Sort docs(code=='yes') by prediction_probability on the current classifier and pick the bottom 5 for recheck.
  - Sort docs(code=='no') by prediction_probability on the current classifier and pick the top 5 for recheck.
  - The two steps above find the papers whose labels the human and the machine disagree on.
- Recheck: ask the reviewer to label the selected docs again, with the same error rate ER=0.05.
- If a doc receives the same label as before, it will not be rechecked in the future.
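
The selection and recheck steps above can be sketched as follows. This is an illustration, not the code behind the reported results: the function names, the dict-based data layout, and the toy data are hypothetical, classifier probabilities are passed in rather than computed, and overwriting a label when the recheck disagrees is an assumption (the description only says what happens when the labels agree).

```python
import random

def noisy_label(true_label, er, rng):
    """Return the true label, flipped with probability er."""
    if rng.random() < er:
        return "no" if true_label == "yes" else "yes"
    return true_label

def pick_for_recheck(labels, probs, confirmed, k=5):
    """Pick labeled docs where human and machine disagree most.

    labels: doc_id -> 'yes'/'no' (human codes so far)
    probs: doc_id -> classifier probability that the doc is 'yes'
    confirmed: doc ids whose label a recheck already confirmed
    """
    open_docs = [d for d in labels if d not in confirmed]
    # 'yes' docs the classifier trusts least: bottom k by probability.
    yes = sorted((d for d in open_docs if labels[d] == "yes"),
                 key=lambda d: probs[d])
    # 'no' docs the classifier believes are relevant: top k by probability.
    no = sorted((d for d in open_docs if labels[d] == "no"),
                key=lambda d: probs[d], reverse=True)
    return yes[:k] + no[:k]

def recheck(doc_ids, labels, truth, confirmed, er, rng):
    """Ask the (simulated) reviewer to relabel docs with the same error rate."""
    for d in doc_ids:
        new = noisy_label(truth[d], er, rng)
        if new == labels[d]:
            confirmed.add(d)  # same label twice: never recheck again
        else:
            labels[d] = new   # assumption: the newer label replaces the old one

# Toy data: docs 1 and 3 carry wrong human labels.
truth = {0: "yes", 1: "yes", 2: "no", 3: "no"}
labels = {0: "yes", 1: "no", 2: "no", 3: "yes"}
probs = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.2}
confirmed = set()
rng = random.Random(0)

picked = pick_for_recheck(labels, probs, confirmed, k=1)
# er=0.0 keeps the demo deterministic; the setup above uses ER=0.05.
recheck(picked, labels, truth, confirmed, er=0.0, rng=rng)
```

In a full simulation, `pick_for_recheck` and `recheck` would be called after every CR=50 reviewed docs, with `probs` refreshed from the current classifier each time.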
Results (one run, with BM25, SEMI) were reported per dataset: Hall, Wahono, Danijel, Kitchenham.
What happens if the error rate is increased to ER=0.1? Results were again reported per dataset: Hall, Wahono, Danijel, Kitchenham.
Correct Error: {'count': 590, 'truepos': 31, 'falseneg': 3, 'unknownyes': 11, 'falsepos': 9, 'unique': 470}
No correction: {'falseneg': 6, 'falsepos': 51, 'unknownyes': 13, 'truepos': 26} reviewed 450