ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Simple random error results #69

Open azhe825 opened 6 years ago

azhe825 commented 6 years ago

Error

Random error: for each labeling task, the human reviewer has a probability of ER=0.05 of labeling the document incorrectly.
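A minimal sketch of how this random error could be simulated, assuming binary "yes"/"no" labels. The function name `noisy_label` and the toy data are illustrative assumptions, not code from this repository.

```python
import random

def noisy_label(true_label: str, er: float, rng: random.Random) -> str:
    """Return the label a simulated reviewer gives.

    With probability `er` the reviewer flips the true label
    ("yes" <-> "no"); otherwise the true label is returned.
    """
    if rng.random() < er:
        return "no" if true_label == "yes" else "yes"
    return true_label

# Toy example (hypothetical data): 100 relevant and 900 irrelevant documents.
rng = random.Random(0)
truth = ["yes"] * 100 + ["no"] * 900
labels = [noisy_label(t, 0.05, rng) for t in truth]
wrong = sum(given != true for given, true in zip(labels, truth))
print(f"{wrong} of {len(truth)} labels were flipped")
```

With ER=0.05, roughly 5% of the labels should come out wrong on average; the exact count depends on the random seed.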

Error Correction:

Run error check every CR=50 docs reviewed:
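The periodic check could be wired into the review loop roughly as below. This is only a sketch: `review_with_checks`, `label_fn`, and `check_fn` are hypothetical names, and the check itself is left as a callback because its details are not given here.

```python
from typing import Callable, Iterable, List, Tuple

def review_with_checks(
    docs: Iterable[str],
    label_fn: Callable[[str], str],
    check_fn: Callable[[List[Tuple[str, str]]], None],
    cr: int = 50,
) -> Tuple[List[Tuple[str, str]], int]:
    """Label documents one by one and run `check_fn` after every `cr` reviews.

    `check_fn` receives the list of (doc, label) pairs so far and may
    correct labels in place. Returns the labeled pairs and the number
    of checks that were run.
    """
    labeled: List[Tuple[str, str]] = []
    checks = 0
    for doc in docs:
        labeled.append((doc, label_fn(doc)))
        if len(labeled) % cr == 0:
            check_fn(labeled)
            checks += 1
    return labeled, checks

# Toy usage: record how many labels existed at each check.
sizes_at_check: List[int] = []
labeled, checks = review_with_checks(
    docs=[f"doc{i}" for i in range(120)],
    label_fn=lambda d: "no",
    check_fn=lambda pairs: sizes_at_check.append(len(pairs)),
    cr=50,
)
print(checks, sizes_at_check)
```

In the toy run, 120 documents with CR=50 trigger checks after the 50th and 100th reviews.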

Results (one run, with BM25, SEMI):

[Result plots for Hall, Wahono, Danijel, and Kitchenham; images not preserved]

What happens if the error rate is increased to ER=0.1?

[Result plots for Hall, Wahono, Danijel, and Kitchenham at ER=0.1; images not preserved]

azhe825 commented 6 years ago

Baseline

[Screenshot of baseline results, 2017-10-02; image not preserved]

Results when ER=0.1

Hall {'count': 1777, 'truepos': 97, 'falseneg': 5, 'unknownyes': 4, 'falsepos': 14, 'unique': 820}

Wahono {'count': 3583, 'truepos': 59, 'falseneg': 0, 'unknownyes': 3, 'falsepos': 36, 'unique': 1650}

Danijel {'count': 2389, 'truepos': 43, 'falseneg': 2, 'unknownyes': 3, 'falsepos': 35, 'unique': 1100}

Kitchenham {'count': 900, 'truepos': 33, 'falseneg': 1, 'unknownyes': 11, 'falsepos': 6, 'unique': 410}
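To compare these dictionaries more easily, a rough summary can be computed as below. This is a sketch under assumptions that the thread does not confirm: that `falseneg` counts relevant documents labeled "no", that `unknownyes` counts relevant documents not yet reviewed, and that `count` is the total number of reviews performed. The helper name `summarize` is hypothetical.

```python
def summarize(result: dict) -> dict:
    """Derive simple metrics from one result dictionary.

    Assumptions (not confirmed by the source):
      - relevant documents in total = truepos + falseneg + unknownyes
      - precision is measured over documents labeled "yes"
      - `count` is the total number of reviews performed
    """
    tp = result["truepos"]
    fp = result["falsepos"]
    relevant = tp + result["falseneg"] + result["unknownyes"]
    return {
        "precision": tp / (tp + fp),
        "recall": tp / relevant,
        "reviews_per_truepos": result["count"] / tp,
    }

# Values copied from the ER=0.1 results above.
results = {
    "Hall": {"count": 1777, "truepos": 97, "falseneg": 5,
             "unknownyes": 4, "falsepos": 14, "unique": 820},
    "Wahono": {"count": 3583, "truepos": 59, "falseneg": 0,
               "unknownyes": 3, "falsepos": 36, "unique": 1650},
    "Danijel": {"count": 2389, "truepos": 43, "falseneg": 2,
                "unknownyes": 3, "falsepos": 35, "unique": 1100},
    "Kitchenham": {"count": 900, "truepos": 33, "falseneg": 1,
                   "unknownyes": 11, "falsepos": 6, "unique": 410},
}

for name, res in results.items():
    s = summarize(res)
    print(f"{name}: precision={s['precision']:.2f}, "
          f"recall={s['recall']:.2f}, "
          f"reviews/truepos={s['reviews_per_truepos']:.1f}")
```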

azhe825 commented 6 years ago

| Dataset | No Correction | Three Reviewers | Human-machine Disagreements | No Error |
|---|---|---|---|---|
| Hall | 89/3460 | 97/1777 | 98/583 | 102/490 |
| Wahono | 58/3740 | 59/3583 | 59/2388 | 59/1165 |
| Danijel | 42/2100 | 43/2389 | 41/1189 | 45/760 |
| Kitchenham | 26/450 | 33/900 | 31/590 | 38/460 |

azhe825 commented 6 years ago

30 repeats

| Dataset | none | three | machine | No Error |
|---|---|---|---|---|
| Hall | 94 / 3100 / 10 / 281 | 99 / 1631 / 3 / 16 | 95 / 683 / 6 / 14 | 102 / 490 / 0 / 0 |
| Wahono | 54 / 3440 / 7 / 341 | 57 / 3643 / 2 / 44 | 55 / 1691 / 4 / 20 | 59 / 1165 / 0 / 0 |
| Danijel | 41 / 2570 / 4 / 241 | 44 / 2049 / 1 / 26 | 41 / 1060 / 4 / 14 | 45 / 760 / 0 / 0 |
| K_all3 | 31 / 460 / 3 / 44 | 34 / 957 / 1 / 11 | 30 / 578 / 5 / 11 | 38 / 460 / 0 / 0 |

Problem with machine:

Falseneg too high

azhe825 commented 6 years ago

New results: (try to decrease false negatives from machine)

| Dataset | none | three | machine | machine2 | machine3 | No Error |
|---|---|---|---|---|---|---|
| Hall | 98 / 1385 / 5 / 63 | 101 / 1128 / 1 / 4 | 99 / 635 / 2 / 3 | 100 / 725 / 1 / 3 | 99 / 679 / 2 / 1 | 102 / 490 / 0 / 0 |
| Wahono | 57 / 1880 / 3 / 88 | 59 / 2913 / 0 / 9 | 58 / 1554 / 1 / 4 | 58 / 1651 / 1 / 6 | 58 / 1510 / 1 / 3 | 59 / 1165 / 0 / 0 |
| Danijel | 43 / 1060 / 2 / 50 | 45 / 1755 / 0 / 5 | 43 / 983 / 2 / 2 | 44 / 1071 / 1 / 4 | 43 / 976 / 2 / 2 | 45 / 760 / 0 / 0 |
| K_all3 | 33 / 430 / 1 / 20 | 35 / 997 / 0 / 2 | 34 / 606 / 3 / 1 | 34 / 588 / 1 / 3 | 34 / 602 / 2 / 1 | 38 / 460 / 0 / 0 |

ER=0.1

| Dataset | none | three | machine | machine2 | machine3 | No Error |
|---|---|---|---|---|---|---|
| Hall | 93 / 3290 / 11 / 311 | 99 / 1608 / 3 / 18 | 96 / 645 / 5 / 11 | 98 / 823 / 3 / 16 | 95 / 776 / 6 / 6 | 102 / 490 / 0 / 0 |
| Wahono | 54 / 3455 / 6 / 331 | 58 / 3744 / 1 / 42 | 56 / 1696 / 4 / 16 | 56 / 2161 / 3 / 32 | 55 / 1694 / 4 / 17 | 59 / 1165 / 0 / 0 |
| Danijel | 41 / 2705 / 5 / 261 | 44 / 2183 / 1 / 28 | 41 / 1136 / 3 / 11 | 42 / 1248 / 2 / 17 | 41 / 1171 / 3 / 12 | 45 / 760 / 0 / 0 |
| K_all3 | 32 / 485 / 4 / 45 | 34 / 961 / 1 / 10 | 29 / 593 / 5 / 6 | 31 / 636 / 3 / 9 | 31 / 651 / 5 / 6 | 38 / 460 / 0 / 0 |

timm commented 6 years ago

Please repeat for err = 1, 2, 4, 8.

For the numbers we got from mannie (on the whiteboard), what is the overall cost?

azhe825 commented 6 years ago

I emailed Manny for more detailed error-rate information. From what was on the board, I assume that there is a huge difference between:

ErrorRateA might be much larger than ErrorRateB.