Random Human Error - Githubissues

Error

Random error: for each labeling task, the human reviewer has a probability ER = 0.00 / 0.02 / 0.05 / 0.10 of labeling incorrectly.
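
The random-error model above can be sketched as a label flip that fires with probability ER. This is a minimal illustration, not the code used for the experiments; the function name `noisy_label` and the 'yes' / 'no' label strings are assumptions.

```python
import random

def noisy_label(true_label: str, error_rate: float, rng: random.Random) -> str:
    """Return true_label, flipped to the other class with probability error_rate."""
    if rng.random() < error_rate:
        return "no" if true_label == "yes" else "yes"
    return true_label

# Simulate many labelings of relevant docs at ER = 0.10 (hypothetical setup).
rng = random.Random(0)
labels = [noisy_label("yes", 0.10, rng) for _ in range(10000)]
observed_error = labels.count("no") / len(labels)  # should be close to 0.10
```

With a fixed seed the simulation is repeatable, which makes it easier to compare error-correction strategies on the same stream of mistakes.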

Error Correction:

none:

Run with FAST2 (BM25+SEMI)

three:

Each doc will be labeled at least 2 times, at most 3 times.
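
One plausible reading of "at least 2 times, at most 3 times" is: label twice, and ask for a third label only when the first two disagree. The sketch below assumes that reading, which is not confirmed here; `label_up_to_three` and `ask_reviewer` are hypothetical names.

```python
from typing import Callable, Tuple

def label_up_to_three(ask_reviewer: Callable[[], str]) -> Tuple[str, int]:
    """Label a doc 2 or 3 times; return (final_label, number_of_labelings).

    Assumption: two agreeing labels settle the doc; otherwise a third
    label breaks the tie, which equals the majority of the three labels.
    """
    first = ask_reviewer()
    second = ask_reviewer()
    if first == second:
        return first, 2
    third = ask_reviewer()
    return third, 3

# Scripted reviewer that disagrees with itself once (illustration only).
answers = iter(["yes", "no", "yes"])
final_label, times_labeled = label_up_to_three(lambda: next(answers))
```

Under this reading the cost of the strategy is at least twice the number of docs reviewed, which matches the roughly doubled cost in the ER = 0% rows.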

machine

Run an error check every CR = 50 docs reviewed:

Sort docs (code == 'yes') by their prediction_probability under the current classifier and pick the bottom 10 for recheck.
Sort docs (code == 'no') by their prediction_probability under the current classifier and pick the top 10 for recheck.
The two steps above find the docs on which the human label and the machine prediction disagree most.
Recheck: ask the reviewer to label the selected docs again, with the same error rate ER.
If a doc is labeled the same as before, or has been labeled 3 times, it is frozen (it will not be rechecked in the future).
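
The selection and freezing rules above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the `Doc` fields, `pick_suspicious`, and `recheck` are hypothetical names, `prob` is taken to be the classifier's predicted probability of 'yes', and the doc's code is assumed to be overwritten by the newest label (for two classes this equals the majority once three labels exist).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Doc:
    doc_id: int
    code: str            # current human label, "yes" or "no"
    prob: float          # classifier's predicted probability of "yes"
    times_labeled: int = 1
    frozen: bool = False

def pick_suspicious(docs: List[Doc], k: int = 10) -> List[Doc]:
    """Pick the k least likely 'yes' docs and the k most likely 'no' docs."""
    open_docs = [d for d in docs if not d.frozen]
    yes_docs = sorted((d for d in open_docs if d.code == "yes"),
                      key=lambda d: d.prob)
    no_docs = sorted((d for d in open_docs if d.code == "no"),
                     key=lambda d: d.prob, reverse=True)
    return yes_docs[:k] + no_docs[:k]

def recheck(doc: Doc, new_label: str, max_labels: int = 3) -> None:
    """Record a new label; freeze if it matches the old one or the cap is hit."""
    doc.times_labeled += 1
    if new_label == doc.code or doc.times_labeled >= max_labels:
        doc.frozen = True
    doc.code = new_label  # assumption: the newest label replaces the old code

docs = [Doc(0, "yes", 0.9), Doc(1, "yes", 0.1), Doc(2, "no", 0.8), Doc(3, "no", 0.2)]
suspicious = pick_suspicious(docs, k=1)  # doc 1 (unlikely 'yes') and doc 2 (likely 'no')
```

In a full loop, `pick_suspicious` would run after every CR = 50 reviewed docs and each selected doc would be passed to `recheck` with a freshly obtained label.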

machine2

Same as machine, but:

for each paper previously coded as 'relevant' and currently selected as suspicious, ask the reviewer to review it until it is frozen (to decrease false negatives).

machine3

Same as machine, but:

for each paper coded as 'relevant', immediately ask the reviewer to review it again until it is frozen (so that no suspicious paper will be picked from the 'relevant' side, and false negatives are expected to decrease).
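
Both machine2 and machine3 rely on "review until frozen"; they differ only in when it is triggered (machine2: for suspicious papers coded 'relevant'; machine3: right after any 'relevant' label). A minimal sketch of that shared step, assuming the freezing rule from machine (same label as before, or 3 labels in total); `review_until_frozen`, `machine3_label`, and `ask_reviewer` are hypothetical names.

```python
from typing import Callable, Tuple

def review_until_frozen(first_label: str,
                        ask_reviewer: Callable[[], str],
                        max_labels: int = 3) -> Tuple[str, int]:
    """Relabel until a label repeats the previous one or max_labels is reached.

    Returns (final_label, times_labeled).
    """
    label, times = first_label, 1
    while times < max_labels:
        new_label = ask_reviewer()
        times += 1
        agreed = new_label == label
        label = new_label
        if agreed:
            break
    return label, times

def machine3_label(ask_reviewer: Callable[[], str]) -> Tuple[str, int]:
    """machine3-style labeling: re-review immediately when the first label is 'yes'."""
    first = ask_reviewer()
    if first == "yes":
        return review_until_frozen(first, ask_reviewer)
    return first, 1

# Scripted reviewer: a wrong 'yes' followed by two correct 'no' labels.
answers = iter(["yes", "no", "no"])
result = machine3_label(lambda: next(answers))
```

The extra labels are spent only on the 'relevant' side, which is why the cost of these variants stays well below that of labeling every doc two or three times.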

Results

medians reported
truepos / cost / falseneg / falsepos

ER = 10%

	No Error	none	three	machine	machine2	machine3
Hall	102 / 490 / 0 / 0	95 / 2930 / 8 / 276	100 / 1696 / 2 / 17	94 / 722 / 7 / 12	98 / 855 / 3 / 15	98 / 862 / 4 / 9
Wahono	59 / 1160 / 0 / 0	54 / 3315 / 6 / 320	58 / 3823 / 1 / 47	55 / 1721 / 4 / 19	56 / 2217 / 2 / 33	56 / 1919 / 3 / 28
Danijel	45 / 750 / 0 / 0	41 / 2535 / 5 / 248	45 / 2303 / 1 / 28	41 / 1033 / 3 / 10	42 / 1217 / 2 / 18	43 / 1209 / 2 / 16
K_all3	37 / 500 / 0 / 0	30 / 470 / 3 / 46	34 / 964 / 1 / 10	30 / 570 / 4 / 7	30 / 590 / 3 / 10	32 / 614 / 3 / 8

	No Error	none	three	machine	machine2	machine3
Hall	102 / 490 / 0 / 0	93 / 3290 / 11 / 311	99 / 1608 / 3 / 18	96 / 645 / 5 / 11	98 / 823 / 3 / 16	95 / 776 / 6 / 6
Wahono	59 / 1160 / 0 / 0	54 / 3455 / 6 / 331	58 / 3744 / 1 / 42	56 / 1696 / 4 / 16	56 / 2161 / 3 / 32	55 / 1694 / 4 / 17
Danijel	45 / 750 / 0 / 0	41 / 2705 / 5 / 261	44 / 2183 / 1 / 28	41 / 1136 / 3 / 11	42 / 1248 / 2 / 17	41 / 1171 / 3 / 12
K_all3	37 / 500 / 0 / 0	32 / 485 / 4 / 45	34 / 961 / 1 / 10	29 / 593 / 5 / 6	31 / 636 / 3 / 9	31 / 651 / 5 / 6

ER = 5%

	No Error	none	three	machine	machine2	machine3
Hall	102 / 490 / 0 / 0	98 / 1385 / 5 / 63	101 / 1128 / 1 / 4	99 / 635 / 2 / 3	100 / 725 / 1 / 3	99 / 679 / 2 / 1
Wahono	59 / 1160 / 0 / 0	57 / 1880 / 3 / 88	59 / 2913 / 0 / 9	58 / 1554 / 1 / 4	58 / 1651 / 1 / 6	58 / 1510 / 1 / 3
Danijel	45 / 750 / 0 / 0	43 / 1060 / 2 / 50	45 / 1755 / 0 / 5	43 / 983 / 2 / 2	44 / 1071 / 1 / 4	43 / 976 / 2 / 2
K_all3	37 / 500 / 0 / 0	33 / 430 / 1 / 20	35 / 997 / 0 / 2	34 / 606 / 3 / 1	34 / 588 / 1 / 3	34 / 602 / 2 / 1

ER = 2%

	No Error	none	three	machine	machine2	machine3
Hall	102 / 490 / 0 / 0	100 / 635 / 2 / 10	102 / 982 / 0 / 0	100 / 632 / 1 / 1	101 / 639 / 1 / 1	102 / 687 / 0 / 0
Wahono	59 / 1160 / 0 / 0	58 / 1595 / 1 / 34	59 / 2407 / 0 / 1	59 / 1387 / 0 / 1	59 / 1445 / 0 / 1	59 / 1460 / 0 / 1
Danijel	45 / 750 / 0 / 0	44 / 900 / 1 / 15	45 / 1552 / 0 / 1	45 / 948 / 0 / 0	45 / 986 / 0 / 1	45 / 952 / 0 / 0
K_all3	37 / 500 / 0 / 0	33 / 450 / 1 / 7	37 / 1009 / 0 / 0	35 / 573 / 1 / 0	35 / 594 / 0 / 0	37 / 585 / 0 / 0

ER = 0%

	none	three	machine	machine2	machine3
Hall	102 / 490 / 0 / 0	102 / 1000 / 0 / 0	102 / 672 / 0 / 0	102 / 682 / 0 / 0	102 / 683 / 0 / 0
Wahono	59 / 1150 / 0 / 0	59 / 2300 / 0 / 0	59 / 1409 / 0 / 0	59 / 1400 / 0 / 0	59 / 1409 / 0 / 0
Danijel	45 / 755 / 0 / 0	45 / 1500 / 0 / 0	45 / 915 / 0 / 0	45 / 925 / 0 / 0	45 / 911 / 0 / 0
K_all3	37 / 500 / 0 / 0	37 / 980 / 0 / 0	37 / 560 / 0 / 0	37 / 566 / 0 / 0	36 / 554 / 0 / 0

metrics

medians reported
precision / recall / cost
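
The precision values below appear consistent with truepos / (truepos + falsepos) from the Results tables (for example Hall, none, ER = 10%: 95 / (95 + 276) ≈ 0.26). Recall needs the total number of relevant docs in each dataset, which is not stated here, so the sketch treats it as a parameter. Because medians are taken per metric, values recomputed from median counts may differ slightly from the reported ones. The function names are illustrative.

```python
def precision(truepos: int, falsepos: int) -> float:
    """Fraction of docs coded 'yes' that are truly relevant."""
    denom = truepos + falsepos
    return truepos / denom if denom else 0.0

def recall(truepos: int, total_relevant: int) -> float:
    """Fraction of all relevant docs in the dataset that were found.

    total_relevant is an input the caller must supply (assumption: it is
    the number of relevant docs in the whole dataset, not truepos + falseneg).
    """
    return truepos / total_relevant

# Hall, none, ER = 10% (first table): truepos = 95, falsepos = 276.
hall_none_precision = round(precision(95, 276), 2)  # 0.26
```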

ER = 10%

	none	three	machine	machine2	machine3
Hall	0.26 / 0.9 / 2930	0.85 / 0.94 / 1696	0.89 / 0.89 / 722	0.87 / 0.92 / 855	0.92 / 0.92 / 862
Wahono	0.14 / 0.87 / 3315	0.55 / 0.94 / 3823	0.75 / 0.89 / 1721	0.64 / 0.91 / 2217	0.67 / 0.9 / 1919
Danijel	0.14 / 0.85 / 2535	0.61 / 0.94 / 2303	0.79 / 0.85 / 1033	0.7 / 0.88 / 1217	0.73 / 0.9 / 1209
K_all3	0.41 / 0.69 / 470	0.76 / 0.77 / 964	0.81 / 0.68 / 570	0.76 / 0.68 / 590	0.8 / 0.73 / 614

	none	three	machine	machine2	machine3
Hall	0.23 / 0.88 / 3290	0.85 / 0.93 / 1608	0.89 / 0.91 / 645	0.86 / 0.92 / 823	0.94 / 0.9 / 776
Wahono	0.14 / 0.87 / 3455	0.57 / 0.94 / 3744	0.77 / 0.9 / 1696	0.63 / 0.9 / 2161	0.77 / 0.9 / 1694
Danijel	0.14 / 0.86 / 2705	0.61 / 0.92 / 2183	0.78 / 0.85 / 1136	0.71 / 0.88 / 1248	0.78 / 0.85 / 1171
K_all3	0.41 / 0.73 / 485	0.77 / 0.77 / 961	0.84 / 0.66 / 593	0.76 / 0.7 / 636	0.84 / 0.7 / 651

ER = 5%

	none	three	machine	machine2	machine3
Hall	0.61 / 0.92 / 1385	0.96 / 0.95 / 1128	0.97 / 0.93 / 635	0.96 / 0.94 / 725	0.99 / 0.93 / 679
Wahono	0.39 / 0.92 / 1880	0.87 / 0.95 / 2913	0.94 / 0.94 / 1554	0.9 / 0.94 / 1651	0.95 / 0.94 / 1510
Danijel	0.46 / 0.9 / 1060	0.9 / 0.94 / 1755	0.96 / 0.9 / 983	0.92 / 0.92 / 1071	0.95 / 0.9 / 976
K_all3	0.63 / 0.75 / 430	0.94 / 0.8 / 997	0.97 / 0.77 / 606	0.92 / 0.77 / 588	0.97 / 0.77 / 602

ER = 2%

	none	three	machine	machine2	machine3
Hall	0.91 / 0.94 / 635	1.0 / 0.96 / 982	0.99 / 0.94 / 632	0.99 / 0.95 / 639	1.0 / 0.96 / 687
Wahono	0.63 / 0.94 / 1595	0.98 / 0.95 / 2407	0.98 / 0.95 / 1387	0.98 / 0.95 / 1445	0.98 / 0.95 / 1460
Danijel	0.75 / 0.92 / 900	0.98 / 0.94 / 1552	1.0 / 0.94 / 948	0.98 / 0.94 / 986	1.0 / 0.94 / 952
K_all3	0.82 / 0.75 / 450	1.0 / 0.84 / 1009	1.0 / 0.8 / 573	1.0 / 0.81 / 594	1.0 / 0.84 / 585

ER = 0%

	none	three	machine	machine2	machine3
Hall	1.0 / 0.96 / 490	1.0 / 0.96 / 1000	1.0 / 0.96 / 671	1.0 / 0.96 / 682	1.0 / 0.96 / 683
Wahono	1.0 / 0.95 / 1150	1.0 / 0.95 / 2300	1.0 / 0.95 / 1409	1.0 / 0.95 / 1390	1.0 / 0.95 / 1409
Danijel	1.0 / 0.94 / 755	1.0 / 0.94 / 1500	1.0 / 0.94 / 915	1.0 / 0.94 / 925	1.0 / 0.94 / 911
K_all3	1.0 / 0.85 / 500	1.0 / 0.84 / 980	1.0 / 0.84 / 556	1.0 / 0.84 / 563	1.0 / 0.83 / 557

ai-se / ML-assisted-SLR

Random Human Error #70
