ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Challenges of the problem #75


azhe825 commented 6 years ago

Problem

Broad and complete literature reviews are required when a) researchers are exploring a new area; b) researchers are writing papers for peer review and must ensure reviewers will not reject the work for omitting important related work; c) anyone is surveying a field for its latest developments. However, such literature reviews can be extremely labor-intensive due to the low prevalence of relevant papers: often, thousands of candidate papers must be reviewed to reveal just a few dozen relevant ones. We therefore designed a tool called FASTREAD, which applies active learning to minimize human review effort while still finding most of the relevant papers. A minimal sketch of the kind of human-in-the-loop process FASTREAD automates is given below.
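The sketch below illustrates the general setting only, not FASTREAD's actual code: `abstracts`, `human_label`, and `BUDGET` are hypothetical placeholders, and TF-IDF plus a linear SVM is just one common modeling choice.

```python
# Minimal active-learning review loop (illustrative, not FASTREAD itself).
# Assumes: `abstracts` is a list of paper abstracts; `human_label(i)` is a
# hypothetical oracle asking the reviewer to label paper i (1 = relevant,
# 0 = irrelevant); BUDGET caps how many papers the human will read.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labeled, labels = [], []

# Seed with random papers until both classes have at least one example.
rng = np.random.default_rng(0)
while len(set(labels)) < 2:
    i = int(rng.integers(len(abstracts)))
    if i not in labeled:
        labeled.append(i)
        labels.append(human_label(i))

# Query loop: retrain, then ask about the paper most likely to be relevant.
while len(labeled) < BUDGET:
    clf = LinearSVC().fit(X[labeled], labels)
    scores = clf.decision_function(X)   # higher = more likely relevant
    scores[labeled] = -np.inf           # never re-query a labeled paper
    i = int(np.argmax(scores))          # "certainty sampling"
    labeled.append(i)
    labels.append(human_label(i))
```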

Challenges

  1. How to find the best active learner for the target problem?

    • via a literature review, we found three different state-of-the-art active learners built for similar problems in different domains. Which of these active learners is the best for our problem?
    • by mixing and matching the components of those three state-of-the-art active learners, we generated 32 candidate learners and tested all of them on our datasets. We then picked the best one, which outperformed all three of the original state-of-the-art learners (a sketch of this mix-and-match enumeration appears after this list).
  2. How to apply domain knowledge to boost the review process?

    • the review effort can vary wildly depending on the initial selection of the training examples.
    • an initial pre-study (taking less than five minutes) can capture the domain knowledge needed to dramatically reduce that variance. The pre-study ranks the candidate papers by their BM25 scores against a set of keywords (provided as domain knowledge), so that the reviewer labels likely-relevant papers first (see the BM25 sketch after this list).
  3. When to stop the reviewing process?

    • stopping too early means many relevant papers are missed, while stopping too late wastes review effort on irrelevant papers. A desirable stopping point is when 95% of the relevant papers have been found. However, since we do not know the total number of relevant papers, there is no direct way to tell whether we have found 95% of them.
    • our solution to this challenge is to train a semi-supervised logistic regressor that estimates the total number of relevant papers, and then use this estimate to decide when to stop (a simplified version is sketched after this list).
  4. How to correct human errors?

    • actual human reviewers are fallible when labeling each paper as relevant or irrelevant. Research shows it is reasonable to assume that both the precision and the recall of a human reviewer are around 70%. When such human errors occur, how can we correct them so that the active learner is not misled?
    • our solution to this challenge is to ask the fallible human reviewer to recheck some of the labeled papers now and then. The recheck list is drawn from the papers whose human labels the active learner disagrees with most strongly (sketched after this list).
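For challenge 1, the mix-and-match step amounts to enumerating a cross-product of design choices and benchmarking every cell. The dimension names below are illustrative placeholders (chosen only so the product has 32 cells, matching the study); the actual study crosses the real components of the three published learners.

```python
# Sketch: generate candidate active learners as a cross-product of design
# choices, then benchmark each candidate on the review datasets.
from itertools import product

QUERY_STRATEGY  = ["uncertainty", "certainty"]   # which paper to ask about next
CLASS_BALANCING = ["none", "undersample", "weighting", "presume_non_relevant"]
EARLY_TRAINING  = [True, False]                  # train before first relevant hit?
MODEL_REUSE     = [True, False]                  # warm-start between query rounds?

candidates = list(product(QUERY_STRATEGY, CLASS_BALANCING,
                          EARLY_TRAINING, MODEL_REUSE))
print(len(candidates))  # 2 * 4 * 2 * 2 = 32 candidate learners
```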
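For challenge 2, the BM25 pre-ranking can be implemented with any BM25 library; the sketch below uses the third-party rank_bm25 package, and the keyword list is a hypothetical example of the reviewer's domain knowledge.

```python
# Sketch of the five-minute pre-study: rank the candidate papers by their
# BM25 scores against reviewer-supplied keywords, then have the reviewer
# label the top-ranked papers first, so the learner's first training
# examples are likely to be relevant.
import numpy as np
from rank_bm25 import BM25Okapi   # third-party BM25 implementation

keywords  = ["active", "learning", "systematic", "review"]  # illustrative
tokenized = [doc.lower().split() for doc in abstracts]      # `abstracts` as above
scores    = BM25Okapi(tokenized).get_scores(keywords)

review_first = np.argsort(scores)[::-1][:10]  # indices of the 10 best matches
```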
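For challenge 3, the sketch below is a simplified stand-in for the semi-supervised estimator (the actual estimator is more involved): fit a logistic regressor on the labels collected so far and treat the summed predicted probabilities over the unlabeled papers as the expected number of relevant papers not yet found.

```python
# Simplified stand-in for the stopping-rule estimator. Reuses X, labeled,
# and labels from the review loop sketched earlier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimated_total_relevant(X, labeled, labels):
    unlabeled = np.setdiff1d(np.arange(X.shape[0]), labeled)
    model = LogisticRegression(max_iter=1000).fit(X[labeled], labels)
    expected_remaining = model.predict_proba(X[unlabeled])[:, 1].sum()
    return sum(labels) + expected_remaining   # found so far + estimated unfound

def should_stop(X, labeled, labels, target_recall=0.95):
    # Stop once the relevant papers found cover 95% of the estimated total.
    return sum(labels) >= target_recall * estimated_total_relevant(X, labeled, labels)
```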
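For challenge 4, one way to measure disagreement (assumed here as an illustration) is the gap between the model's predicted probability of relevance and the human's 0/1 label; the labeled papers with the largest gaps go back to the reviewer for rechecking.

```python
# Sketch: select the k labeled papers whose human labels the current model
# most strongly contradicts, and queue them for rechecking.
import numpy as np
from sklearn.linear_model import LogisticRegression

def recheck_candidates(X, labeled, labels, k=5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], labels)
    p_relevant = model.predict_proba(X[labeled])[:, 1]
    # Large |P(relevant) - human label| = model strongly disagrees.
    disagreement = np.abs(p_relevant - np.asarray(labels))
    worst_first = np.argsort(disagreement)[::-1]
    return [labeled[i] for i in worst_first[:k]]
```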