Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering. SLR studies need to be conducted frequently because a) researchers should update their SLR results every one or two years to take the latest publications into account; and b) most researchers are constantly studying different research questions in the same or similar topic areas. However, SLR studies cannot be conducted frequently because of their heavy cost. In our previous study, FASTREAD saved 90% of the review cost at the expense of 10% recall in the primary study selection step of an SLR. In this paper, we allow researchers to import knowledge from previously completed SLR studies to boost FASTREAD. With this knowledge transfer, when updating an SLR study, review effort can be further reduced to 50% of the review effort of FASTREAD with extremely low variance; when conducting an SLR on a similar or the same topic, variance can be greatly reduced.
Assumptions
(same as FASTREAD paper)
one single reviewer who never makes mistakes
no expert knowledge is available for building the initial seed training set
binary classification: each study is labeled as "relevant" or "irrelevant" by the reviewer
UPDATE scenario
(In addition to the general assumptions)
a previously completed SLR which
is on the same topic
has the same or similar review protocols
import the labeled examples from the previous SLR to boost the current primary study selection
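The UPDATE idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FASTREAD implementation: the log-odds term weighting stands in for whatever learner FASTREAD actually uses, and the function names (`train_term_weights`, `update_seed`, `rank_candidates`) and the example titles are hypothetical. The point it shows is that labels from the previous SLR are pooled into the training set before any new study has been reviewed.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would do more.
    return text.lower().split()


def train_term_weights(labeled):
    """Learn a log-odds weight per term from (text, is_relevant) pairs.

    Stand-in learner (assumption): add-one smoothed term log-odds.
    """
    pos, neg = Counter(), Counter()
    for text, is_relevant in labeled:
        (pos if is_relevant else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        term: math.log((pos[term] + 1) / pos_total)
        - math.log((neg[term] + 1) / neg_total)
        for term in vocab
    }


def update_seed(previous_labeled, current_labeled):
    # UPDATE: labels from the previous SLR are imported as training data.
    return list(previous_labeled) + list(current_labeled)


def rank_candidates(weights, candidates):
    # Most likely relevant studies first, so the reviewer sees them early.
    def score(text):
        return sum(weights.get(term, 0.0) for term in tokenize(text))
    return sorted(candidates, key=score, reverse=True)


# Made-up example data, for illustration only.
previous_labeled = [
    ("defect prediction using machine learning", True),
    ("survey of agile team communication", False),
]
candidates = [
    "agile communication practices in teams",
    "machine learning models for defect prediction",
]
weights = train_term_weights(update_seed(previous_labeled, []))
ranked = rank_candidates(weights, candidates)
```

With no new labels yet, the ranking is driven entirely by the imported examples, which is what lets the reviewer skip the cold-start phase.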
REUSE scenario
(In addition to the general assumptions)
a previously completed SLR which
is on a similar topic
import the trained model from the previous SLR to boost the current primary study selection
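A minimal sketch of the REUSE idea, for contrast with UPDATE: only the trained model crosses over from the previous SLR, not its labeled examples. The model format here (a plain dictionary of term weights), the function name `first_batch`, and all weights and titles are assumptions made for illustration; the notes above do not specify how the model is stored or applied.

```python
def score(weights, text):
    # Sum of the weights of the known terms in the text.
    return sum(weights.get(term, 0.0) for term in text.lower().split())


def first_batch(previous_weights, candidates, batch_size):
    """Use the imported model only to choose which studies to review first.

    Labels collected on this batch would then train a fresh model for the
    new SLR; the old labeled examples never enter the new training set.
    """
    ranked = sorted(
        candidates,
        key=lambda text: score(previous_weights, text),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model exported from a previous SLR (made-up weights).
previous_weights = {"defect": 1.2, "prediction": 0.9, "agile": -0.8}
candidates = [
    "agile retrospectives in practice",
    "software defect prediction with metrics",
    "code review tooling survey",
]
batch = first_batch(previous_weights, candidates, batch_size=2)
```

Because the old model only picks the starting batch, a mismatch between the two topics affects the first few reviews rather than the whole selection process.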
Methods
Experiments and results (in progress)
[Hall2007-] -> [Hall2007+] -> [Wahono]
FASTREAD on [Hall2007-]
FASTREAD vs UPDATE on [Hall2007+]
FASTREAD vs UPDATE vs REUSE on [Wahono]
Conclusions from current results
UPDATE can save about 50% of the review effort compared to a fresh start with FASTREAD.
REUSE does not save much review effort, but it can greatly reduce the variance of FASTREAD.
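To make "review effort" and "variance" concrete, here is one possible way to measure them. It assumes that effort is counted as the number of studies reviewed before a target recall is reached and that variance is taken across repeated runs. The function name `effort_to_recall`, the 0.9 default, and the study identifiers are illustrative assumptions, not results from the experiments.

```python
import math
import statistics


def effort_to_recall(review_order, relevant, target_recall=0.9):
    """Number of studies reviewed until target_recall of `relevant` is found.

    Returns None if the review order never reaches the target.
    """
    needed = math.ceil(target_recall * len(relevant))
    if needed == 0:
        return 0
    found = 0
    for reviewed, study in enumerate(review_order, start=1):
        if study in relevant:
            found += 1
            if found >= needed:
                return reviewed
    return None


# Two made-up runs over the same five studies, two of which are relevant.
relevant = {"s1", "s2"}
runs = [
    ["s3", "s1", "s4", "s2", "s5"],
    ["s1", "s2", "s3", "s4", "s5"],
]
efforts = [effort_to_recall(order, relevant) for order in runs]
effort_variance = statistics.pvariance(efforts)
```

A method can lower the mean of `efforts`, the variance, or both; the two conclusions above separate those effects.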
To do
Literature review for background
[ ] SLR update
[ ] SLR on similar topics (reuse scenario)
Generate synthetic data
[ ] from LDA topics (or just term frequency)
[ ] to test performance for different target overlap scores (L2 distance, cosine distance)
[ ] find thresholds for choosing between UPDATE, REUSE, and START
Extract data from SLR
[x] Now we have data in the area of: similarity >= UPDATE
[ ] Data lies in the area of: REUSE <= similarity < UPDATE (defect analysis perhaps)
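The overlap scores and thresholds mentioned in the to-do list could be prototyped along these lines. This is a sketch under assumptions: corpora are compared through plain term-frequency vectors, cosine similarity is used in place of cosine distance (one minus the similarity), and the threshold values passed to `choose_strategy` are placeholders, since finding the real thresholds is still an open item.

```python
from collections import Counter
import math


def term_frequencies(documents):
    # Normalized term frequencies over a whole corpus.
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine_similarity(a, b):
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def choose_strategy(similarity, update_threshold, reuse_threshold):
    # Mirrors the regions in the list above; threshold values are unknown.
    if similarity >= update_threshold:
        return "UPDATE"
    if similarity >= reuse_threshold:
        return "REUSE"
    return "START"


# Made-up corpora, for illustration only.
old_corpus = ["defect prediction models", "software defect metrics"]
new_corpus = ["defect prediction with deep learning"]
similarity = cosine_similarity(
    term_frequencies(old_corpus), term_frequencies(new_corpus)
)
strategy = choose_strategy(similarity, update_threshold=0.8, reuse_threshold=0.4)
```

Synthetic corpora with controlled overlap could be fed through the same functions to see where each strategy stops paying off.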