azhe825 opened this issue 7 years ago
50 topics:
Data similarity: data_Hall_Wahono: 0.857870, data_Hall_Abdellatif: 0.754895, data_Abdellatif_Wahono: 0.901016
Target similarity: target_Hall_Wahono: 0.994922, target_Hall_Abdellatif: 0.425869, target_Abdellatif_Wahono: 0.368009
Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering.
Performing an initial SLR on a new area can be very slow and expensive due to the cost of the primary selection study (where researchers review thousands of papers to find the dozens of relevant papers). Further, all that effort has to be repeated if ever that SLR is to be updated.
We find that the effort required to update an initial primary selection study can be significantly reduced via automatic text mining. For initial primary selections, FASTREAD uses support vector machines to build a model of the relevant and irrelevant papers seen so far. By incrementally querying and updating the support vectors of that SVM, it is possible to greatly reduce the number of papers that humans have to read (by as much as 97%) during that initial primary selection study.
This paper checks if FASTREAD can be used to reduce the effort required to update initial primary selection studies with a new set of papers. In this work, we assess a {\em naive update policy}; i.e., the new SVM is initialized with the support vectors found in the prior study. It turns out that this naive update policy does not always work. Rather, we recommend (a)~two different update methods and (b)~deciding which method to use based on the distance of the old selection study to the new study. In the experiments of this paper, we found that the effort associated with these primary selection studies can be reduced by our two methods by up to XXXX.
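The incremental query-and-update loop from the abstract can be sketched as follows. This is a minimal illustration, not FASTREAD's actual code: the TF-IDF features, `LinearSVC` learner, toy documents, and two-document seed are all assumptions made for the sketch.

```python
# Sketch of SVM-based incremental screening (assumed pipeline, not FASTREAD's code).
# Each round: retrain on all labels so far, then ask the human reviewer to
# label the pooled document the model ranks as most likely relevant.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["defect prediction with svm", "cooking pasta quickly",
        "fault prediction metrics study", "gardening tips for spring"]
true_labels = np.array([1, 0, 1, 0])   # oracle: 1 = relevant (human judgment)

X = TfidfVectorizer().fit_transform(docs)
labeled = [0, 1]                        # seed: indices already reviewed
while len(labeled) < len(docs):
    clf = LinearSVC().fit(X[labeled], true_labels[labeled])
    pool = [i for i in range(len(docs)) if i not in labeled]
    scores = clf.decision_function(X[pool])
    nxt = pool[int(np.argmax(scores))]  # query the most-likely-relevant doc
    labeled.append(nxt)                 # reviewer labels it; loop continues

print(sorted(labeled))                  # every doc reviewed in this tiny toy
```

In practice the loop stops early (that is the effort saving); the toy above runs to exhaustion only because it has four documents.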
============
i got confused by all the final charts, which i think are trying to say that:
which sounds like 3 RQs:
RQ4: do all transfer methods work equally well on all transfers? (point 3, above)
RQ5: is it possible to decide which transfer method should be applied to which data set?
RQ6: finally, once our recommended method is applied to the actual data, how much easier is updating an SLR versus doing one from scratch?
research questions
related work:
future work
good that you did not walk through 16 different options for fast read.
never “he” but “they”
I have not solved the problem of "decide which method to apply by assessing the corpus".
Found some shortcomings of both UPDATE and REUSE:
Why: target similarity is not available before review.
can't you lda the corpuses and report the delta between the topics?
Data similarity is available before review while target similarity is not.
Target similarity is data similarity of only the relevant docs. Before review, we don't know which are relevant.
agreed. what are the data similarities between (say) the first 1000 docs (picked at random) from the queries of your 3 corpuses?
I compared the data similarity of all docs between the 3 corpuses:
Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.
Data similarity
data_Hall_Wahono: 0.860254 data_Hall_Abdellatif: 0.726351 data_Abdellatif_Wahono: 0.809703
Don't think data similarity can accurately reflect target similarity.
why are you doing 30 topics? are you doing amrit's DE thing to find stable topics? without that, order effects on input could muddle your findings.
Don't want to vary topic number on different corpuses. Otherwise it is not possible to compare the cosine distance (of lda results with different topic numbers). But i will try tuning the alpha and beta of LDA.
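The measurement discussed above (fixed topic number, L2 normalization, cosine distance) can be sketched as follows. The aggregation step is an assumption for illustration: each corpus is summarized by the mean of its L2-normalized per-document topic vectors, since the thread does not spell that detail out, and 5 topics replace the thread's 30 only because the toy corpus is tiny.

```python
# Sketch of the corpus-similarity measurement (assumed aggregation: mean
# L2-normalized LDA topic vector per corpus; similarity = cosine).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.preprocessing import normalize

corpus_a = ["software defect prediction", "defect prediction metrics"]
corpus_b = ["software fault prediction", "prediction of faulty modules"]
labels_a = np.array([1, 1])   # 1 = relevant; known only AFTER review
labels_b = np.array([1, 0])

# One LDA model over the pooled docs, so topic spaces are comparable
# (the thread's point about keeping the topic number fixed across corpuses).
X = CountVectorizer().fit_transform(corpus_a + corpus_b)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topics = normalize(lda.fit_transform(X))   # L2-normalize each doc's topic vector

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b = topics[:2], topics[2:]
data_sim = cos(a.mean(axis=0), b.mean(axis=0))  # all docs: available pre-review
target_sim = cos(a[labels_a == 1].mean(axis=0), # relevant docs only:
                 b[labels_b == 1].mean(axis=0)) # available only post-review
print(data_sim, target_sim)
```

This also makes the thread's key distinction concrete: `data_sim` needs no labels, while `target_sim` needs the relevance judgments that only exist after the review is done.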
been thinking a little about the paper. our main point is that reusing projects from review[i] greatly simplifies review[i+1].
there are some technical choices for review[i+1] that, as yet, we cannot pre-specify which is best.
so let's not confuse current results with future work. can we focus on one of those technical choices as a way to do review[i+1] and mention the other as part of future work?
I am afraid offering only one method will make the content of this paper too thin.
Therefore, the old plan:
This paper:
dealing with two scenarios: update an SLR and initiate a new SLR with similar topic.
offering two methods, UPDATE and REUSE. For the update scenario, choose UPDATE; for the other scenario, either UPDATE or REUSE can save effort depending on target similarity (but a human decision is still needed to choose which method).
extension (future paper):
use a single method, a time-decaying model, to replace the two. It works well in either scenario (we have some preliminary results on this). This new method also has value within a single large SLR, since it can deal with the concept drift problem; therefore it could possibly replace FASTREAD.
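The time-decaying model is only named, not specified, in this thread. One plausible reading, shown purely as an assumption, is to down-weight older labeled examples when retraining the learner, e.g. via scikit-learn's `sample_weight`; every name and parameter below (the half-life, the synthetic data) is illustrative.

```python
# Purely illustrative reading of a "time-decaying model": older labeled papers
# get exponentially smaller training weight, so the learner tracks concept drift.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)          # synthetic relevance labels
age = np.arange(100)[::-1]             # example 0 is oldest, 99 is newest

half_life = 20.0                       # assumed decay parameter
weights = 0.5 ** (age / half_life)     # newest doc weighs 1.0, older decay

clf = LinearSVC().fit(X, y, sample_weight=weights)
print(clf.score(X, y))
```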
New plan?
Anyway, I can draft a paper discussing UPDATE only, and we can see how long it turns out to be.
Abstract
Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering. Such SLR studies need to be conducted frequently since (a) researchers should update their SLR results every one or two years to cover the latest publications, and (b) most researchers are constantly studying different research questions in the same or similar topic areas. However, SLR studies cannot be conducted frequently due to their heavy cost. In our previous study, with the help of FASTREAD, we managed to save 90% of the review cost, at the price of 10% recall, in the primary study selection of an SLR. In this paper, we allow researchers to import knowledge from previously completed SLR studies to boost FASTREAD. With the appropriate knowledge transfer technique, review cost and variance can be further reduced when updating an SLR or initiating a new SLR on similar or related topics.
Why
In our previous study, FASTREAD has effectively reduced the cost of primary study selection in SLR.
However, in FASTREAD, random sampling costs a large amount of review effort and introduces most of the variance, as shown above. To further reduce the cost of primary study selection, the random sampling step needs to be replaced.
How
External knowledge needs to be introduced in order to replace random sampling. There are certain scenarios in which reviewers are guaranteed to have some knowledge about their SLRs: when a reviewer has done an SLR using FASTREAD (or has access to all the data of other reviewers conducting an SLR with FASTREAD) and now either (a) wants to update that SLR with newly published papers, or (b) wants to initiate a new SLR on a similar topic.
We call these two scenarios update SLR and transfer SLR, respectively. In both scenarios, the knowledge from the previously conducted SLR can be imported as external knowledge to boost the primary study selection of the new SLR.
The rest of this paper discusses the use of previous knowledge in these scenarios.
Background
Update SLR
Some literature review, existing SLR update examples, full update vs. snowballing...
Assumptions:
Transfer SLR
Some literature review, examples.
Assumptions:
Method
UPDATE is designed to transfer knowledge in the update SLR scenario, where a previously completed SLR is updated with newly published papers.
REUSE is designed to transfer knowledge in the transfer SLR scenario, where a new SLR is initiated on a topic similar to a previously completed one.
Experiment
Update SLR
Partial UPDATE vs. Whole UPDATE (can be an RQ).
Data: Hall2007- as previous SLR, Hall2007+ as new SLR:
FASTREAD on Hall2007-:![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/everything1.png?raw=yes)
FASTREAD vs. Partial UPDATE vs. Whole UPDATE on Hall2007+:![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/everything2.png?raw=yes)
Transfer SLR
Depending on the topic similarity of the new SLR and the previous one, different methods might be more suitable:
Topic similarity
Data sets:
Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.
Data similarity
data_Hall_Wahono: 0.860254 data_Hall_Abdellatif: 0.726351 data_Abdellatif_Wahono: 0.809703
Target similarity
target_Hall_Wahono: 0.995255 target_Hall_Abdellatif: 0.64379 target_Abdellatif_Wahono: 0.649005
(Data similarity does not necessarily reflect target similarity)
Hall and Wahono are both on defect prediction, and these two have very high target similarity (0.995).
Abdellatif is on software analysis; the target similarities between Abdellatif and the other two are about 0.64.
Result
Hall as previous SLR,
on Wahono:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Hall_Wahono0.png?raw=yes)
on Abdellatif:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Hall_Abdellatif0.png?raw=yes)
Wahono as previous SLR,
on Hall:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Wahono_Hall0.png?raw=yes)
on Abdellatif:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Wahono_Abdellatif0.png?raw=yes)
Abdellatif as previous SLR,
on Hall:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Abdellatif_Hall0.png?raw=yes)
on Wahono:
![](https://github.com/ai-se/ML-assisted-SLR/blob/master/no_ES/figure/Abdellatif_Wahono0.png?raw=yes)
Conclusions:
A series of experiments
FASTREAD on Hall2007- => UPDATE on Hall2007+ => UPDATE on Wahono => REUSE on Abdellatif: