ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Transfer knowledge in SLR #42

Open azhe825 opened 7 years ago

azhe825 commented 7 years ago

Abstract

Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering. Such SLR studies need to be conducted frequently since a) researchers should update their SLR results every one or two years to account for the latest publications, and b) most researchers are constantly studying different research questions in the same or similar topic areas. However, SLR studies cannot be conducted frequently due to their heavy cost. In our previous study, with the help of FASTREAD, we succeeded in saving 90% of the review cost at the sacrifice of 10% recall in the primary study selection of systematic literature reviews (SLRs). In this paper, we allow researchers to import knowledge from previously completed SLR studies to boost FASTREAD. With an appropriate knowledge transfer technique, review cost and variance can be further reduced when updating an SLR or initiating a new SLR on similar or related topics.

Why

In our previous study, FASTREAD has effectively reduced the cost of primary study selection in SLR.

However, in FASTREAD, the random sampling step costs a large amount of review effort and introduces most of the variance, as shown above. To further reduce the cost of primary study selection, the random sampling step needs to be replaced.

How

External knowledge needs to be introduced in order to replace random sampling. There are certain scenarios in which reviewers are guaranteed to have some knowledge relevant to their SLRs: namely, when a reviewer has done an SLR using FASTREAD (or has access to all the data of other reviewers conducting an SLR with FASTREAD) and now either wants to update that SLR or wants to start a new SLR on a similar or related topic.

We call these two scenarios update SLR and transfer SLR respectively. In both of these scenarios, the knowledge of previously conducted SLR can be imported as external knowledge to boost the primary study selection of the new SLR.

The rest of this paper discusses the use of previous knowledge in these scenarios.

Background

Update SLR

Some literature review, existing SLR update examples, full update vs. snowballing...

Assumptions:

Transfer SLR

Some literature review, examples.

Assumptions:

Method

pdf

UPDATE is designed to transfer knowledge in the update SLR scenario, where

REUSE is designed to transfer knowledge in the transfer SLR scenario, where

Experiment

Update SLR

Partial UPDATE vs. Whole UPDATE (this can be an RQ).

Data: Hall2007- as previous SLR, Hall2007+ as new SLR:

FASTREAD on Hall2007-:

FASTREAD vs. Partial UPDATE vs. Whole UPDATE on Hall2007+:

Transfer SLR

Depending on the topic similarity of the new SLR and the previous one, different methods might be more suitable:

Topic similarity

Data sets:

Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.
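For concreteness, here is a minimal sketch of one way this similarity measurement could be computed. Fitting a single LDA model on the union of both corpora, comparing L2-normalized centroid topic mixtures, and the function names are assumptions for illustration, not necessarily the exact setup used here.

```python
# Hypothetical sketch: 30-topic LDA + L2 normalization + cosine similarity.
# `docs_a` and `docs_b` are assumed to be lists of abstracts from two corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.preprocessing import normalize

def corpus_similarity(docs_a, docs_b, n_topics=30, seed=0):
    # Fit one LDA model on the union so both corpora share the same topic space.
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs_a + docs_b)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(X)          # per-document topic mixtures

    # Average each corpus's topic mixtures, L2-normalize, take cosine similarity.
    a = normalize(topics[:len(docs_a)].mean(axis=0).reshape(1, -1))
    b = normalize(topics[len(docs_a):].mean(axis=0).reshape(1, -1))
    return (a @ b.T).item()
```

Something like `corpus_similarity(hall_docs, wahono_docs)` would then give a number analogous to data_Hall_Wahono below, under whatever preprocessing was actually used.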

Data similarity

data_Hall_Wahono: 0.860254
data_Hall_Abdellatif: 0.726351
data_Abdellatif_Wahono: 0.809703

Target similarity

target_Hall_Wahono: 0.995255
target_Hall_Abdellatif: 0.64379
target_Abdellatif_Wahono: 0.649005

(Data similarity does not necessarily reflect target similarity)

Hall and Wahono are both on defect prediction, and these two have very high target similarity (0.995).

Abdellatif is on software analysis; the target similarities between Abdellatif and the other two are about 0.64.

Result

Hall as previous SLR,

Wahono as previous SLR,

Abdellatif as previous SLR,

Conclusions:

A series of experiments

FASTREAD on Hall2007- => UPDATE on Hall2007+ => UPDATE on Wahono => REUSE on Abdellatif:

azhe825 commented 7 years ago

50 topics:

data_Hall_Wahono: 0.857870
data_Hall_Abdellatif: 0.754895
data_Abdellatif_Wahono: 0.901016
target_Hall_Wahono: 0.994922
target_Hall_Abdellatif: 0.425869
target_Abdellatif_Wahono: 0.368009

timm commented 7 years ago

Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering.

Performing an initial SLR on a new area can be very slow and expensive due to the cost of the primary selection study (where researchers review thousands of papers to find the dozens of relevant papers). Further, all that effort has to be repeated if ever that SLR is to be updated.

We find that the effort required to update an initial primary selection study can be significantly reduced via automatic text mining. For initial primary selections, FASTREAD uses support vector machines to build a model of the relevant and irrelevant papers seen so far. By incrementally querying and updating the support vectors of that SVM, it is possible to greatly reduce the number of papers that humans have to read (by as much as 97%) during that initial primary selection study.

This paper checks if FASTREAD can be used to reduce the effort required to update initial primary selection studies with a new set of papers. In this work, we assess a {\em naive update policy}; i.e. the new SVM is initialized with the support vectors found in the prior study. It turns out that this naive update policy does not always work. Rather, we recommend (a)~two different update methods and (b)~deciding which method to use based on the distance of the old selection study to the new study. In the experiments of this paper, we found that the effort associated with these primary selection studies can be reduced by our two methods by up to XXXX.
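(For concreteness, a rough sketch of what this naive update policy could look like, not FASTREAD's actual code: the new review's SVM is seeded with labeled examples carried over from the prior review instead of a random sample, and the usual query-and-relabel loop continues from there. The names, the linear kernel, and the certainty-based query rule are assumptions.)

```python
# Hypothetical sketch of the naive update policy: warm-start the active learner
# with labeled papers from the prior review, then keep querying the new pool.
import numpy as np
from sklearn.svm import SVC

def naive_update(old_X, old_y, pool_X, human_label, budget=50):
    X, y = list(old_X), list(old_y)        # labels carried over from the prior SLR
    pool = np.asarray(pool_X)
    clf = None
    for _ in range(budget):
        clf = SVC(kernel="linear").fit(X, y)
        i = int(np.argmax(clf.decision_function(pool)))  # most-likely-relevant paper
        X.append(pool[i])
        y.append(human_label(pool[i]))     # a reviewer reads and labels that paper
        pool = np.delete(pool, i, axis=0)  # remove it from the unlabeled pool
    return clf
```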

============

I got confused by all the final charts, which I think are trying to say that

  1. you can look at the corpus and auto detect similarity and difference.
  2. which means, in turn, you can select which transfer method to apply
  3. and I think you are also saying that if you use the opposite of your recommendations, things get sub-optimal

which sounds like 3 RQs:

Missing

research questions

related work:

future work

Why

Good that you did not walk through 16 different options for FASTREAD.

How

never “he” but “they”

azhe825 commented 7 years ago
  1. I have not solved the problem of deciding which method to apply by assessing the corpus.

    • Why: target similarity is not available before review.
    • Now: the suggestion for now is
    • to update an SLR, use the UPDATE method.
    • to start a new SLR: if it has the same topic as the previous one, use UPDATE (like Hall to Wahono); if a similar topic, use REUSE (like Hall to Abdellatif).
  2. Found some shortcomings of both UPDATE and REUSE:

    • UPDATE: works perfectly when updating an SLR with no concept drift. However, even though Hall and Wahono have very high target similarity, using UPDATE tends to retrieve less on the latter. This effect becomes more significant as the target similarity goes lower.
    • REUSE: not as good as UPDATE in the early stage, but outperforms UPDATE later, because of the effect mentioned above.
    • possible solution: a time-decaying model which combines both methods' merits (see the rough sketch after this list). Motivation: UPDATE is better than REUSE in any scenario if we solve the concept drift problem. A time-decaying model throws away older positive examples to tackle concept drift. This can be a journal extension.
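A speculative sketch of that time-decaying idea (explicitly future work, so everything here, including the window size and how "age" is measured, is an assumption):

```python
# Hypothetical: when warm-starting the next review, keep only the most recent
# positive (relevant) examples so that older, possibly drifted labels fade out.
def decay_positives(examples, window=200):
    """examples: list of (age, features, label) tuples; smaller age = more recent."""
    kept, positives_kept = [], 0
    for age, x, y in sorted(examples, key=lambda e: e[0]):
        if y == "yes":
            if positives_kept >= window:
                continue                    # drop positives beyond the window
            positives_kept += 1
        kept.append((x, y))
    return kept
```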
timm commented 7 years ago

Why: target similarity is not available before review.

Can't you LDA the corpuses and report the delta between the topics?

azhe825 commented 7 years ago

Data similarity is available before review while target similarity is not.

Target similarity is data similarity of only the relevant docs. Before review, we don't know which are relevant.
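To make the distinction concrete, a minimal sketch (reusing the hypothetical corpus_similarity() from earlier in the thread): target similarity is the same computation restricted to the docs already labeled relevant, which is exactly the information that is missing before a review starts.

```python
# Hypothetical: data similarity uses the whole corpora; target similarity uses
# only the docs labeled relevant ("yes"), so it can only be computed after review.
def data_similarity(docs_a, docs_b):
    return corpus_similarity(docs_a, docs_b)

def target_similarity(docs_a, labels_a, docs_b, labels_b):
    rel_a = [d for d, l in zip(docs_a, labels_a) if l == "yes"]
    rel_b = [d for d, l in zip(docs_b, labels_b) if l == "yes"]
    return corpus_similarity(rel_a, rel_b)
```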

timm commented 7 years ago

Data similarity is available before review while target similarity is not. Target similarity is data similarity of only the relevant docs. Before review, we don't know which are relevant.

Agreed. What are the data similarities between (say) the first 1000 docs (picked at random) from the queries of your 3 corpuses?

azhe825 commented 7 years ago

I compared the data similarity of all docs between the 3 corpuses:

Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.

Data similarity

data_Hall_Wahono: 0.860254
data_Hall_Abdellatif: 0.726351
data_Abdellatif_Wahono: 0.809703

I don't think data similarity can accurately reflect target similarity.

timm commented 7 years ago

Why are you doing 30 topics? Are you doing Amrit's DE thing to find stable topics? Without that, order effects on input could muddle your findings.

azhe825 commented 7 years ago

I don't want to vary the topic number across corpuses; otherwise it is not possible to compare the cosine distances (of LDA results with different topic numbers). But I will try tuning the alpha and beta of LDA.
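A hedged sketch of what that tuning might look like with gensim's LdaModel (where "beta" is the eta parameter); the grid values and the use of log perplexity as the selection criterion are assumptions, not the actual setup.

```python
# Hypothetical alpha/eta grid search; the topic count stays fixed at 30 so the
# resulting topic mixtures remain comparable across corpuses.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def tune_lda(tokenized_docs, num_topics=30):
    dictionary = Dictionary(tokenized_docs)
    bows = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    best = None
    for alpha in (0.1, 0.5, 1.0):
        for eta in (0.01, 0.1, 1.0):
            lda = LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                           alpha=alpha, eta=eta, random_state=0)
            score = lda.log_perplexity(bows)   # higher (less negative) is better
            if best is None or score > best[0]:
                best = (score, alpha, eta, lda)
    return best
```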

timm commented 7 years ago

Been thinking a little about the paper. Our main point is that reusing projects from review[i] greatly simplifies review[i+1].

There are some technical choices for review[i+1] for which, as yet, we cannot pre-specify which is best.

So let's not confuse current results with future work. Can we focus on one of those technical choices as a way to do review[i+1] and mention the other as part of future work?

azhe825 commented 7 years ago

I am afraid offering only one method will leave the paper with too little content.

Therefore old plan:

New plan?

azhe825 commented 7 years ago

Anyway, I can draft a paper discussing UPDATE only, and let's see how long it turns out to be.