ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Transfer knowledge in SLR #42

Open azhe825 opened 7 years ago

azhe825 commented 7 years ago

Abstract

Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering. Such SLR studies need to be conducted frequently since a) researchers should update their SLR results every one or two years to account for the latest publications, and b) most researchers are constantly studying different research questions in the same or similar topic areas. However, SLR studies cannot be conducted frequently due to their heavy cost. In our previous study, with the help of FASTREAD, we succeeded in saving 90% of the review cost at the sacrifice of 10% recall in the primary study selection of systematic literature reviews (SLRs). In this paper, we allow researchers to import knowledge from previously completed SLR studies to boost FASTREAD. With an appropriate knowledge transfer technique, review cost and variance can be further reduced when updating an SLR or initiating a new SLR on similar or related topics.

Why

In our previous study, FASTREAD has effectively reduced the cost of primary study selection in SLR.

However, in FASTREAD, the random sampling step costs a large amount of review effort and introduces most of the variance, as shown above. To further reduce the cost of primary study selection, the random sampling step needs to be replaced.

How

External knowledge needs to be introduced in order to replace random sampling. There are certain scenarios in which reviewers are guaranteed to have some knowledge relevant to their SLRs: namely, when a reviewer has done an SLR using FASTREAD (or has access to all the data of other reviewers conducting an SLR with FASTREAD) and now either wants to update that SLR or wants to start a new SLR on a similar or related topic.

We call these two scenarios update SLR and transfer SLR respectively. In both of these scenarios, the knowledge of previously conducted SLR can be imported as external knowledge to boost the primary study selection of the new SLR.

The rest of this paper discusses the use of previous knowledge in these scenarios.

Background

Update SLR

Some literature review, existing SLR update examples, full update vs. snowballing...

Assumptions:

Transfer SLR

Some literature review, examples.

Assumptions:

Method

pdf

UPDATE is designed to transfer knowledge in the update SLR scenario, where

REUSE is designed to transfer knowledge in the transfer SLR scenario, where

Experiment

Update SLR

Partial UPDATE vs. Whole UPDATE (this can be an RQ).

Data: Hall2007- as previous SLR, Hall2007+ as new SLR:

FASTREAD on Hall2007-:

FASTREAD vs. Partial UPDATE vs. Whole UPDATE on Hall2007+:

Transfer SLR

Depending on the topic similarity of the new SLR and the previous one, different methods might be more suitable:

Topic similarity

Data sets:

Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.
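For concreteness, here is a minimal sketch of one way this similarity measurement could be computed. Fitting a single LDA model on the union of both corpora, comparing L2-normalized centroid topic mixtures, and the function names are assumptions for illustration, not necessarily the exact setup used here.

```python
# Hypothetical sketch: 30-topic LDA + L2 normalization + cosine similarity.
# `docs_a` and `docs_b` are assumed to be lists of abstracts from two corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.preprocessing import normalize

def corpus_similarity(docs_a, docs_b, n_topics=30, seed=0):
    # Fit one LDA model on the union so both corpora share the same topic space.
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs_a + docs_b)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(X)          # per-document topic mixtures

    # Average each corpus's topic mixtures, L2-normalize, take cosine similarity.
    a = normalize(topics[:len(docs_a)].mean(axis=0).reshape(1, -1))
    b = normalize(topics[len(docs_a):].mean(axis=0).reshape(1, -1))
    return (a @ b.T).item()
```

Something like `corpus_similarity(hall_docs, wahono_docs)` would then give a number analogous to data_Hall_Wahono below, under whatever preprocessing was actually used.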

Data similarity

data_Hall_Wahono: 0.860254
data_Hall_Abdellatif: 0.726351
data_Abdellatif_Wahono: 0.809703

Target similarity

target_Hall_Wahono: 0.995255
target_Hall_Abdellatif: 0.64379
target_Abdellatif_Wahono: 0.649005

(Data similarity does not necessarily reflect target similarity)

Hall and Wahono are both on defect prediction, and these two have very high target similarity (0.995).

Abdellatif is on software analysis; the target similarities between Abdellatif and the other two are about 0.64.

Result

Hall as previous SLR,

Wahono as previous SLR,

Abdellatif as previous SLR,

Conclusions:

A series of experiments

FASTREAD on Hall2007- => UPDATE on Hall2007+ => UPDATE on Wahono => REUSE on Abdellatif:

azhe825 commented 7 years ago

50 topics:

data_Hall_Wahono: 0.857870
data_Hall_Abdellatif: 0.754895
data_Abdellatif_Wahono: 0.901016
target_Hall_Wahono: 0.994922
target_Hall_Abdellatif: 0.425869
target_Abdellatif_Wahono: 0.368009

timm commented 7 years ago

Systematic literature review (SLR) is the primary method for aggregating and synthesizing evidence in evidence-based software engineering.

Performing an initial SLR on a new area can be very slow and expensive due to the cost of the primary selection study (where researchers review thousands of papers to find the dozens of relevant papers). Further, all that effort has to be repeated if ever that SLR is to be updated.

We find that the effort required to update an initial primary selection study can be significantly reduced via automatic text mining. For initial primary selections, FASTREAD uses support vector machines to build a model of the relevant and irrelevant papers seen so far. By incrementally querying and updating the support vectors of that SVM, it is possible to greatly reduce the number of papers that humans have to read (by as much as 97%) during that initial primary selection study.

This paper checks if FASTREAD can be used to reduce the effort required to update initial primary selection studies with a new set of papers. In this work, we assess a {\em naive update policy}; i.e. the new SVM is initialized with the support vectors found in the prior study. It turns out that this naive update policy does not always work. Rather, we recommend (a)~two different update methods and (b)~deciding which method to use based on the distance of the old selection study to the new study. In the experiments of this paper, we found that the effort associated with these primary selection studies can be reduced by our two methods by up to XXXX.
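(For concreteness, a rough sketch of what this naive update policy could look like, not FASTREAD's actual code: the new review's SVM is seeded with labeled examples carried over from the prior review instead of a random sample, and the usual query-and-relabel loop continues from there. The names, the linear kernel, and the certainty-based query rule are assumptions.)

```python
# Hypothetical sketch of the naive update policy: warm-start the active learner
# with labeled papers from the prior review, then keep querying the new pool.
import numpy as np
from sklearn.svm import SVC

def naive_update(old_X, old_y, pool_X, human_label, budget=50):
    X, y = list(old_X), list(old_y)        # labels carried over from the prior SLR
    pool = np.asarray(pool_X)
    clf = None
    for _ in range(budget):
        clf = SVC(kernel="linear").fit(X, y)
        i = int(np.argmax(clf.decision_function(pool)))  # most-likely-relevant paper
        X.append(pool[i])
        y.append(human_label(pool[i]))     # a reviewer reads and labels that paper
        pool = np.delete(pool, i, axis=0)  # remove it from the unlabeled pool
    return clf
```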

============

I got confused by all the final charts, which I think are trying to say that

  1. you can look at the corpus and auto detect similarity and difference.
  2. which means, in turn, you can select which transfer method to apply
  3. and I think you are also saying that if you use the opposite of your recommendations, things get sub-optimal

which sounds like 3 RQs:

Missing

research questions

related work:

future work

Why

Good that you did not walk through 16 different options for FASTREAD.

How

never “he” but “they”

azhe825 commented 7 years ago
  1. I have not solved the problem of deciding which method to apply by assessing the corpus.

    • Why: target similarity is not available before review.
    • Now: the suggestion for now is
    • to update an SLR, use the UPDATE method.
    • to start a new SLR: if it has the same topic as the previous one, use UPDATE (like Hall to Wahono); if a similar topic, use REUSE (like Hall to Abdellatif).
  2. Found some shortcomings of both UPDATE and REUSE:

    • UPDATE: works perfectly when updating an SLR with no concept drift. However, even though Hall and Wahono have very high target similarity, using UPDATE tends to retrieve less on the latter. This effect becomes more significant as the target similarity goes lower.
    • REUSE: not as good as UPDATE in the early stage, but outperforms UPDATE later, because of the effect mentioned above.
    • possible solution: a time-decaying model which combines both methods' merits (see the rough sketch after this list). Motivation: UPDATE is better than REUSE in any scenario if we solve the concept drift problem. A time-decaying model throws away older positive examples to tackle concept drift. This can be a journal extension.
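A speculative sketch of that time-decaying idea (explicitly future work, so everything here, including the window size and how "age" is measured, is an assumption):

```python
# Hypothetical: when warm-starting the next review, keep only the most recent
# positive (relevant) examples so that older, possibly drifted labels fade out.
def decay_positives(examples, window=200):
    """examples: list of (age, features, label) tuples; smaller age = more recent."""
    kept, positives_kept = [], 0
    for age, x, y in sorted(examples, key=lambda e: e[0]):
        if y == "yes":
            if positives_kept >= window:
                continue                    # drop positives beyond the window
            positives_kept += 1
        kept.append((x, y))
    return kept
```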
timm commented 7 years ago

Why: target similarity is not available before review.

Can't you LDA the corpuses and report the delta between the topics?

azhe825 commented 7 years ago

Data similarity is available before review while target similarity is not.

Target similarity is data similarity of only the relevant docs. Before review, we don't know which are relevant.
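To make the distinction concrete, a minimal sketch (reusing the hypothetical corpus_similarity() from earlier in the thread): target similarity is the same computation restricted to the docs already labeled relevant, which is exactly the information that is missing before a review starts.

```python
# Hypothetical: data similarity uses the whole corpora; target similarity uses
# only the docs labeled relevant ("yes"), so it can only be computed after review.
def data_similarity(docs_a, docs_b):
    return corpus_similarity(docs_a, docs_b)

def target_similarity(docs_a, labels_a, docs_b, labels_b):
    rel_a = [d for d, l in zip(docs_a, labels_a) if l == "yes"]
    rel_b = [d for d, l in zip(docs_b, labels_b) if l == "yes"]
    return corpus_similarity(rel_a, rel_b)
```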

timm commented 7 years ago

Data similarity is available before review while target similarity is not. Target similarity is data similarity of only the relevant docs. Before review, we don't know which are relevant.

Agreed. What are the data similarities between (say) the first 1000 docs (picked at random) from the queries of your 3 corpuses?

azhe825 commented 7 years ago

I compared the data similarity of all docs between the 3 corpuses:

Similarity measurement: 30 topics LDA, L2 normalization, cosine distance.

Data similarity

data_Hall_Wahono: 0.860254
data_Hall_Abdellatif: 0.726351
data_Abdellatif_Wahono: 0.809703

I don't think data similarity can accurately reflect target similarity.

timm commented 7 years ago

Why are you doing 30 topics? Are you doing Amrit's DE thing to find stable topics? Without that, order effects on input could muddle your findings.

azhe825 commented 7 years ago

I don't want to vary the topic number across corpuses; otherwise it is not possible to compare the cosine distances (of LDA results with different topic numbers). But I will try tuning the alpha and beta of LDA.
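A hedged sketch of what that tuning might look like with gensim's LdaModel (where "beta" is the eta parameter); the grid values and the use of log perplexity as the selection criterion are assumptions, not the actual setup.

```python
# Hypothetical alpha/eta grid search; the topic count stays fixed at 30 so the
# resulting topic mixtures remain comparable across corpuses.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def tune_lda(tokenized_docs, num_topics=30):
    dictionary = Dictionary(tokenized_docs)
    bows = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    best = None
    for alpha in (0.1, 0.5, 1.0):
        for eta in (0.01, 0.1, 1.0):
            lda = LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                           alpha=alpha, eta=eta, random_state=0)
            score = lda.log_perplexity(bows)   # higher (less negative) is better
            if best is None or score > best[0]:
                best = (score, alpha, eta, lda)
    return best
```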

timm commented 7 years ago

Been thinking a little about the paper. Our main point is that reusing projects from review[i] greatly simplifies review[i+1].

There are some technical choices for review[i+1] for which, as yet, we cannot pre-specify which is best.

So let's not confuse current results with future work. Can we focus on one of those technical choices as a way to do review[i+1] and mention the other as part of future work?

azhe825 commented 7 years ago

I am afraid offering only one method will leave the paper with too little content.

Therefore old plan:

New plan?

azhe825 commented 7 years ago

Anyway, I can draft a paper discussing UPDATE only, and let's see how long it turns out to be.