ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Aug-04-2016 #18

Closed by azhe825 7 years ago

azhe825 commented 7 years ago

Baseline Results and possible improvements

Under the following scenario:

  1. fixed pool
  2. no external knowledge from experts
  3. labels are all correct
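Under these three assumptions, a screening run can be replayed offline. The ground-truth labels of the fixed pool stand in for the reviewer, and because there is no expert seed, the first picks have no model to guide them. Below is a minimal sketch of such a harness with a random baseline chooser. `simulate_review` and `random_chooser` are hypothetical names, not code from this repo.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical chooser signature: (unreviewed indices, labels so far, rng) -> next index
Chooser = Callable[[List[int], Dict[int, bool], random.Random], int]


def simulate_review(labels: Sequence[bool], choose: Chooser, seed: int = 0) -> List[int]:
    """Screen a fixed pool one document at a time (hypothetical harness).

    labels[i] is the ground-truth relevance of document i. It plays the
    reviewer, so every label returned is correct. No expert seed documents
    are given, so the chooser starts with an empty set of labels.
    Returns curve, where curve[k] = relevant docs found after k + 1 reviews.
    """
    rng = random.Random(seed)
    unreviewed = list(range(len(labels)))
    labeled: Dict[int, bool] = {}
    found = 0
    curve: List[int] = []
    while unreviewed:
        doc = choose(unreviewed, labeled, rng)
        unreviewed.remove(doc)
        labeled[doc] = labels[doc]  # oracle answer, assumed always correct
        found += labels[doc]
        curve.append(found)
    return curve


def random_chooser(unreviewed: List[int], labeled: Dict[int, bool], rng: random.Random) -> int:
    # Model-free baseline: pick any unreviewed document uniformly at random.
    return rng.choice(unreviewed)


if __name__ == "__main__":
    pool = [True] * 5 + [False] * 95  # toy pool: 5 relevant out of 100
    curve = simulate_review(pool, random_chooser, seed=1)
    print(len(curve), curve[-1])  # 100 5
```

An active learner would be plugged in as another `choose` function, so all methods are compared on the same pool and oracle.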

Baseline from Biomedical #17

Baseline from Litigation #16

Conclusions drawn:

Two current winners:

Future work

Get more data sets to experiment on, ideally one from biomedical and one from litigation.

timm commented 7 years ago

am i reading this right?

how does this relate to active learning + text mining in SE? if you did one case study from that domain, we could do an se pub

azhe825 commented 7 years ago

What I want to do for the paper is to borrow methods from biomedical and litigation, combine them into a better one, and apply it to SLR in software engineering. If possible, the combined method should also beat the state of the art in both biomedical and litigation.

I am collecting all prior efforts to facilitate primary study selection in SLR for SE (called citation screening in biomedical systematic reviews, and TAR, technology-assisted review, in litigation e-discovery).

What I found is that:

  1. In the Software Engineering community, most efforts go into building tools that manage the entire SLR process. One study uses visual text mining (VTM), an unsupervised method, to reduce the cost of primary study selection. I found no work using active learning. If that is true, it is a blank spot in SLR research.
  2. In Biomedical Engineering, systematic reviews started years before they did in the Software Engineering community, and machine learning has been applied to assist citation screening since 2006. In 2010, Byron C. Wallace published his first attempt to apply active learning to citation screening, see #17. His method is patient_aggressive_undersampling in our figure. He has continued to explore citation screening via crowd-sourcing, multiple experts, etc., but the 2010 method is the most suitable baseline here. Besides Byron C. Wallace's work, I found very few studies on the topic (one uses supervised learning, with no comparison against any baseline, and its results are not good).
  3. In e-discovery, the state of the art would be hasty_continuous_active in our figure, from #16. It has not yet been compared with Byron C. Wallace's work.
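The method names in the figure appear to combine separate design choices. Below is a minimal sketch of two of them, under assumptions not confirmed in this thread: that "patient" vs "hasty" controls when the learner starts driving selection, and that "continuous active" means always querying the documents most likely to be relevant, as opposed to uncertainty sampling. `uncertainty_query`, `certainty_query`, `model_ready` and the `patience=5` default are all hypothetical.

```python
from typing import Dict, List


def uncertainty_query(scores: Dict[int, float], n: int) -> List[int]:
    """Pick the n unreviewed docs whose P(relevant) is closest to 0.5."""
    return sorted(scores, key=lambda d: abs(scores[d] - 0.5))[:n]


def certainty_query(scores: Dict[int, float], n: int) -> List[int]:
    """Pick the n unreviewed docs most likely to be relevant.

    Assumed reading of "continuous active": always review the top-ranked docs.
    """
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:n]


def model_ready(relevant_found: int, patient: bool, patience: int = 5) -> bool:
    """Hypothetical start rule.

    hasty:   trust the model as soon as one relevant doc has been found.
    patient: keep sampling randomly until `patience` relevant docs are found.
    """
    return relevant_found >= (patience if patient else 1)


if __name__ == "__main__":
    scores = {0: 0.9, 1: 0.6, 2: 0.1, 3: 0.48}  # toy classifier outputs
    print(uncertainty_query(scores, 2))  # [3, 1]
    print(certainty_query(scores, 2))    # [0, 1]
```

Uncertainty sampling mainly improves the classifier; certainty sampling spends each review on a likely hit, which is why the two can trade places on a recall-vs-effort curve.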
timm commented 7 years ago
azhe825 commented 7 years ago
timm commented 7 years ago

my bad. i get it now. can u calibrate the x-axis for me? how many documents is x=1?

what about using the hall12 set? all the references marked in [[double bracket]] in "Tracy Hall, Sarah Beecham, David Bowes, David Gray, Steve Counsell: A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Trans. Software Eng. 38(6): 1276-1304 (2012)"

if you had #13 and hall, that would be a powerful paper
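Turning the hall12 references into a labeled pool could be as simple as marking every double-bracketed entry as relevant. Below is a minimal sketch, assuming a hypothetical plain-text dump of the reference list with one reference per line and the included studies wrapped in `[[...]]`. `label_references` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Hypothetical input format: one reference per line, included studies in [[...]].
MARKED = re.compile(r"\[\[(.+?)\]\]")


def label_references(lines: List[str]) -> List[Tuple[str, bool]]:
    """Return (reference text, is_relevant) pairs, skipping blank lines."""
    labeled = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = MARKED.search(line)
        if match:
            labeled.append((match.group(1).strip(), True))  # marked as included
        else:
            labeled.append((line, False))  # unmarked: treated as irrelevant
    return labeled


if __name__ == "__main__":
    refs = ["[[Study A. Title.]]", "Study B. Title.", ""]
    print(label_references(refs))
    # [('Study A. Title.', True), ('Study B. Title.', False)]
```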

then is the y-axis precision? and do you get what i'm saying about how it could be used as an early stopping criterion?

pause

nope. i'm wrong there. the dotted lines come from 10 repeats. in practice, humans would only do 1 repeat

so i'd be really interested if there is anything like a "standard" active learning method in SE. i'm suspecting "no". which leaves the field wide open for your expert input

recommendation: get the hall results then start writing a journal paper for IST. and get all that done before the semester project load gets nasty. i.e. by early to mid sept
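The questions above about the axes and the 10 repeats can be made concrete with two small helpers. This is a minimal sketch, assuming the x-axis counts documents reviewed (the thread does not say how many documents x=1 is) and that the spread across repeats is summarized as quartiles (also an assumption). `precision_recall_at` and `summarize_repeats` are hypothetical.

```python
import statistics
from typing import List, Tuple


def precision_recall_at(curve: List[int], total_relevant: int, k: int) -> Tuple[float, float]:
    """Precision and recall after reviewing the first k documents.

    curve[k - 1] = relevant docs found after k reviews (assumes x counts documents).
    """
    found = curve[k - 1]
    return found / k, found / total_relevant


def summarize_repeats(curves: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Per x position: (25th percentile, median, 75th percentile) across repeats.

    Quartiles are an assumed choice of spread; needs at least 2 repeats.
    """
    summary = []
    for values in zip(*curves):
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary.append((q1, median, q3))
    return summary


if __name__ == "__main__":
    curve = [0, 1, 1, 2, 3, 3, 3, 4]  # toy run with 4 relevant docs in total
    print(precision_recall_at(curve, total_relevant=4, k=4))  # (0.5, 0.5)
    print(precision_recall_at(curve, total_relevant=4, k=8))  # (0.5, 1.0)
```

Precision and recall diverge as k grows, so a plot of relevant docs found vs. docs reviewed reads as recall once divided by the total number of relevant docs, not as precision.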

timm commented 7 years ago

and can i get a 1-2 line summary of each of the sampling methods? e.g. patient_aggressive_undersampling

azhe825 commented 7 years ago

3 important components of each method:
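For the "aggressive_undersampling" part of patient_aggressive_undersampling, here is a minimal sketch under an assumption: that it balances the training set by keeping every relevant example and discarding the irrelevant ones closest to the classifier's decision boundary. This is our reading of Wallace et al. (2010), not verified here. `aggressive_undersample` and the one-to-one ratio are hypothetical.

```python
from typing import Dict, List


def aggressive_undersample(
    relevant: List[int], irrelevant: List[int], margin: Dict[int, float]
) -> List[int]:
    """Build a balanced training set (assumed behaviour, 1:1 ratio).

    Keeps all relevant docs plus as many irrelevant docs as there are relevant
    ones, dropping the irrelevant docs with the smallest |margin|, i.e. those
    nearest the decision boundary.
    """
    far_first = sorted(irrelevant, key=lambda d: abs(margin[d]), reverse=True)
    return relevant + far_first[: len(relevant)]


if __name__ == "__main__":
    margin = {2: -0.1, 3: -2.0, 4: -0.5, 5: -1.5}  # toy SVM margins
    print(aggressive_undersample([0, 1], [2, 3, 4, 5], margin))  # [0, 1, 3, 5]
```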


timm commented 7 years ago

As far as I know, no active learning method has been applied to literature reviews in SE.

then you'll be the first

timm commented 7 years ago

ping me in 2 days' time. i've overdosed on facebook posts today, but i could ask the se crowd on fbook whether they know of any