Closed azhe825 closed 7 years ago
am i reading this right?
how does this comment on active learning + text mining in SE? if you did one case study from that domain, we can do an se pub
What I want to do for the paper is to borrow methods from biomedical and litigation, combine them into a better one, and apply it to SLR in software engineering. If possible, the combined method should also beat the state of the art in both biomedical and litigation.
I am collecting all the efforts to facilitate Primary Study Selection in SE SLRs (called Citation Screening in biomedical systematic reviews, and TAR, technology-assisted review, in e-discovery for litigation).
What I found is that:
my bad. i get it now. can you calibrate the x-axis for me? how many documents is x=1?
what about using the hall12 set? all the references marked in [[double bracket]] in "Tracy Hall, Sarah Beecham, David Bowes, David Gray, Steve Counsell: A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Trans. Software Eng. 38(6): 1276-1304 (2012)"
if you had #13 and hall, that would be a powerful paper
then is the y-axis precision? and do you get what i'm saying about how it could be used as an early stopping criterion?
pause
nope. i'm wrong there. the dotted lines come from 10 repeats. in practice, humans would only do 1 repeat
so i'd be really interested if there is anything like a "standard" active learning method in SE. i'm suspecting "no". which leaves the field wide open for your expert input
recommendation: get the hall results then start writing a journal paper for IST. and get all that done before the semester project load gets nasty. i.e. by early to mid sept
and can i get a 1-2 line summary of all the sampling methods? eg. patient_aggressive_undersampling
3 important components of each method:
As far as I know, no active learning method has been applied to literature reviews in SE.
then you'll be the first
ping me in 2 days time. i've overdosed on facebook posts today but i could ask a question on fbook to the se crowd if they know of any
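For readers new to the setup discussed above, here is a minimal sketch of active learning for primary study selection. It is not any of the methods compared in this issue: the names `train`, `score`, and `screen` are hypothetical, the classifier is a toy word log-odds model, and the query strategy is plain certainty sampling (always review the highest-scored unreviewed document next). The human reviewer is simulated by a list of known labels.

```python
import math
import random
from collections import Counter


def train(labeled):
    """Fit per-word log-odds from (tokens, is_relevant) pairs, add-one smoothed."""
    pos, neg = Counter(), Counter()
    for tokens, relevant in labeled:
        (pos if relevant else neg).update(set(tokens))
    n_pos = sum(1 for _, relevant in labeled if relevant)
    n_neg = len(labeled) - n_pos
    return {
        w: math.log((pos[w] + 1) / (n_pos + 2)) - math.log((neg[w] + 1) / (n_neg + 2))
        for w in set(pos) | set(neg)
    }


def score(weights, tokens):
    """Higher score means the document looks more like the relevant ones seen so far."""
    return sum(weights.get(w, 0.0) for w in set(tokens))


def screen(docs, labels, batch=1, seed=0):
    """Simulate screening; return recall after each reviewed document.

    docs   : list of token lists (hypothetical pre-tokenized titles/abstracts)
    labels : list of 0/1 ground-truth labels, standing in for the human reviewer
    """
    rng = random.Random(seed)
    unreviewed = list(range(len(docs)))
    rng.shuffle(unreviewed)  # random review order until both classes are seen
    labeled, found, total = [], 0, sum(labels)
    recall_curve = []
    while unreviewed:
        if 0 < found < len(labeled):  # at least one relevant and one irrelevant
            weights = train(labeled)
            unreviewed.sort(key=lambda i: score(weights, docs[i]), reverse=True)
        picked, unreviewed = unreviewed[:batch], unreviewed[batch:]
        for i in picked:
            labeled.append((docs[i], labels[i]))
            found += labels[i]
            recall_curve.append(found / total if total else 0.0)
    return recall_curve
```

Calling `screen` with different `seed` values gives one recall curve per repeat, which is one way to get the kind of 10-repeat spread mentioned earlier in the thread. In practice a human would only see a single run.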
Baseline Results and possible improvements
In the scenario of
Baseline from Biomedical #17
Baseline from Litigation #16
Conclusions drawn:
Two current winners:
Ten repeat result
Future work
Get more data sets to experiment on. Ideally, one from biomedical and one from litigation.