Grayscale version:
Not good.
Yeah, that is unreadable. Possible to use different line styles?
13 lines are hard to display in a single graph, not to mention we actually have 13 median lines + 13 IQR lines. Planning to shelve this problem until the first draft of this paper is finished.
Separate into three groups, A/B/C.
Show all the IQRs together on a separate graph.
Within each group, one graph.
Then how do I compare the performance of each group?
They will all have the same x-axis.
They will be shown in the paper one under the other, as sketched below.
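For reference, a minimal matplotlib sketch of that layout: one subplot per group, stacked and sharing an x-axis, with distinct line styles so the grayscale stays readable. The group assignments and curves here are placeholders, not the real results.

```python
import numpy as np
import matplotlib.pyplot as plt

groups = {
    "Group A": ["P_U_S_A", "P_U_S_N", "P_U_C_A"],  # placeholder membership
    "Group B": ["P_C_C_A", "H_U_S_A", "H_U_C_A"],  # placeholder membership
    "Group C": ["H_C_C_A"],                        # placeholder membership
}
styles = ["-", "--", "-.", ":"]  # distinct line styles keep grayscale readable

fig, axes = plt.subplots(len(groups), 1, sharex=True, figsize=(6, 8))
x = np.arange(0, 101)  # e.g. % of candidate studies reviewed
for ax, (name, treatments) in zip(axes, groups.items()):
    for i, t in enumerate(treatments):
        y = np.random.rand(len(x)).cumsum()  # placeholder median curve
        ax.plot(x, y, styles[i % len(styles)], color="black", label=t)
    ax.set_title(name)
    ax.legend(fontsize=8)
axes[-1].set_xlabel("studies reviewed (%)")
plt.tight_layout()
plt.show()
```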
Got it. The first draft of "How to read less" is ready on sharelatex. Will work on the figures soon.
Hall Result
Hall, Tracy, Sarah Beecham, David Bowes, David Gray, and Steve Counsell. "A systematic literature review on fault prediction performance in software engineering." IEEE Transactions on Software Engineering 38, no. 6 (2012): 1276-1304.
Wahono Result
Wahono, Romi Satria. "A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks." Journal of Software Engineering 1, no. 1 (2015): 1-16.
Comparisons for each code:
Start with patient active learning (P_U_S_A). First compare the last code: P_U_S_A vs. P_U_S_N.
P_U_S_A wins.
For the last code, A is better than N (aggressive undersampling is useful; a sketch of what that means follows).
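A minimal sketch of aggressive undersampling, assuming the usual formulation (train an SVM, discard the "non-relevant" training examples closest to the decision plane until the classes balance, then retrain). The helper name and details are illustrative, not the exact implementation used here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def aggressive_undersample(X, y):
    """y: 1 = "relevant", 0 = "non-relevant" (the majority class)."""
    model = LinearSVC().fit(X, y)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    # keep only the non-relevant examples FARTHEST from the decision plane
    # (most negative margin), as many as there are relevant examples
    dist = model.decision_function(X[neg])
    keep = neg[np.argsort(dist)[:len(pos)]]
    idx = np.concatenate([pos, keep])
    return LinearSVC().fit(X[idx], y[idx])
```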
Compare the third code: P_U_S_A vs. P_U_C_A.
No clear winner. I would prefer C over S, since continuous learning can handle concept drift better.
But let's keep both.
Compare the second code: P_U_S_A vs. P_U_C_A vs. P_C_C_A.
No clear winner. I prefer C over U, since there is no need to worry about a stop rule for U (the margin threshold of the SVM); both strategies are sketched below.
But let's keep all three.
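A minimal sketch of the two query strategies being compared, assuming U = uncertainty sampling (query the unlabeled study closest to the SVM decision plane, which needs a stop rule such as the margin threshold mentioned above) and C = certainty sampling (query the study the model is most confident is "relevant"). Illustrative only.

```python
import numpy as np

def query(model, X_unlabeled, strategy="U"):
    """model: any fitted classifier exposing decision_function, e.g. a linear SVM."""
    margins = model.decision_function(X_unlabeled)  # signed distance to plane
    if strategy == "U":
        # uncertainty sampling: ask about the study closest to the plane;
        # needs a stop rule, e.g. stop once min |margin| exceeds a threshold
        return int(np.argmin(np.abs(margins)))
    # certainty sampling: ask about the study most confidently "relevant"
    return int(np.argmax(margins))
```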
Compare the first code: P_U_S_A vs. P_U_C_A vs. P_C_C_A vs. H_U_S_A vs. H_U_C_A vs. H_C_C_A.
H is better than P.
But H_C_C_A is a clear loser.
Starting aggressive undersampling with only one "relevant" example is a bad idea.
The final winners would be H_U_S_A and H_U_C_A.
I would prefer H_U_C_A, since continuous learning handles concept drift and the updating of SLRs better.
Email SLR authors
How much effort does primary study selection cost? (N reviewers, T time for each)
It would be better if details of each step could be provided:
Is there any effort (a hidden step) between applying the search string to the databases and collecting the initial candidate study list?
Why ask: I retrieve many more candidate studies with the same search string provided in the SLR paper.
If there is such a hidden step, what is it? How much does it cost? What is the reason behind it?
One reason, I guess, is to reduce the size of the initial candidate study list and thus the review cost of primary study selection. If this is true, learning-based primary study selection can remove this hidden step, since it can search a much larger candidate study list and still retrieve above 90% of the "relevant" studies with less effort. This may even improve overall completeness while saving the effort of this hidden step.