timm opened this issue 8 years ago
Hahaha, the images in issue #5 were from those papers! They have some awesome references which I'll use for our paper.
what we need is for you to run some quick select queries over our test data. do you see this chart and "SEESAW"? that was an old tool of mine. but see how it does better than N other standard things?
what we need is this chart for the defect data sets with "SEESAW" replaced with "RANK" (the name of our method, currently, in this paper)
+---------+---------------------+-------------+
| | VARL (Shatnawi '10) | Filó et al. |
+ Metrics +---------------------+-------------+
| | Threshold | P-Value | Threshold |
+---------+-----------+---------+-------------+
| CBO | 1.78 | 0.000 | - |
+---------+-----------+---------+-------------+
| MAX_CC | 2.07 | 0.000 | - |
+---------+-----------+---------+-------------+
| AVG_CC | 0.86 | 0.003 | - |
+---------+-----------+---------+-------------+
| LCOM | 51 | 0.000 | 725 |
+---------+-----------+---------+-------------+
| LOC | 171.59 | 0.000 | 30 |
+---------+-----------+---------+-------------+
| NOC | - | - | 28 |
+---------+-----------+---------+-------------+
| CA | - | - | 39 |
+---------+-----------+---------+-------------+
| CE | - | - | 16 |
+---------+-----------+---------+-------------+
| DIT | - | - | 4 |
+---------+-----------+---------+-------------+
| WMC | - | - | 34 |
+---------+-----------+---------+-------------+
Good. Will need bibtex entries for all papers you use
Also... when these ranges are applied to the data, what effect do they have to the defect distribution?
Working on that, I'll have the results this evening.
Note that the home run would be this: if, for a data set d where RANK found good treatments, after division into good/bad (where bad = rows selected by threshold and good = all - bad), the defect density is about the same in good and bad (as witnessed by, say, box plots)
what are the thresholds in the tool that harman used to assess his refactorings?
Re: Harman's refactoring tool thresholds. I'm looking into this, will comment as soon as I find it.
when can i get results from applying those thresholds?
In about an hour.. fixing some bugs.
+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| wmc | 14.67 | 0.000 |
+--------+-----------+---------+
| cbo | 30.13 | 0.000 |
+--------+-----------+---------+
| lcom | 849.16 | 0.000 |
+--------+-----------+---------+
| loc | 2951.64 | 0.000 |
+--------+-----------+---------+
| cam | 0.84 | 0.000 |
+--------+-----------+---------+
| ic | 5.29 | 0.000 |
+--------+-----------+---------+
| max_cc | 34.47 | 0.000 |
+--------+-----------+---------+
| avg_cc | 14.63 | 0.003 |
+--------+-----------+---------+
rank , name , med , iqr
----------------------------------------------------
1 , Reduce cam , 12.35 , 13.25 ( --* | ), 7.83, 13.86, 21.08
1 , Reduce wmc , 12.65 , 11.45 ( -* | ), 9.64, 12.65, 21.08
1 , Reduce avg_cc , 14.46 , 5.42 ( -* | ), 12.05, 15.06, 17.47
1 , Reduce loc , 15.06 , 13.25 ( -* | ), 10.24, 15.66, 23.49
1 , Reduce ic , 15.36 , 7.23 ( --* | ), 11.45, 16.87, 18.67
1 , Reduce cbo , 16.57 , 9.04 ( --* | ), 12.05, 18.07, 21.08
1 , Reduce lcom , 17.77 , 12.05 ( ---* | ), 9.64, 18.07, 21.69
1 , Reduce max_cc , 19.58 , 7.23 ( * | ), 16.87, 19.88, 24.10
2 , RANK , 47.89 , 30.72 ( ---*| ), 37.95, 48.19, 68.67
+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| wmc | 84.99 | 0.000 |
+--------+-----------+---------+
| cbo | 22.17 | 0.002 |
+--------+-----------+---------+
| lcom | 16048.61 | 0.027 |
+--------+-----------+---------+
| loc | 1668.51 | 0.000 |
+--------+-----------+---------+
| cam | 2.29 | 0.000 |
+--------+-----------+---------+
| max_cc | 31.06 | 0.034 |
+--------+-----------+---------+
| avg_cc | 30.91 | 0.026 |
+--------+-----------+---------+
rank , name , med , iqr
----------------------------------------------------
1 , Reduce cam , 20.00 , 15.00 ( -* | ), 15.00, 20.00, 30.00
1 , Reduce loc , 20.00 , 10.00 ( --* | ), 15.00, 22.50, 25.00
1 , Reduce cbo , 21.25 , 10.00 ( -* | ), 17.50, 22.50, 27.50
1 , Reduce max_cc , 21.25 , 7.50 ( -* | ), 17.50, 22.50, 25.00
1 , Reduce lcom , 22.50 , 2.50 ( * | ), 22.50, 22.50, 25.00
1 , Reduce wmc , 23.75 , 10.00 ( --* | ), 17.50, 25.00, 27.50
1 , Reduce avg_cc , 23.75 , 15.00 ( ---* | ), 17.50, 30.00, 32.50
2 , RANK , 57.50 , 12.50 ( -|-* ), 47.50, 57.50, 60.00
+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| lcom | 4092.69 | 0.000 |
+--------+-----------+---------+
| lcom3 | 4.78 | 0.000 |
+--------+-----------+---------+
| loc | 71055.23 | 0.000 |
+--------+-----------+---------+
| cam | 3.34 | 0.000 |
+--------+-----------+---------+
| ic | 26.97 | 0.000 |
+--------+-----------+---------+
rank , name , med , iqr
----------------------------------------------------
1 , Reduce cam , 8.54 , 1.07 ( * | ), 8.19, 8.90, 9.25
1 , Reduce lcom3 , 8.72 , 3.56 ( * | ), 7.12, 8.90, 10.68
1 , Reduce lcom , 8.90 , 2.49 ( * | ), 7.47, 8.90, 9.96
1 , Reduce loc , 9.07 , 2.85 ( * | ), 7.47, 9.25, 10.32
1 , Reduce ic , 9.96 , 2.14 ( * | ), 8.90, 9.96, 11.03
2 , RANK , 23.13 , 6.41 ( --* | ), 19.22, 23.84, 25.62
+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| dit | 14.47 | 0.000 |
+--------+-----------+---------+
| rfc | 20.73 | 0.000 |
+--------+-----------+---------+
| ca | 2.37 | 0.000 |
+--------+-----------+---------+
| ce | 2.69 | 0.000 |
+--------+-----------+---------+
| npm | 11.55 | 0.000 |
+--------+-----------+---------+
| lcom3 | 4.16 | 0.000 |
+--------+-----------+---------+
| loc | 61269.41 | 0.000 |
+--------+-----------+---------+
| dam | 0.53 | 0.000 |
+--------+-----------+---------+
| moa | 8.88 | 0.000 |
+--------+-----------+---------+
| cbm | 6.76 | 0.000 |
+--------+-----------+---------+
| amc | 510.48 | 0.001 |
+--------+-----------+---------+
| avg_cc | 2.02 | 0.000 |
+--------+-----------+---------+
rank , name , med , iqr
----------------------------------------------------
1 , Reduce dit , 36.36 , 9.09 ( * | ), 36.36, 36.36, 45.45
1 , Reduce rfc , 36.36 , 9.09 ( * | ), 36.36, 36.36, 45.45
1 , Reduce ca , 36.36 , 18.18 ( --* | ), 27.27, 36.36, 45.45
1 , Reduce ce , 36.36 , 18.18 ( --* | ), 27.27, 36.36, 45.45
1 , Reduce npm , 36.36 , 18.18 ( --* | ), 27.27, 36.36, 45.45
1 , Reduce lcom3 , 36.36 , 9.09 ( --* | ), 27.27, 36.36, 36.36
1 , Reduce loc , 36.36 , 9.09 ( * | ), 36.36, 36.36, 45.45
1 , Reduce dam , 36.36 , 27.27 ( -----* | ), 18.18, 36.36, 45.45
1 , Reduce moa , 36.36 , 36.36 ( --------* | ), 9.09, 36.36, 45.45
1 , Reduce cbm , 36.36 , 9.09 ( * | ), 36.36, 36.36, 45.45
1 , Reduce amc , 36.36 , 9.09 ( * | ), 36.36, 36.36, 45.45
1 , RANK , 36.36 , 0.00 ( * | ), 36.36, 36.36, 36.36
1 , Reduce avg_cc , 40.91 , 9.09 ( ---* | ), 36.36, 45.45, 45.45
:smirk:
I only retain metrics with valid thresholds at P < 0.05.
so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above? what is that procedure? please answer in enough detail so i can succinctly but authoritatively write this down in the paper.
so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above?
Yup, that's right.
what is that procedure? please answer in enough detail so I can succinctly but authoritatively write this down in the paper.
In our work, we code fault-free classes as zero and faulty classes as one. We leverage this binary nature to apply Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. To set a cut-off for this association, we use a 95\% confidence level.
To identify thresholds for the metrics found significant, we use a method called Value of an Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his TSE 2010 article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds on object-oriented metrics in open source software systems.
The VARL method computes a cut-off value for each metric such that, below that threshold, the probability of a defect occurring is less than a given probability $p_0$. To do this, we fit the UBR model to each metric. For every significant metric, this yields a logistic regression model with a constant intercept ($\alpha$) and a coefficient ($\beta$) estimated by maximizing the log-likelihood function. With these, VARL is measured as follows:
\begin{equation} VARL = \frac{1}{\beta}\left(\log\left(\frac{p_0}{1-p_0}\right) - \alpha\right) \end{equation}
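As a quick sanity check, the formula can be coded directly. A minimal sketch follows; the function name and the example $\alpha$, $\beta$ values are made up for illustration, since in practice they come from the fitted UBR model:

```python
import math

def varl(alpha, beta, p0):
    """Value of an Acceptable Risk Level (Bender '99).

    alpha, beta: intercept and coefficient of the fitted
                 univariate logistic regression model.
    p0:          acceptable probability of a defect.
    Returns the metric value below which the predicted
    defect probability stays under p0.
    """
    return (math.log(p0 / (1 - p0)) - alpha) / beta

# Illustrative values only (not from our data): with
# alpha = -2.0, beta = 0.1 and p0 = 0.5, the log-odds
# term is zero, so VARL = 2.0 / 0.1 = 20.0
print(varl(-2.0, 0.1, 0.5))  # -> 20.0
```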
why are these thresholds different in different data sets?
It is highly unlikely that the metrics have a similar impact on all data sets. Therefore, we must run the model on a data set to identify metrics and corresponding thresholds that matter.
v.good
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 57.83 & 29.52 & \quart{46}{33}{66}{1} \\
\hline 2 & Reduce cbo & 16.27 & 4.21 & \quart{15}{5}{18}{1} \\
2 & Reduce loc & 15.66 & 2.41 & \quart{16}{3}{17}{1} \\
2 & Reduce cam & 15.06 & 3.01 & \quart{16}{3}{17}{1} \\
2 & Reduce avg_cc & 15.66 & 3.01 & \quart{16}{3}{17}{1} \\
2 & Reduce ic & 15.66 & 3.61 & \quart{15}{4}{17}{1} \\
2 & Reduce lcom & 15.66 & 4.82 & \quart{14}{5}{17}{1} \\
2 & Reduce wmc & 15.66 & 3.01 & \quart{15}{4}{17}{1} \\
2 & Reduce max_cc & 15.06 & 2.41 & \quart{15}{3}{17}{1} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 52.5 & 17.5 & \quart{57}{22}{67}{1} \\
\hline 2 & Reduce avg_cc & 22.5 & 7.5 & \quart{25}{10}{28}{1} \\
2 & Reduce loc & 22.5 & 10.0 & \quart{22}{13}{28}{1} \\
2 & Reduce cbo & 22.5 & 10.0 & \quart{22}{13}{28}{1} \\
2 & Reduce wmc & 22.5 & 7.5 & \quart{22}{9}{28}{1} \\
2 & Reduce max_cc & 20.0 & 7.5 & \quart{22}{9}{25}{1} \\
2 & Reduce cam & 20.0 & 10.0 & \quart{22}{13}{25}{1} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 19.93 & 12.11 & \quart{46}{33}{54}{2} \\
\hline 2 & Reduce lcom & 9.25 & 1.43 & \quart{23}{4}{25}{2} \\
2 & Reduce ic & 9.25 & 1.43 & \quart{23}{4}{25}{2} \\
2 & Reduce lcom3 & 8.9 & 1.77 & \quart{22}{5}{24}{2} \\
\hline 3 & Reduce loc & 8.9 & 2.14 & \quart{20}{6}{24}{2} \\
3 & Reduce cam & 8.53 & 1.78 & \quart{21}{5}{23}{2} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce dam & 36.36 & 27.28 & \quart{34}{34}{45}{1} \\
1 & Reduce moa & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
1 & Reduce rfc & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce ca & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce ce & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce npm & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce loc & 45.45 & 9.09 & \quart{45}{12}{57}{1} \\
1 & Reduce amc & 45.45 & 27.28 & \quart{45}{34}{57}{1} \\
1 & Reduce avg_cc & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
\hline 2 & Reduce dit & 36.36 & 36.37 & \quart{22}{46}{45}{1} \\
2 & Reduce lcom3 & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
2 & Reduce cbm & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
2 & RANK & 36.36 & 0.0 & \quart{45}{0}{45}{1} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 14.78 & 4.92 & \quart{57}{22}{66}{4} \\
1 & Reduce lcom3 & 15.76 & 1.96 & \quart{68}{9}{71}{4} \\
1 & Reduce moa & 15.76 & 2.45 & \quart{66}{11}{71}{4} \\
1 & Reduce cbo & 16.26 & 1.97 & \quart{71}{8}{73}{4} \\
1 & Reduce npm & 16.26 & 2.46 & \quart{68}{11}{73}{4} \\
1 & Reduce loc & 16.75 & 2.46 & \quart{68}{11}{75}{4} \\
\hline \end{tabular}}
re harman's threshold technique
There are 2 references.
@article{hermans15,
title={Detecting and refactoring code smells in spreadsheet formulas},
author={Hermans, Felienne and Pinzger, Martin and van Deursen, Arie},
journal={Empirical Software Engineering},
volume={20},
number={2},
pages={549--575},
year={2015},
publisher={Springer}
}
@inproceedings{Alves2010,
author = {Alves, Tiago L. and Ypma, Christiaan and Visser, Joost},
booktitle = {2010 IEEE Int. Conf. Softw. Maint.},
doi = {10.1109/ICSM.2010.5609747},
isbn = {978-1-4244-8630-4},
issn = {10636773},
mendeley-groups = {OO Metric Thresholds},
month = {sep},
pages = {1--10},
publisher = {IEEE},
title = {{Deriving metric thresholds from benchmark data}},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5609747},
year = {2010}
}
They seem to use a benchmark data set to derive a set of common thresholds. Since we don't have that, we derive thresholds separately for every data set. The technique is straightforward.
In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows.
Every metric value is weighted according to the source lines of code (LOC) of its class. All the weighted metrics are then normalized, i.e., divided by the sum of all weights of the same system. Following this, the normalized metric values are sorted in ascending order. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0-100%) and the y-axis the metric scale.
Thresholds are then derived by choosing the percentage of the overall code that needs to be represented. For instance, Alves et al. suggest using the 90% quantile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it can be used to identify the 10% worst code with respect to a specific metric, and thresholds beyond the 90\% quantile represent very high risk.
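The steps above can be sketched in a few lines (the function name and the toy data are mine, not from our scripts):

```python
def alves_threshold(metric_values, loc_values, quantile=0.90):
    """Derive a metric threshold following Alves et al. '10:
    weight each class's metric value by its LOC, then take the
    metric value reached at the chosen quantile of cumulative
    weight over classes sorted by ascending metric value."""
    total = float(sum(loc_values))
    cumulative = 0.0
    for metric, loc in sorted(zip(metric_values, loc_values)):
        cumulative += loc
        # threshold = first metric value covering `quantile` of the code
        if cumulative / total >= quantile:
            return metric
    return max(metric_values)

# Toy example: four classes; the largest class (70% of the LOC)
# also has the highest metric value, so it sets the 90% threshold.
print(alves_threshold([1, 2, 3, 4], [10, 10, 10, 70]))  # -> 4
```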
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 63.25 & 24.1 & \quart{53}{26}{70}{1} \\
\hline 2 & Reduce wmc & 22.29 & 6.63 & \quart{19}{7}{24}{1} \\
2 & Reduce max_cc & 21.69 & 7.23 & \quart{18}{8}{24}{1} \\
2 & Reduce loc & 21.69 & 4.82 & \quart{20}{6}{24}{1} \\
2 & Reduce lcom & 21.69 & 4.82 & \quart{22}{6}{24}{1} \\
2 & Reduce cbo & 21.69 & 4.82 & \quart{21}{5}{24}{1} \\
2 & Reduce ic & 21.69 & 5.43 & \quart{20}{6}{24}{1} \\
2 & Reduce cbm & 21.08 & 5.43 & \quart{20}{6}{23}{1} \\
2 & Reduce dam & 21.08 & 6.02 & \quart{21}{7}{23}{1} \\
2 & Reduce npm & 21.08 & 5.43 & \quart{20}{6}{23}{1} \\
2 & Reduce rfc & 21.08 & 3.61 & \quart{21}{4}{23}{1} \\
2 & Reduce cam & 21.08 & 4.22 & \quart{20}{5}{23}{1} \\
2 & Reduce moa & 19.88 & 5.42 & \quart{20}{6}{22}{1} \\
2 & Reduce ce & 20.48 & 4.21 & \quart{21}{5}{22}{1} \\
2 & Reduce avg_cc & 19.88 & 7.23 & \quart{19}{8}{22}{1} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce noc & 30.0 & 15.0 & \quart{31}{20}{38}{1} \\
1 & Reduce amc & 30.0 & 12.5 & \quart{31}{16}{38}{1} \\
1 & Reduce ce & 30.0 & 12.5 & \quart{35}{16}{38}{1} \\
1 & Reduce lcom & 32.5 & 10.0 & \quart{35}{12}{41}{1} \\
1 & Reduce loc & 32.5 & 12.5 & \quart{35}{16}{41}{1} \\
1 & Reduce wmc & 32.5 & 17.5 & \quart{31}{23}{41}{1} \\
1 & Reduce cbo & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce rfc & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce npm & 35.0 & 7.5 & \quart{38}{9}{44}{1} \\
1 & Reduce cam & 35.0 & 15.0 & \quart{38}{19}{44}{1} \\
1 & Reduce max_cc & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce avg_cc & 35.0 & 15.0 & \quart{35}{19}{44}{1} \\
1 & Reduce cbm & 40.0 & 17.5 & \quart{38}{22}{51}{1} \\
\hline 2 & RANK & 52.5 & 20.0 & \quart{54}{25}{67}{1} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce wmc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce dit & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cbo & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce rfc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce lcom & 36.36 & 36.36 & \quart{0}{79}{79}{2} \\
1 & Reduce ca & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce ce & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce npm & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce lcom3 & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce loc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce dam & 36.36 & 36.36 & \quart{0}{79}{79}{2} \\
1 & Reduce moa & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cam & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce ic & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cbm & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce amc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce max_cc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce avg_cc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & RANK & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce lcom & 14.78 & 2.46 & \quart{51}{9}{57}{3} \\
1 & Reduce dam & 14.78 & 1.97 & \quart{55}{7}{57}{3} \\
1 & Reduce npm & 15.27 & 2.96 & \quart{53}{11}{59}{3} \\
1 & Reduce cam & 15.27 & 2.46 & \quart{55}{9}{59}{3} \\
1 & Reduce rfc & 15.76 & 1.48 & \quart{57}{5}{60}{3} \\
1 & Reduce lcom3 & 15.76 & 1.97 & \quart{57}{7}{60}{3} \\
1 & Reduce loc & 15.76 & 2.96 & \quart{53}{11}{60}{3} \\
\hline 2 & Reduce cbo & 15.76 & 2.94 & \quart{55}{11}{60}{3} \\
2 & Reduce cbm & 15.76 & 2.45 & \quart{57}{9}{60}{3} \\
2 & Reduce wmc & 16.26 & 2.94 & \quart{55}{11}{62}{3} \\
2 & Reduce ce & 16.26 & 2.45 & \quart{57}{9}{62}{3} \\
2 & Reduce amc & 16.26 & 2.46 & \quart{55}{9}{62}{3} \\
2 & Reduce moa & 16.26 & 1.96 & \quart{59}{7}{62}{3} \\
2 & RANK & 16.75 & 7.88 & \quart{49}{30}{64}{3} \\
\hline \end{tabular}}
{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce lcom & 9.61 & 2.86 & \quart{25}{9}{28}{2} \\
1 & Reduce npm & 9.61 & 4.27 & \quart{23}{13}{28}{2} \\
1 & Reduce lcom3 & 9.61 & 2.15 & \quart{25}{7}{28}{2} \\
1 & Reduce ic & 9.96 & 3.2 & \quart{26}{10}{29}{2} \\
1 & Reduce amc & 9.96 & 1.78 & \quart{26}{6}{29}{2} \\
1 & Reduce ce & 9.96 & 2.86 & \quart{25}{9}{29}{2} \\
1 & Reduce rfc & 10.32 & 2.84 & \quart{26}{9}{30}{2} \\
1 & Reduce moa & 10.32 & 2.13 & \quart{26}{7}{30}{2} \\
1 & Reduce mfa & 10.32 & 3.21 & \quart{25}{10}{30}{2} \\
1 & Reduce wmc & 10.32 & 2.13 & \quart{28}{7}{30}{2} \\
\hline 2 & Reduce dit & 10.68 & 2.13 & \quart{28}{7}{32}{2} \\
2 & Reduce cam & 10.68 & 3.21 & \quart{25}{10}{32}{2} \\
2 & Reduce max_cc & 10.32 & 3.2 & \quart{26}{10}{30}{2} \\
2 & Reduce loc & 11.03 & 3.2 & \quart{28}{10}{33}{2} \\
2 & Reduce cbm & 11.39 & 2.14 & \quart{29}{7}{34}{2} \\
\hline 3 & RANK & 20.64 & 8.9 & \quart{53}{26}{61}{2} \\
\hline \end{tabular}}
One of the first methods for finding thresholds was proposed by Erni and Lewerentz~\cite{erni96}. Their technique identifies thresholds from the data distribution, specifically the mean and the standard deviation of the metric values. They propose using values that lie one standard deviation beyond the mean as thresholds. The minimum value $T_{min}=\mu-\sigma$ is used when the metric definition considers very small values as an indicator of problems; otherwise, $T_{max}=\mu+\sigma$ is used, when large metric values are considered problematic.
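In code this is just the following sketch (the function name is mine, and I assume the population standard deviation since the paper does not say which variant):

```python
import statistics

def erni_thresholds(values):
    """Mean +/- one standard deviation thresholds (Erni & Lewerentz '96).
    Returns (t_min, t_max): use t_min when small metric values signal
    problems, t_max when large values do."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std. dev. (assumption)
    return mu - sigma, mu + sigma

t_min, t_max = erni_thresholds([2, 4, 4, 4, 5, 5, 7, 9])
print(t_min, t_max)  # mean 5, std. dev. 2 -> thresholds 3 and 7
```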
Several researchers~\cite{shatnawi10}~\cite{alves10} have pointed out that this method is subject to a few problems. Firstly, it doesn't consider the fault-proneness of classes when the thresholds are computed. Secondly, there is a lack of empirical validation of this methodology, which impedes reasonable comparisons.
Need to know what happens when thresholds from N sources are applied to our data sets. And i need that written up and into the paper.
BTW, here's a paper that does what we hate: mentions metrics but not thresholds