ai-se / XTREE-FSE

need papers with thresholds #7

Open timm opened 8 years ago

timm commented 8 years ago

Need to know what happens when thresholds from N sources are applied to our data sets. And I need that written up and put into the paper.

BTW, here's a paper that does what we hate: it mentions metrics but not thresholds.

rahlk commented 8 years ago

Hahaha, the images in issue #5 were from those papers! They have some awesome references which I'll use for our paper.

timm commented 8 years ago

what we need is you to run some quick select queries over our test data. do you see this chart and "SEESAW"? that was an old tool of mine. but see how it does better than N other standard things?

what we need is this chart for the defect data sets with "SEESAW" replaced with "RANK" (the name of our method, currently, in this paper)


rahlk commented 8 years ago

| Metric | VARL (Shatnawi '10) Threshold | VARL (Shatnawi '10) P-Value | Filó et al. Threshold |
| CBO    | 1.78                          | 0.000                       | -                     |
| MAX_CC | 2.07                          | 0.000                       | -                     |
| AVG_CC | 0.86                          | 0.003                       | -                     |
| LCOM   | 51                            | 0.000                       | 725                   |
| LOC    | 171.59                        | 0.000                       | 30                    |
| NOC    | -                             | -                           | 28                    |
| CA     | -                             | -                           | 39                    |
| CE     | -                             | -                           | 16                    |
| DIT    | -                             | -                           | 4                     |
| WMC    | -                             | -                           | 34                    |

timm commented 8 years ago

Good. Will need bibtex entries for all papers you use.

timm commented 8 years ago

Also... when these ranges are applied to the data, what effect do they have to the defect distribution?

rahlk commented 8 years ago

Working on that, I'll have the results this evening. On Mar 2, 2016 8:44 AM, "Tim Menzies" wrote:

Also... when these ranges are applied to the data, what effect do they have to the defect distribution?

timm commented 8 years ago

Note that the home run would be this: for a data set d that RANK found good treatments for, after division into good and bad (where bad = rows selected by threshold and good = all - bad), the defect density is about the same in good and bad (as witnessed by, say, box plots).
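A minimal sketch of that good/bad comparison, under stated assumptions: "selected by threshold" is taken to mean "metric value above the threshold", and the column names `loc` and `bug` and the toy rows are hypothetical, not taken from our data sets.

```python
def split_by_threshold(rows, metric, threshold):
    """Return (good, bad); bad = rows whose metric exceeds the threshold (assumed rule)."""
    bad = [r for r in rows if r[metric] > threshold]
    good = [r for r in rows if r[metric] <= threshold]
    return good, bad


def defect_density(rows):
    """Fraction of rows with at least one recorded defect (0.0 for an empty group)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r["bug"] > 0) / len(rows)


# Hypothetical toy rows; "bug" is an assumed name for the defect column.
rows = [
    {"loc": 50, "bug": 0},
    {"loc": 120, "bug": 1},
    {"loc": 300, "bug": 0},
    {"loc": 900, "bug": 1},
]

good, bad = split_by_threshold(rows, "loc", 171.59)
print("good:", defect_density(good), "bad:", defect_density(bad))
```

If the two densities come out about the same on real data, the threshold is not separating defective code from clean code, which is the point the box plots would make.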

timm commented 8 years ago

what are the thresholds in the tool that harman used to assess his refactorings?

rahlk commented 8 years ago

Harman's refactoring tool thresholds: I'm looking into this and will comment as soon as I find them.

timm commented 8 years ago

when can i get results from applying those thresholds?

rahlk commented 8 years ago

In about an hour.. fixing some bugs.

rahlk commented 8 years ago

Results (Updating...) :


| Metric | Threshold | P-Value |
| wmc    | 14.67     | 0.000   |
| cbo    | 30.13     | 0.000   |
| lcom   | 849.16    | 0.000   |
| loc    | 2951.64   | 0.000   |
| cam    | 0.84      | 0.000   |
| ic     | 5.29      | 0.000   |
| max_cc | 34.47     | 0.000   |
| avg_cc | 14.63     | 0.003   |

rank ,         name ,    med   ,  iqr 
   1 ,   Reduce cam ,    12.35  ,  13.25 (  --*          |              ), 7.83,  13.86,  21.08
   1 ,   Reduce wmc ,    12.65  ,  11.45 (  -*           |              ), 9.64,  12.65,  21.08
   1 , Reduce avg_cc ,    14.46  ,  5.42 (   -*          |              ), 12.05,  15.06,  17.47
   1 ,   Reduce loc ,    15.06  ,  13.25 (   -*          |              ), 10.24,  15.66,  23.49
   1 ,    Reduce ic ,    15.36  ,  7.23  (   --*         |              ), 11.45,  16.87,  18.67
   1 ,   Reduce cbo ,    16.57  ,  9.04  (   --*         |              ), 12.05,  18.07,  21.08
   1 ,  Reduce lcom ,    17.77  ,  12.05 (  ---*         |              ), 9.64,  18.07,  21.69
   1 , Reduce max_cc ,    19.58  ,  7.23 (     *         |              ), 16.87,  19.88,  24.10
   2 ,         RANK ,    47.89  ,  30.72 (           ---*|              ), 37.95,  48.19,  68.67


| Metric | Threshold | P-Value |
| wmc    | 84.99     | 0.000   |
| cbo    | 22.17     | 0.002   |
| lcom   | 16048.61  | 0.027   |
| loc    | 1668.51   | 0.000   |
| cam    | 2.29      | 0.000   |
| max_cc | 31.06     | 0.034   |
| avg_cc | 30.91     | 0.026   |

rank ,         name ,    med   ,  iqr 
   1 ,   Reduce cam ,    20.00  ,  15.00 (    -*         |              ), 15.00,  20.00,  30.00
   1 ,   Reduce loc ,    20.00  ,  10.00 (    --*        |              ), 15.00,  22.50,  25.00
   1 ,   Reduce cbo ,    21.25  ,  10.00 (     -*        |              ), 17.50,  22.50,  27.50
   1 , Reduce max_cc ,    21.25  ,  7.50 (     -*        |              ), 17.50,  22.50,  25.00
   1 ,  Reduce lcom ,     22.50  ,  2.50 (      *        |              ), 22.50,  22.50,  25.00
   1 ,   Reduce wmc ,    23.75  ,  10.00 (     --*       |              ), 17.50,  25.00,  27.50
   1 , Reduce avg_cc ,   23.75  ,  15.00 (     ---*      |              ), 17.50,  30.00,  32.50
   2 ,         RANK ,    57.50  ,  12.50 (              -|-*            ), 47.50,  57.50,  60.00


| Metric | Threshold | P-Value |
| lcom   | 4092.69   | 0.000   |
| lcom3  | 4.78      | 0.000   |
| loc    | 71055.23  | 0.000   |
| cam    | 3.34      | 0.000   |
| ic     | 26.97     | 0.000   |

rank ,         name ,    med   ,  iqr 
   1 ,   Reduce cam ,    8.54  ,  1.07 (  *            |              ), 8.19,  8.90,  9.25
   1 , Reduce lcom3 ,    8.72  ,  3.56 (  *            |              ), 7.12,  8.90,  10.68
   1 ,  Reduce lcom ,    8.90  ,  2.49 (  *            |              ), 7.47,  8.90,  9.96
   1 ,   Reduce loc ,    9.07  ,  2.85 (  *            |              ), 7.47,  9.25,  10.32
   1 ,    Reduce ic ,    9.96  ,  2.14 (  *            |              ), 8.90,  9.96,  11.03
   2 ,        RANK ,    23.13  ,  6.41 (     --*       |              ), 19.22,  23.84,  25.62


| Metric | Threshold | P-Value |
| dit    | 14.47     | 0.000   |
| rfc    | 20.73     | 0.000   |
| ca     | 2.37      | 0.000   |
| ce     | 2.69      | 0.000   |
| npm    | 11.55     | 0.000   |
| lcom3  | 4.16      | 0.000   |
| loc    | 61269.41  | 0.000   |
| dam    | 0.53      | 0.000   |
| moa    | 8.88      | 0.000   |
| cbm    | 6.76      | 0.000   |
| amc    | 510.48    | 0.001   |
| avg_cc | 2.02      | 0.000   |

rank ,         name ,    med   ,  iqr 
   1 ,   Reduce dit ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce rfc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,    Reduce ca ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 ,    Reduce ce ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 ,   Reduce npm ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 , Reduce lcom3 ,    36.36  ,  9.09 (        --*    |              ), 27.27,  36.36,  36.36
   1 ,   Reduce loc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce dam ,    36.36  ,  27.27 (     -----*    |              ), 18.18,  36.36,  45.45
   1 ,   Reduce moa ,    36.36  ,  36.36 (  --------*    |              ), 9.09,  36.36,  45.45
   1 ,   Reduce cbm ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce amc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,         RANK ,    36.36  ,  0.00 (          *    |              ), 36.36,  36.36,  36.36
   1 , Reduce avg_cc ,    40.91  ,  9.09 (          ---* |              ), 36.36,  45.45,  45.45
timm commented 8 years ago


rahlk commented 8 years ago
timm commented 8 years ago

I only retain metrics with valid thresholds with P<0.05.

so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above? what is that procedure? please answer in enough detail so i can succinctly but authoritatively write this down in the paper.

rahlk commented 8 years ago

so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above?

Yup, that's right.

what is that procedure? please answer in enough detail so I can succinctly but authoritatively write this down in the paper.

In our work, we code fault-free classes as zero and faulty classes as one. This binary coding lets us apply Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. As the cut-off for this association, we use a confidence level of 95\%; that is, we retain only metrics with $p<0.05$.

To identify thresholds for the metrics that were significant, we use a method called Value of Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his TSE 2010 article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds in object-oriented metrics for open source software systems.

The VARL method computes cut-off values for metrics such that, below the threshold, the probability of a defect occurring is less than a probability $p_0$. To do this, we fit a UBR model to each metric. For every significant metric, this yields a logistic regression model with a constant intercept ($\alpha$) and a coefficient ($\beta$), both estimated by maximizing the log-likelihood function. With these, the VARL is computed as follows:

\begin{equation} VARL = \frac{1}{\beta }\left( {\log \left( {\frac{{{p_0}}}{{1 - {p_0}}}} \right) - \alpha } \right) \end{equation}
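
To make the procedure concrete, here is a small pure-Python sketch, not the code that produced the results above. It fits the univariate logistic model by Newton-Raphson, reports a Wald p-value for $\beta$, and applies the VARL formula. The function names, the toy data, and the choice $p_0 = 0.05$ are illustrative assumptions; a real run would more likely use a statistics package.

```python
import math


def fit_univariate_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(alpha + beta * x))) by Newton-Raphson.

    Returns (alpha, beta, p_value), where p_value is the two-sided Wald
    p-value for beta. Assumes the two classes are not perfectly separable.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood w.r.t. alpha
            g1 += (y - p) * x    # gradient w.r.t. beta
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    se_beta = math.sqrt(h00 / det)                 # standard error of beta
    z = beta / se_beta
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return alpha, beta, p_value


def varl(alpha, beta, p0):
    """Metric value at which the fitted defect probability equals p0."""
    return (math.log(p0 / (1.0 - p0)) - alpha) / beta


# Hypothetical toy data: xs is one metric, ys is the 0/1 fault label.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

alpha, beta, p_value = fit_univariate_logistic(xs, ys)
p0 = 0.05  # illustrative acceptable risk level, not a value from this thread
threshold = varl(alpha, beta, p0)
print(f"alpha={alpha:.3f} beta={beta:.3f} p={p_value:.3f} VARL={threshold:.2f}")
```

A metric would be kept only if `p_value < 0.05`. The sketch does no scaling of the metric, so very large raw values (for example LOC in the tens of thousands) could overflow `math.exp` and would need rescaling first.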

why are these thresholds different in different data sets?

It is highly unlikely that the metrics have a similar impact on all data sets. Therefore, we must run the model on a data set to identify metrics and corresponding thresholds that matter.

timm commented 8 years ago


rahlk commented 8 years ago


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    57.83  &  29.52 & \quart{46}{33}{66}{1} \\
\hline  2 &   Reduce cbo &    16.27  &  4.21 & \quart{15}{5}{18}{1} \\
  2 &   Reduce loc &    15.66  &  2.41 & \quart{16}{3}{17}{1} \\
  2 &   Reduce cam &    15.06  &  3.01 & \quart{16}{3}{17}{1} \\
  2 & Reduce avg\_cc &    15.66  &  3.01 & \quart{16}{3}{17}{1} \\
  2 &    Reduce ic &    15.66  &  3.61 & \quart{15}{4}{17}{1} \\
  2 &  Reduce lcom &    15.66  &  4.82 & \quart{14}{5}{17}{1} \\
  2 &   Reduce wmc &    15.66  &  3.01 & \quart{15}{4}{17}{1} \\
  2 & Reduce max\_cc &    15.06  &  2.41 & \quart{15}{3}{17}{1} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    52.5  &  17.5 & \quart{57}{22}{67}{1} \\
\hline  2 & Reduce avg\_cc &    22.5  &  7.5 & \quart{25}{10}{28}{1} \\
  2 &   Reduce loc &    22.5  &  10.0 & \quart{22}{13}{28}{1} \\
  2 &   Reduce cbo &    22.5  &  10.0 & \quart{22}{13}{28}{1} \\
  2 &   Reduce wmc &    22.5  &  7.5 & \quart{22}{9}{28}{1} \\
  2 & Reduce max\_cc &    20.0  &  7.5 & \quart{22}{9}{25}{1} \\
  2 &   Reduce cam &    20.0  &  10.0 & \quart{22}{13}{25}{1} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    19.93  &  12.11 & \quart{46}{33}{54}{2} \\
\hline  2 &  Reduce lcom &    9.25  &  1.43 & \quart{23}{4}{25}{2} \\
  2 &    Reduce ic &    9.25  &  1.43 & \quart{23}{4}{25}{2} \\
  2 & Reduce lcom3 &    8.9  &  1.77 & \quart{22}{5}{24}{2} \\
\hline  3 &   Reduce loc &    8.9  &  2.14 & \quart{20}{6}{24}{2} \\
  3 &   Reduce cam &    8.53  &  1.78 & \quart{21}{5}{23}{2} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce dam &    36.36  &  27.28 & \quart{34}{34}{45}{1} \\
  1 &   Reduce moa &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  1 &   Reduce rfc &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &    Reduce ca &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &    Reduce ce &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &   Reduce npm &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &   Reduce loc &    45.45  &  9.09 & \quart{45}{12}{57}{1} \\
  1 &   Reduce amc &    45.45  &  27.28 & \quart{45}{34}{57}{1} \\
  1 & Reduce avg\_cc &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
\hline  2 &   Reduce dit &    36.36  &  36.37 & \quart{22}{46}{45}{1} \\
  2 & Reduce lcom3 &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  2 &   Reduce cbm &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  2 &         RANK &    36.36  &  0.0 & \quart{45}{0}{45}{1} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    14.78  &  4.92 & \quart{57}{22}{66}{4} \\
  1 & Reduce lcom3 &    15.76  &  1.96 & \quart{68}{9}{71}{4} \\
  1 &   Reduce moa &    15.76  &  2.45 & \quart{66}{11}{71}{4} \\
  1 &   Reduce cbo &    16.26  &  1.97 & \quart{71}{8}{73}{4} \\
  1 &   Reduce npm &    16.26  &  2.46 & \quart{68}{11}{73}{4} \\
  1 &   Reduce loc &    16.75  &  2.46 & \quart{68}{11}{75}{4} \\
\hline \end{tabular}}
timm commented 8 years ago

re harman's threshold technique

rahlk commented 8 years ago

re harman's threshold technique

There are 2 references.

  title = {Detecting and refactoring code smells in spreadsheet formulas},
  author = {Hermans, Felienne and Pinzger, Martin and van Deursen, Arie},
  journal = {Empirical Software Engineering},

  author = {Alves, Tiago L. and Ypma, Christiaan and Visser, Joost},
  booktitle = {2010 IEEE Int. Conf. Softw. Maint.},
  doi = {10.1109/ICSM.2010.5609747},
  isbn = {978-1-4244-8630-4},
  issn = {10636773},
  month = {sep},
  pages = {1--10},
  publisher = {IEEE},
  title = {{Deriving metric thresholds from benchmark data}},
  year = {2010}

They seem to use a benchmark data set to derive a set of common thresholds. Since we don't have that, we can derive thresholds separately for every data set. The technique is straightforward.

rahlk commented 8 years ago

Hermans thresholds


In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows.

Every metric value is weighted according to the source lines of code (LOC) of its class. All the weighted metrics are then normalized, i.e., they are divided by the sum of all weights of the same system. Following this, the normalized metric values are ordered in ascending fashion. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0--100\%) and the y-axis the metric scale.

Thresholds are then derived by choosing the percentage of the overall code that needs to be represented. For instance, Alves et al.\ suggest using the 90\% quantile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it can be used to identify the worst 10\% of the code with respect to that metric. Values above the 90\% threshold represent very high risk.
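
The steps above can be sketched as a LOC-weighted quantile. This is a simplified reading of the description for a single system, not the reference implementation from Alves et al.; the function name and the toy WMC/LOC values are hypothetical.

```python
def alves_threshold(metric_values, locs, quantile=0.90):
    """Smallest metric value v such that classes with metric <= v
    account for at least `quantile` of the system's total LOC."""
    total = sum(locs)
    covered = 0
    # Sort classes by metric value, carrying each class's LOC as its weight.
    for value, loc in sorted(zip(metric_values, locs)):
        covered += loc
        if covered >= quantile * total - 1e-9:  # small tolerance for float rounding
            return value
    return max(metric_values)


# Hypothetical system of five classes: metric (e.g. WMC) and size in LOC.
wmc = [2, 5, 9, 14, 40]
loc = [100, 300, 200, 300, 100]

t90 = alves_threshold(wmc, loc, 0.90)
print("90% threshold:", t90)
print("classes above it:", [v for v in wmc if v > t90])
```

Weighting by LOC means one very large class moves the threshold more than many tiny ones, which is how the method accounts for the skewed size distributions typical of these data sets.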


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    63.25  &  24.1 & \quart{53}{26}{70}{1} \\
\hline  2 &   Reduce wmc &    22.29  &  6.63 & \quart{19}{7}{24}{1} \\
  2 & Reduce max\_cc &    21.69  &  7.23 & \quart{18}{8}{24}{1} \\
  2 &   Reduce loc &    21.69  &  4.82 & \quart{20}{6}{24}{1} \\
  2 &  Reduce lcom &    21.69  &  4.82 & \quart{22}{6}{24}{1} \\
  2 &   Reduce cbo &    21.69  &  4.82 & \quart{21}{5}{24}{1} \\
  2 &    Reduce ic &    21.69  &  5.43 & \quart{20}{6}{24}{1} \\
  2 &   Reduce cbm &    21.08  &  5.43 & \quart{20}{6}{23}{1} \\
  2 &   Reduce dam &    21.08  &  6.02 & \quart{21}{7}{23}{1} \\
  2 &   Reduce npm &    21.08  &  5.43 & \quart{20}{6}{23}{1} \\
  2 &   Reduce rfc &    21.08  &  3.61 & \quart{21}{4}{23}{1} \\
  2 &   Reduce cam &    21.08  &  4.22 & \quart{20}{5}{23}{1} \\
  2 &   Reduce moa &    19.88  &  5.42 & \quart{20}{6}{22}{1} \\
  2 &    Reduce ce &    20.48  &  4.21 & \quart{21}{5}{22}{1} \\
  2 & Reduce avg\_cc &    19.88  &  7.23 & \quart{19}{8}{22}{1} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce noc &    30.0  &  15.0 & \quart{31}{20}{38}{1} \\
  1 &   Reduce amc &    30.0  &  12.5 & \quart{31}{16}{38}{1} \\
  1 &    Reduce ce &    30.0  &  12.5 & \quart{35}{16}{38}{1} \\
  1 &  Reduce lcom &    32.5  &  10.0 & \quart{35}{12}{41}{1} \\
  1 &   Reduce loc &    32.5  &  12.5 & \quart{35}{16}{41}{1} \\
  1 &   Reduce wmc &    32.5  &  17.5 & \quart{31}{23}{41}{1} \\
  1 &   Reduce cbo &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 &   Reduce rfc &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 &   Reduce npm &    35.0  &  7.5 & \quart{38}{9}{44}{1} \\
  1 &   Reduce cam &    35.0  &  15.0 & \quart{38}{19}{44}{1} \\
  1 & Reduce max\_cc &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 & Reduce avg\_cc &    35.0  &  15.0 & \quart{35}{19}{44}{1} \\
  1 &   Reduce cbm &    40.0  &  17.5 & \quart{38}{22}{51}{1} \\
\hline  2 &         RANK &    52.5  &  20.0 & \quart{54}{25}{67}{1} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce wmc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce dit &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cbo &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce rfc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &  Reduce lcom &    36.36  &  36.36 & \quart{0}{79}{79}{2} \\
  1 &    Reduce ca &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &    Reduce ce &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce npm &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce lcom3 &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce loc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce dam &    36.36  &  36.36 & \quart{0}{79}{79}{2} \\
  1 &   Reduce moa &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cam &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &    Reduce ic &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cbm &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce amc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce max\_cc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce avg\_cc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &         RANK &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &  Reduce lcom &    14.78  &  2.46 & \quart{51}{9}{57}{3} \\
  1 &   Reduce dam &    14.78  &  1.97 & \quart{55}{7}{57}{3} \\
  1 &   Reduce npm &    15.27  &  2.96 & \quart{53}{11}{59}{3} \\
  1 &   Reduce cam &    15.27  &  2.46 & \quart{55}{9}{59}{3} \\
  1 &   Reduce rfc &    15.76  &  1.48 & \quart{57}{5}{60}{3} \\
  1 & Reduce lcom3 &    15.76  &  1.97 & \quart{57}{7}{60}{3} \\
  1 &   Reduce loc &    15.76  &  2.96 & \quart{53}{11}{60}{3} \\
\hline  2 &   Reduce cbo &    15.76  &  2.94 & \quart{55}{11}{60}{3} \\
  2 &   Reduce cbm &    15.76  &  2.45 & \quart{57}{9}{60}{3} \\
  2 &   Reduce wmc &    16.26  &  2.94 & \quart{55}{11}{62}{3} \\
  2 &    Reduce ce &    16.26  &  2.45 & \quart{57}{9}{62}{3} \\
  2 &   Reduce amc &    16.26  &  2.46 & \quart{55}{9}{62}{3} \\
  2 &   Reduce moa &    16.26  &  1.96 & \quart{59}{7}{62}{3} \\
  2 &         RANK &    16.75  &  7.88 & \quart{49}{30}{64}{3} \\
\hline \end{tabular}}


{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &  Reduce lcom &    9.61  &  2.86 & \quart{25}{9}{28}{2} \\
  1 &   Reduce npm &    9.61  &  4.27 & \quart{23}{13}{28}{2} \\
  1 & Reduce lcom3 &    9.61  &  2.15 & \quart{25}{7}{28}{2} \\
  1 &    Reduce ic &    9.96  &  3.2 & \quart{26}{10}{29}{2} \\
  1 &   Reduce amc &    9.96  &  1.78 & \quart{26}{6}{29}{2} \\
  1 &    Reduce ce &    9.96  &  2.86 & \quart{25}{9}{29}{2} \\
  1 &   Reduce rfc &    10.32  &  2.84 & \quart{26}{9}{30}{2} \\
  1 &   Reduce moa &    10.32  &  2.13 & \quart{26}{7}{30}{2} \\
  1 &   Reduce mfa &    10.32  &  3.21 & \quart{25}{10}{30}{2} \\
  1 &   Reduce wmc &    10.32  &  2.13 & \quart{28}{7}{30}{2} \\
\hline  2 &   Reduce dit &    10.68  &  2.13 & \quart{28}{7}{32}{2} \\
  2 &   Reduce cam &    10.68  &  3.21 & \quart{25}{10}{32}{2} \\
  2 & Reduce max\_cc &    10.32  &  3.2 & \quart{26}{10}{30}{2} \\
  2 &   Reduce loc &    11.03  &  3.2 & \quart{28}{10}{33}{2} \\
  2 &   Reduce cbm &    11.39  &  2.14 & \quart{29}{7}{34}{2} \\
\hline  3 &         RANK &    20.64  &  8.9 & \quart{53}{26}{61}{2} \\
\hline \end{tabular}}
rahlk commented 8 years ago

Summary Shatnawi10

In our work, we code fault-free classes as zero and faulty classes as one. This binary coding lets us apply Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. As the cut-off for this association, we use a confidence level of 95\%; that is, we retain only metrics with $p<0.05$.

To identify thresholds for the metrics that were significant, we use a method called Value of Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his TSE 2010 article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds in object-oriented metrics for open source software systems.

The VARL method computes cut-off values for metrics such that, below the threshold, the probability of a defect occurring is less than a probability $p_0$. To do this, we fit a UBR model to each metric. For every significant metric, this yields a logistic regression model with a constant intercept ($\alpha$) and a coefficient ($\beta$), both estimated by maximizing the log-likelihood function. With these, the VARL is computed as follows:

\begin{equation} VARL = \frac{1}{\beta }\left( {\log \left( {\frac{{{p_0}}}{{1 - {p_0}}}} \right) - \alpha } \right) \end{equation}
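
A short derivation sketch shows where this expression comes from. Here $x$ denotes the metric value and $P(x)$ the fitted defect probability (notation introduced for illustration), and $\log$ is the natural logarithm. The fitted UBR model is

\begin{equation} P(x) = \frac{1}{1 + e^{-(\alpha + \beta x)}} \quad\Longleftrightarrow\quad \log\left(\frac{P(x)}{1 - P(x)}\right) = \alpha + \beta x \end{equation}

Setting $P(x) = p_0$ and solving for $x$ gives

\begin{equation} \log\left(\frac{p_0}{1 - p_0}\right) = \alpha + \beta x \quad\Longrightarrow\quad x = \frac{1}{\beta}\left(\log\left(\frac{p_0}{1 - p_0}\right) - \alpha\right) = VARL \end{equation}

Assuming $\beta > 0$, $P(x)$ increases with $x$, so $P(x) < p_0$ exactly when $x < VARL$.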

Summary Alves10

In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows.

Every metric value is weighted according to the source lines of code (LOC) of its class. All the weighted metrics are then normalized, i.e., they are divided by the sum of all weights of the same system. Following this, the normalized metric values are ordered in ascending fashion. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0--100\%) and the y-axis the metric scale.

Thresholds are then derived by choosing the percentage of the overall code that needs to be represented. For instance, Alves et al.\ suggest using the 90\% quantile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it can be used to identify the worst 10\% of the code with respect to that metric. Values above the 90\% threshold represent very high risk.
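
One compact way to write this procedure down (the symbols $m_i$, $\ell_i$, $q$ and $T_q$ are notation assumed here, not taken from Alves et al.): let class $i$ have metric value $m_i$ and size $\ell_i$ in LOC, with the $n$ classes indexed so that $m_1 \le m_2 \le \dots \le m_n$. Then

\begin{equation} T_q = m_k, \qquad k = \min\left\{ j : \frac{\sum_{i=1}^{j} \ell_i}{\sum_{i=1}^{n} \ell_i} \ge q \right\} \end{equation}

so that $q = 0.9$ yields the 90\% threshold, and classes with $m_i > T_{0.9}$ make up at most 10\% of the code.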

Deprecated Method

One of the first methods of finding thresholds was proposed by Erni and Lewerentz~\cite{erni96}. Their technique for identifying thresholds was based on the data distribution, specifically the mean and the standard deviation of the metric values. They propose using values that lie one standard deviation from the mean as thresholds. The minimum value $T_{min}$ is given by $T_{min}=\mu-\sigma$, and is used when the metric definition treats very small values as an indicator of problems. Otherwise, when large metric values are considered problematic, $T_{max}=\mu+\sigma$ is used.

Several researchers~\cite{shatnawi10,alves10} have pointed out that this method has a few problems. First, it does not consider the fault-proneness of classes when the thresholds are computed. Second, there is a lack of empirical validation of this methodology, which impedes reasonable comparisons.
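
For comparison, the $\mu \pm \sigma$ rule is only a few lines. This is a sketch under one assumption the text does not settle: it uses the population standard deviation (`pstdev`) rather than the sample version. The metric values are made up.

```python
from statistics import mean, pstdev


def erni_lewerentz_thresholds(values):
    """Return (t_min, t_max) = (mu - sigma, mu + sigma) for one metric."""
    mu = mean(values)
    sigma = pstdev(values)  # population std. dev.; the sample version is equally plausible
    return mu - sigma, mu + sigma


# Hypothetical metric values for eight classes.
values = [2, 4, 4, 4, 5, 5, 7, 9]
t_min, t_max = erni_lewerentz_thresholds(values)
print("t_min:", t_min, "t_max:", t_max)
```

Note that the function never sees a defect label, which is exactly the first criticism above.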