Slimmer Baseline Algorithm for effort estimation #21

Open bigfatnoob opened 8 years ago

bigfatnoob commented 8 years ago

Whigham et al. suggest using linear regression as a baseline, but software effort datasets have many attributes relative to their rows, which skews the predictions. Here, I have removed attributes (columns) with low correlation with the dependent variable, so that

rows : columns >= 10 : 1
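
A minimal sketch of this pruning step (assuming numpy/scipy/scikit-learn; slim_baseline is a hypothetical name, and the actual code in this repo may differ — the "top X = rows/10 attributes by Spearman correlation" rule is taken from the discussion below):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def slim_baseline(X, y):
    """Sketch: rank attributes by |Spearman rho| against effort y,
    keep the top rows/10 columns, then fit plain linear regression."""
    rows, cols = X.shape
    keep = max(1, rows // 10)  # enforce rows : columns >= 10 : 1
    # absolute Spearman correlation of each column with the effort
    rho = np.array([abs(spearmanr(X[:, j], y).correlation) for j in range(cols)])
    rho = np.nan_to_num(rho)   # constant columns get rho = 0
    top = np.argsort(rho)[::-1][:keep]  # best-correlated columns first
    return LinearRegression().fit(X[:, top], y), top
```

On nasa93 (93 rows), for example, this keeps at most 9 attributes.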
bigfatnoob commented 8 years ago

For COCOMO datasets (med and iqr are the median and interquartile range of the errors, ×100; the three trailing numbers in each row are the 25th, 50th, and 75th percentiles):

Mystery1

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                COCOMO2 ,   42 ,   86 (*              |              ), 0.13,  0.42,  0.99
1 ,                   CART ,   47 ,  286 (*---           |              ), 0.16,  0.47,  3.02
1 ,                COCONUT ,   76 ,  113 (-*             |              ), 0.19,  0.76,  1.32
1 ,             P_BASELINE ,   88 ,   59 (-*             |              ), 0.45,  0.88,  1.04
1 ,               BASELINE ,  102 ,  222 (-*-            |              ), 0.38,  1.02,  2.60

Mystery2

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                COCOMO2 ,    8 ,  224 (*              |              ), 0.01,  0.08,  2.25
1 ,                   CART ,   11 ,   96 (*              |              ), 0.00,  0.11,  0.96
1 ,                COCONUT ,   11 ,  113 (*              |              ), 0.01,  0.11,  1.14
1 ,             P_BASELINE ,   94 ,  283 (*              |              ), 0.16,  0.94,  2.99
2 ,               BASELINE ,  503 , 6223 (*              |              ), 0.58,  5.03, 62.81

nasa93

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                COCONUT ,   10 ,   88 (*              |              ), 0.00,  0.10,  0.88
1 ,                COCOMO2 ,   11 ,   88 (*              |              ), 0.00,  0.11,  0.88
2 ,                   CART ,   32 ,  140 (*              |              ), 0.02,  0.32,  1.42
2 ,             P_BASELINE ,   60 ,  128 (*              |              ), 0.27,  0.60,  1.55
2 ,               BASELINE ,   77 ,  277 (*              |              ), 0.15,  0.77,  2.92

coc81

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                COCOMO2 ,    1 ,   37 (*              |              ), 0.00,  0.01,  0.37
1 ,                COCONUT ,    1 ,   37 (*              |              ), 0.00,  0.01,  0.37
1 ,                   CART ,   34 ,  111 (*              |              ), 0.01,  0.34,  1.12
1 ,             P_BASELINE ,  100 ,  121 (*              |              ), 0.46,  1.00,  1.67
1 ,               BASELINE ,  255 , 1591 (*              |              ), 0.57,  2.55, 16.48
bigfatnoob commented 8 years ago

For non-COCOMO datasets:

albrecht

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                   TEAK ,   23 ,   59 (*              |              ), 0.07,  0.23,  0.66
1 ,                   CART ,   43 ,   69 (-*             |              ), 0.11,  0.43,  0.80
1 ,             P_BASELINE ,   55 ,  147 (-*--           |              ), 0.29,  0.55,  1.76
1 ,               BASELINE ,   61 ,  199 (-*---          |              ), 0.18,  0.61,  2.17

kitchenham

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                   CART ,   21 ,   64 (*              |              ), 0.03,  0.21,  0.67
1 ,                   TEAK ,   37 ,  115 (*              |              ), 0.08,  0.37,  1.23
2 ,             P_BASELINE ,  180 ,  436 (*              |              ), 0.39,  1.80,  4.75
2 ,               BASELINE ,  180 ,  453 (*              |              ), 0.40,  1.80,  4.93

maxwell

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                   CART ,   54 ,   92 (*              |              ), 0.06,  0.54,  0.98
1 ,                   TEAK ,   82 ,  110 (*              |              ), 0.22,  0.82,  1.32
1 ,             P_BASELINE ,   99 ,  176 (*              |              ), 0.31,  0.99,  2.07
2 ,               BASELINE ,  966 , 2676 (*              |              ), 2.24,  9.66, 29.00

miyazaki

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                   TEAK ,   21 ,   93 (*              |              ), 0.05,  0.21,  0.98
1 ,                   CART ,   24 ,   81 (*              |              ), 0.08,  0.24,  0.89
1 ,               BASELINE ,   45 ,  122 (*              |              ), 0.19,  0.45,  1.41
1 ,             P_BASELINE ,   51 ,  178 (*              |              ), 0.15,  0.51,  1.93

china

rank ,                   name ,    med   ,  iqr 
----------------------------------------------------
1 ,                   CART ,    7 ,   44 (*              |              ), 0.01,  0.07,  0.45
1 ,                   TEAK ,   19 ,   86 (*              |              ), 0.02,  0.19,  0.88
2 ,             P_BASELINE ,   37 ,   72 (*              |              ), 0.12,  0.37,  0.84
2 ,               BASELINE ,   37 ,   97 (*              |              ), 0.12,  0.37,  1.09
timm commented 8 years ago

so we seem to be suggesting that linear regression is NOT a good baseline, but that either decision trees (with CART) or something like TEAK is

BTW, are you sure about the coc81 results? a median error of 1%? never seen that before

bigfatnoob commented 8 years ago

Linear regression is not a good baseline for effort estimation, so I think we should propose a slimmer version of it that prunes out unnecessary attributes. The correlation between each attribute and effort is estimated with Spearman's correlation, and only the top X ranked attributes are selected, where X = rows/10. CART and TEAK cannot be called baselines since (1) they are not easy to implement, and (2) they need tuning to set magic parameters.

The error measure used is the one asked for by the EMSE reviewers. With our standard MRE measure, it is around 42.
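
For reference, a minimal sketch of the standard MRE measure mentioned here (mre and mdmre_percent are hypothetical helper names; the EMSE reviewers' measure is not reproduced):

```python
import numpy as np

def mre(actual, predicted):
    # magnitude of relative error per project: |actual - predicted| / actual
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / actual

def mdmre_percent(actual, predicted):
    # median MRE, scaled x100 as in the tables above
    return 100 * np.median(mre(actual, predicted))
```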

timm commented 8 years ago

right... but what do you think about the ACM TOSEM baseline paper? is that what you'd call a baseline that everyone should use to compare their work against?

bigfatnoob commented 8 years ago

In the above results, the ACM TOSEM baseline is shown as BASELINE and our proposed baseline as P_BASELINE. We can see that BASELINE is way off from the other methods and cannot be compared to them, while our method is almost always in the same league as the others on both COCOMO and non-COCOMO data. So I think our proposed baseline is better than the one proposed by Whigham et al. in the ACM TOSEM paper, and at the same time it is simple to incorporate.

timm commented 8 years ago

So I think our proposed baseline is better than the one proposed by Whigham et al. in the ACM TOSEM paper, and at the same time it is simple to incorporate.

sounds like a paper to me. first "dialog strategies", then "better baseline".