{'columba': 0.4449226845209629, 'ant': 0.7017429340176755, 'emf': 0.6380096751900484, 'GA': 0.6215936150976071, 'jruby': 0.5523692737330159, 'argouml': 0.5421930738921376, 'jEdit': 0.7789072466971455, 'jfreechart': 0.6036046580288152, 'sql': 0.7034487362195452, 'jmeter': 0.61244693299951}
{'columba': 0.4664434879642914, 'ant': 0.7181253698978958, 'emf': 0.6193306348109389, 'GA': 0.6186233748042125, 'jruby': 0.5695655287230805, 'argouml': 0.5384095139986366, 'jEdit': 0.7932504759910501, 'jfreechart': 0.6981823818200222, 'sql': 0.7179018650353893, 'jmeter': 0.6097377945898873}
SVM vs. RF, per project (values rounded to four decimal places):

| Project | SVM | RF |
| --- | --- | --- |
| columba | 0.8617 | 0.9793 |
| argouml | 0.8621 | 0.9572 |
| emf | 0.8294 | 0.9337 |
| jfreechart | 0.8716 | 0.8894 |
| jruby | 0.8397 | 0.9058 |
| ant | 0.8245 | 0.9125 |
| jEdit | 0.8731 | 0.9213 |
| GA | 0.7219 | 0.8211 |
| sql | 0.8208 | 0.9060 |
| jmeter | 0.7630 | 0.8605 |
Pre-filter rule: if a comment contains any of these words, ["fixme", "todo", "workaround", "hack"], then label it as technical debt.
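A minimal sketch of this pre-filter, assuming the inputs are plain text strings and a case-insensitive substring match; the function and helper names here are illustrative, not the actual code:

```python
# Keyword pre-filter: flag text as self-admitted technical debt
# if it contains any of the marker words (case-insensitive substring match).
KEYWORDS = ["fixme", "todo", "workaround", "hack"]

def prefilter(text):
    lowered = text.lower()
    return any(word in lowered for word in KEYWORDS)

# Hypothetical helper: tally fp / fn / tp of the pre-filter against
# ground-truth labels (label is True when the item really is technical debt).
def confusion_counts(items):
    counts = {"fp": 0, "fn": 0, "tp": 0}
    for text, label in items:
        pred = prefilter(text)
        if pred and label:
            counts["tp"] += 1
        elif pred and not label:
            counts["fp"] += 1
        elif label:
            counts["fn"] += 1
    return counts
```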
Total number of commits: 62275
Pre-filter confusion counts as each keyword is added:

| Keywords | fp | fn | tp |
| --- | --- | --- | --- |
| fixme | 53 | 3731 | 340 |
| + todo | 424 | 1224 | 2847 |
| + workaround | 427 | 1175 | 2896 |
| + hack | 434 | 1083 | 2988 |
The false positives were checked manually (by Ken): only 8 of the 434 are real false positives.
Therefore the pre-filter, after this correction: {'fp': 8, 'fn': 1083, 'tp': 3414} (recall 75.8%, precision 99.8%), whereas the old ground truth gave: {'fp': 0, 'fn': 426, 'tp': 4079} (recall 90.5%).
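For reference, those percentages follow from the usual definitions over the confusion counts:

```python
# Recall and precision of the pre-filter over the corrected counts.
tp, fp, fn = 3414, 8, 1083
precision = tp / (tp + fp)   # 3414 / 3422 ≈ 0.998
recall = tp / (tp + fn)      # 3414 / 4497 ≈ 0.76
```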
Then, use machine learning to tackle the remaining 1083 false negatives.
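A rough sketch of that step, assuming a scikit-learn TF-IDF pipeline; the toy texts and labels below are made up for illustration and are not from the dataset:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy placeholder data: texts with ground-truth labels (1 = technical debt).
# In the real setting, the training set would be labeled comments and the
# prediction set would be the leftovers the keyword pre-filter missed.
train_texts = [
    "this is a temporary solution until the parser is rewritten",
    "compute the checksum of the buffer",
    "should be removed once the upstream bug is fixed",
    "returns the number of active connections",
]
train_labels = [1, 0, 1, 0]
unflagged_texts = ["ugly special case, clean this up later"]

vectorizer = TfidfVectorizer(sublinear_tf=True)
X_train = vectorizer.fit_transform(train_texts)
X_rest = vectorizer.transform(unflagged_texts)

# Either classifier from the per-project scores above could be used here.
clf = RandomForestClassifier(n_estimators=100)   # or sklearn.svm.LinearSVC()
clf.fit(X_train, train_labels)
print(clf.predict(X_rest))
```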