Thomsch / untangling-tools-benchmark

Framework to compare commit-untangling tools fairly and accurately.

Update line_count.sh to support tangled lines #36

Open Thomsch opened 1 year ago

Thomsch commented 1 year ago

Update line_count.sh to support tangled lines instead of counting them as non-bug-fixing lines.

Support can be added by introducing a new column that counts the number of tangled lines.

thanhdang2712 commented 1 year ago

@Thomsch Since the new metric for tangled line support is the number of tangled lines, we already have this data in metrics.csv. In addition, tangled lines may not exist in the truth.csv file.

Currently, the formula I have for the number of tangled lines is: [changed_lines(BF_diff) + changed_lines(NBF_diff) - changed_lines(VC_diff)] // 2
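To make the formula concrete, here is a minimal Python sketch. The function name and its three integer parameters are hypothetical; it assumes the changed-line counts for the BF, NBF, and VC diffs have already been computed elsewhere and only applies the arithmetic above.

```python
def count_tangled_lines(bf_changed: int, nbf_changed: int, vc_changed: int) -> int:
    """Apply the formula from this comment to three changed-line counts.

    All three arguments are assumed to be precomputed:
    changed_lines(BF_diff), changed_lines(NBF_diff), changed_lines(VC_diff).
    """
    # Integer (floor) division, matching the `// 2` in the formula.
    return (bf_changed + nbf_changed - vc_changed) // 2


# Illustrative numbers only, not taken from a real commit.
example = count_tangled_lines(bf_changed=5, nbf_changed=4, vc_changed=7)
```

With these made-up inputs the formula gives (5 + 4 - 7) // 2.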

One option is to modify count_lines.py so that it reads the value from metrics.csv; the other is to compute it in count_lines.py itself. I assumed tangled lines would carry the 'both' label, but I have never seen a bug file that contains a line labelled 'both'.
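For the second option, a rough sketch of counting lines per label could look like the following. The column name `group`, the label values, and the sample rows are assumptions made for illustration; the actual layout of truth.csv may differ, and this is not the current count_lines.py.

```python
import csv
import io
from collections import Counter


def count_lines_by_label(csv_text: str, label_column: str = "group") -> Counter:
    """Count rows per label in CSV text that has a header row.

    `label_column` is a hypothetical column name; adjust it to the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Made-up sample data in an assumed truth.csv-like layout.
sample = """file,source,target,group
Foo.java,10,,fix
Foo.java,,12,other
Foo.java,14,14,both
"""

counts = count_lines_by_label(sample)
tangled = counts["both"]  # a Counter yields 0 when no 'both' rows exist
```

Because a `Counter` returns 0 for missing keys, a file with no 'both' rows would simply report zero tangled lines rather than failing.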

What are your thoughts on this? Do you think this feature is needed in lines_count.sh, or could it just be considered a commit metric after all?