GumTreeDiff / gumtree

An awesome code differencing tool
https://github.com/GumTreeDiff/gumtree/wiki
GNU Lesser General Public License v3.0

redundant edit script about insert and delete operations on unmodified code #299

Closed geyu00 closed 1 year ago

geyu00 commented 1 year ago

Hi! I'm using GumTree (v3.0.0) on macOS to analyze Java code. When I used it to obtain the edit script for an example (repo: maven https://github.com/apache/maven, commit id: f684761dee739b4ec8a7e6db5a0a6a0b809e66c9, file: maven-model-builder/src/main/java/org/apache/maven/model/inheritance/DefaultInheritanceAssembler.java), I found redundant insert and delete operations on unmodified code. The related files are attached below; the actual change is the insertion of a new boolean expression.

example.zip

But the results also contained insertions and deletions of unmodified code.


When I removed some of the code, these redundant edit operations disappeared.

tzeH commented 1 year ago

Interesting, I see very similar behaviour in one of my own examples:

===
insert-node
---
Modifier: public [1332,1338]
to
TypeDeclaration [1332,10026]
at 0
===
insert-node
---
TYPE_DECLARATION_KIND: class [1339,1344]
to
TypeDeclaration [1332,10026]
at 1
===
insert-node
---
SimpleName: MonatsBerichtService [1345,1365]
to
TypeDeclaration [1332,10026]
at 2
===
insert-tree
---
ExpressionStatement [8304,8354]
    MethodInvocation [8304,8353]
        METHOD_INVOCATION_RECEIVER [8304,8327]
            SimpleName: dvIdCategoryProjectList [8304,8327]
        SimpleName: add [8328,8331]
        METHOD_INVOCATION_ARGUMENTS [8332,8352]
            SimpleName: categoryProjectEntry [8332,8352]
to
Block [7919,8300]
at 2
===
delete-node
---
Modifier: public [1332,1338]
===
===
delete-node
---
TYPE_DECLARATION_KIND: class [1339,1344]
===
===
delete-node
---
SimpleName: MonatsBerichtService [1345,1365]
===

This doesn't seem to happen with version 2.1.2:

Insert ExpressionStatement(761) into Block(762) at 2
Insert MethodInvocation(760) into ExpressionStatement(761) at 0
Insert SimpleName: dvIdCategoryProjectList(757) into MethodInvocation(760) at 0
Insert SimpleName: add(758) into MethodInvocation(760) at 1
Insert SimpleName: categoryProjectEntry(759) into MethodInvocation(760) at 2

tzeH commented 1 year ago

I stepped through both 2.1.2 and 3.0.0 in a debugger to find where the difference comes from and ended up in AbstractBottomUpMatcher.lastChanceMatch. Both versions contain a size condition:

2.1.2: cSrc.getSize() < AbstractBottomUpMatcher.SIZE_THRESHOLD || cDst.getSize() < AbstractBottomUpMatcher.SIZE_THRESHOLD

3.0.0: src.getMetrics().size < sizeThreshold || dst.getMetrics().size < sizeThreshold

However, in my case src.getMetrics().size evaluates to 1042 in 3.0.0, whereas cSrc.getSize() evaluates to 3 in 2.1.2.

Another difference: 2.1.2 contains removeMatched(cSrc, true); which is missing from 3.0.0.

Does this analysis ring a bell, @jrfaller?

I could get back to the old behaviour by increasing the threshold property to 10000:

        GumtreeProperties properties = new GumtreeProperties();
        properties.tryConfigure(ConfigurationOptions.bu_minsize, 10000);
        defaultMatcher.configure(properties);

jrfaller commented 1 year ago

Hi @tzeH !

The metric difference is expected: in 2.x, already-mapped trees were pruned, but this is no longer the case in 3.x.
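
To make the effect of pruning on the size check concrete, here is a minimal, self-contained sketch. It does not use the GumTree API: the Node class, both size functions and the threshold value of 1000 are assumptions for illustration only, and the tree is built so that it reproduces the two numbers reported above (1042 and 3). The real removeMatched in 2.1.2 may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeThresholdSketch {
    /** Toy tree node (hypothetical); `matched` marks nodes mapped by an earlier phase. */
    static final class Node {
        final boolean matched;
        final List<Node> children = new ArrayList<>();
        Node(boolean matched) { this.matched = matched; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** 3.x-style size in this sketch: every node of the subtree is counted. */
    static int fullSize(Node n) {
        int size = 1;
        for (Node c : n.children) size += fullSize(c);
        return size;
    }

    /** 2.x-style size in this sketch: matched children are dropped with their subtrees. */
    static int prunedSize(Node n) {
        int size = 1;
        for (Node c : n.children) {
            if (!c.matched) size += prunedSize(c);
        }
        return size;
    }

    /** Same shape as the condition quoted from lastChanceMatch. */
    static boolean lastChanceEligible(int srcSize, int dstSize, int threshold) {
        return srcSize < threshold || dstSize < threshold;
    }

    /** Unmatched root with two unmatched leaves plus one large matched subtree. */
    static Node buildExample() {
        Node root = new Node(false);
        root.add(new Node(false)).add(new Node(false));
        Node big = new Node(true);
        for (int i = 0; i < 1038; i++) big.add(new Node(true));
        return root.add(big);
    }

    public static void main(String[] args) {
        Node root = buildExample();
        int full = fullSize(root);     // 1042
        int pruned = prunedSize(root); // 3
        int threshold = 1000;          // assumed threshold, not read from GumTree
        System.out.println("pruned size eligible: " + lastChanceEligible(pruned, pruned, threshold));
        System.out.println("full size eligible:   " + lastChanceEligible(full, full, threshold));
        System.out.println("full size, 10000:     " + lastChanceEligible(full, full, 10000));
    }
}
```

Under these assumptions the pruned size passes the check and the full size does not, until the threshold is raised above 1042. That is consistent with the bu_minsize workaround shown earlier in the thread.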

The problem you describe is easily fixed by using the "new" simple matcher (-m gumtree-simple), which no longer uses the Zhang and Shasha algorithm.