GumTreeDiff / gumtree

An awesome code differencing tool
https://github.com/GumTreeDiff/gumtree/wiki
GNU Lesser General Public License v3.0
893 stars 170 forks source link

`textdiff` has inconsistent range output between formats #334

Open JonahSussman opened 5 months ago

JonahSussman commented 5 months ago

Hi all,

I compiled the latest version of Gumtree from the source and have been using it. Really great project! I'm having some issues with the textdiff command, unfortunately. It appears that the output between the three formats is inconsistent with each other, mainly with their ranges.

(Not that I think it matters, but I'm currently running Fedora 38.)

Steps to Reproduce

I have the following Java files, A.java, B1.java, and B2.java, which I have attached to this issue:

I then ran the following (I've attached these files to my issue as well):

./gumtree textdiff A.java B1.java -f TEXT > A-B1-text.txt
./gumtree textdiff A.java B1.java -f JSON > A-B1-json.txt
./gumtree textdiff A.java B1.java -f XML  > A-B1-xml.txt
./gumtree textdiff A.java B2.java -f TEXT > A-B2-text.txt
./gumtree textdiff A.java B2.java -f JSON > A-B2-json.txt
./gumtree textdiff A.java B2.java -f XML  > A-B2-xml.txt

Behavior

All three formats handled update-node the same way, but there is an inconsistency between their handling of insert-tree. Each insert-tree has tree as an ExpressionStatement and parent as a Block. This is correct. However, the ranges differ between formats. Here's the inconsistency shown in a table:

Format A to B1 A to B2
TEXT tree: [116, 144]
parent: [111, 116]
tree: [126, 154]
parent: [111, 116]
JSON tree: [116, 144]
parent: [110, 148]
tree: [126, 154]
parent: [120, 158]
XML tree: [116, 144]
parent: [110, 148]
tree: [126, 154]
parent: [120, 158]

It appears that TEXT is taking parent from A.java, while JSON and XML are taking parent from B1/2.java. You can verify this because TEXT's parent fields do not change between the two diffs, while JSON's and XML's increase by 10.

Which file format has the correct version? Personally, I think TEXT makes the most sense, as you are inserting the tree at a point in the old tree.

Files

JonahSussman commented 5 months ago

I did some digging within the code and found this line in this file:

https://github.com/GumTreeDiff/gumtree/blob/170ba6f0cdd2f350e04d38c4af956670b46ec3df/core/src/main/java/com/github/gumtreediff/actions/ChawatheScriptGenerator.java#L106

Insert is defined as the following:

https://github.com/GumTreeDiff/gumtree/blob/170ba6f0cdd2f350e04d38c4af956670b46ec3df/core/src/main/java/com/github/gumtreediff/actions/model/Insert.java#L25-L34

From this, we can see that the parent field is supposed to reference the source tree, not the destination tree. This confirms that the expected behavior should be what TEXT is doing, and JSON and XML are somehow doing something wrong.

jrfaller commented 5 months ago

Thanks a lot for the detailed report! You're correct, it seems like a bug. I will try to investigate this.

jrfaller commented 5 months ago

OK I see the problem.

The text serializer uses this code to output the text for insert actions. It defers the computation to this method, which resort to the parent node stored in the action, which belongs to the source tree as you noticed here.

For the other serializer, XML for instance, it uses this code which uses the parent node supplied as a parameter here that is the parent coming from the destination tree which is a different behavior.

I think we should indeed have consistent behavior but I am not sure what is the right solution.

An inserted node only exists in the destination tree therefore if one wants to look up the code corresponding to this node together with the parent, she should look inside the destination file, that's why I am more convinced by the behavior of the XML and JSON formatters. WDYT?