GumTreeDiff / gumtree

An awesome code differencing tool
https://github.com/GumTreeDiff/gumtree/wiki
GNU Lesser General Public License v3.0
927 stars 173 forks source link

cannot get right disTC after applying the generated actions #104

Closed t94126 closed 5 years ago

t94126 commented 6 years ago

Hi, I use gumtree to diff two python files: file1.py:

tf.scalar_summary(NCE loss, loss)

file1.py:

tf.summary.scalar(NCE loss, loss)

First, I use the code in wiki to get the actions:

Run.initGenerators();
//get the tree context and node
TreeContext srcTC = Generators.getInstance().getTree(file1);
ITree srcRoot = srcTC.getRoot();
TreeContext dstTC = Generators.getInstance().getTree(file2);
ITree dstRoot = dstTC.getRoot();
// retrieve the default matcher
Matcher m = Matchers.getInstance().getMatcher(srcRoot, dstRoot); 
m.match();
MappingStore mappingStore = m.getMappings();
ActionGenerator g = new ActionGenerator(srcRoot, dstRoot, mappingStore);
g.generate();

Then I print the srcTC

(() (-1984916852 "Module" "" ((1 35)) (
    (2174485 "Expr" "" ((1 35)) (
        (2092670 "Call" "" ((1 35)) (
            (-650353278 "AttributeLoad" "" ((1 18)) (
                (1904990769 "NameLoad" "tf" ((1 0)) ()
                (3004913 "attr" "scalar_summary" ((1 18)) ())
            (83473 "Str" "NCE loss" ((19 12)) ()
            (1904990769 "NameLoad" "loss" ((31 5)) ()))))

and the dstTC

 (-1984916852 "Module" "" ((1 35)) (
    (2174485 "Expr" "" ((1 35)) (
        (2092670 "Call" "" ((1 35)) (
            (-650353278 "AttributeLoad" "" ((1 18)) (
                (-650353278 "AttributeLoad" "" ((1 0)) (
                    (1904990769 "NameLoad" "tf" ((1 0)) ()
                    (3004913 "attr" "summary" ((1 0)) ())
                (3004913 "attr" "scalar" ((1 18)) ())
            (83473 "Str" "NCE loss" ((19 12)) ()
            (1904990769 "NameLoad" "loss" ((31 5)) ()))))

Then I apply the actions to srcTC:

System.out.println(ActionUtil.apply(srcTC, actions));

In my opinion, the result should be same as dstTrc, since it is the result by applying generated action to srcTC. However, following is what I get:

(() (-1984916852 "Module" "" ((1 35)) (
    (2174485 "Expr" "" ((1 35)) (
        (2092670 "Call" "" ((1 35)) (
            (-650353278 "AttributeLoad" "" ((1 18)) (
                (-650353278 "AttributeLoad" "" ((1 0)) (
                    (1904990769 "NameLoad" "tf" ((1 0)) ()
                    (3004913 "attr" "summary" ((1 18)) ()
                    (1904990769 "NameLoad" "tf" ((1 0)) ()
                    (3004913 "attr" "summary" ((1 0)) ())
                (3004913 "attr" "scalar" ((1 18)) ()
                (-650353278 "AttributeLoad" "" ((1 0)) (
                    (1904990769 "NameLoad" "tf" ((1 0)) ()
                    (3004913 "attr" "summary" ((1 18)) ()
                    (1904990769 "NameLoad" "tf" ((1 0)) ()
                    (3004913 "attr" "summary" ((1 0)) ())
                (3004913 "attr" "scalar" ((1 18)) ())
            (83473 "Str" "NCE loss" ((19 12)) ()
            (1904990769 "NameLoad" "loss" ((31 5)) ()))))

Do I misuse some function or is it a bug in project? Thanks.

rbavishi commented 6 years ago

Hi there. I'm not a contributor, but a look at the source reveals that currently, the actual AST operations are being applied using the actions directly, rather than action trees, which accumulate operations in a subtree. For example, if a node is inserted, its children are also included as insertions in the actions. But when applying the actions, you should only insert that root node (as it already contains the children). This should be the reason behind the duplicate nodes you see in the output. (core/src/main/java/com/github/gumtreediff/actions/ActionUtil.java)

public class ActionUtil {
    private ActionUtil() {}

    public static TreeContext apply(TreeContext context, List<Action> actions) {
        for (Action a: actions) {
            if (a instanceof Insert) {
                Insert action = ((Insert) a);
                action.getParent().insertChild(action.getNode(), action.getPosition());
            } else if (a instanceof Update) {
                Update action = ((Update) a);
                action.getNode().setLabel(action.getValue());
            } else if (a instanceof Move) {
                Move action = ((Move) a);
                action.getNode().getParent().getChildren().remove(action.getNode());
                action.getParent().insertChild(action.getNode(), action.getPosition());
            } else if (a instanceof Delete) {
                Delete action = ((Delete) a);
                action.getNode().getParent().getChildren().remove(action.getNode());
            } else throw new RuntimeException("No such action: " + a );
        }
        return context;
    }
}

An example of this being done correctly is in the web differ (client.diff/src/main/java/com/github/gumtreediff/client/diff/web/HtmlDiffs.java)

public void produce() throws IOException {
        TreeClassifier c = new RootAndLeavesClassifier(src, dst, matcher);
        TIntIntMap mappingIds = new TIntIntHashMap();

        int uId = 1;
        int mId = 1;

        TagIndex ltags = new TagIndex();
        for (ITree t: src.getRoot().getTrees()) {
            if (c.getSrcMvTrees().contains(t)) {
                mappingIds.put(mappings.getDst(t).getId(), mId);
                ltags.addStartTag(t.getPos(), String.format(ID_SPAN, uId++));
                ltags.addTags(t.getPos(), String.format(
                                SRC_MV_SPAN, "token mv", mId++, tooltip(src, t)), t.getEndPos(), END_SPAN);
            }
            if (c.getSrcUpdTrees().contains(t)) {
                mappingIds.put(mappings.getDst(t).getId(), mId);
                ltags.addStartTag(t.getPos(), String.format(ID_SPAN, uId++));
                ltags.addTags(t.getPos(), String.format(
                                SRC_MV_SPAN, "token upd", mId++, tooltip(src, t)), t.getEndPos(), END_SPAN);
                List<int[]> hunks = StringAlgorithms.hunks(t.getLabel(), mappings.getDst(t).getLabel());
                for (int[] hunk: hunks)
                    ltags.addTags(t.getPos() + hunk[0], UPD_SPAN, t.getPos() + hunk[1], END_SPAN);

            }
            if (c.getSrcDelTrees().contains(t)) {
                ltags.addStartTag(t.getPos(), String.format(ID_SPAN, uId++));
                ltags.addTags(t.getPos(), String.format(
                                ADD_DEL_SPAN, "token del", tooltip(src, t)), t.getEndPos(), END_SPAN);
            }
        }

Here, the RootsAndLeavesClassifier does the accumulation I described above. A fix in this regard should also solve #97.

t94126 commented 6 years ago

Sorry for late replying. In a short word, should the correct way be: before the insertion, check the individual actions, and ignore the child node, whose parent also in the action list. Is my understanding correct? Thanks.

jrfaller commented 5 years ago

I am closing this since this method has been removed (see #97). However there are now the new insert-tree and delete-tree actions that should help solve the problem, if someone decide to implement this feature!