GumTreeDiff / gumtree

An awesome code differencing tool
https://github.com/GumTreeDiff/gumtree/wiki
GNU Lesser General Public License v3.0
936 stars 174 forks source link

Fix for multithreaded execution #336

Closed pouryafard75 closed 9 months ago

pouryafard75 commented 9 months ago

Hello,

I've identified a bug affecting multithreaded execution.

This PR consists of 3 commits, and I'll break down each commit:

P.S: I'm aware that the test should ideally be located in the core module, but due to the dependency on JdtTreeGenerator, I couldn't move it. I can export the tree XML and load it to eliminate the dependency. Please let me know your thoughts.

jrfaller commented 9 months ago

Hi @pouryafard75 and thanks a lot for spotting this nasty concurrency bug that prevents parallelizing diffs.

I took a look at your solution but I am not sure it's how we should proceed. Initially, I used a global type registry to have a fast equality test for types, using only a reference check instead of a string comparison, but your solution would revert to this slower equality checking (and this is a very frequent operation).

By thinking more about the root of the problem, I think the culprit here is the type constructor (https://github.com/GumTreeDiff/gumtree/blob/main/core/src/main/java/com/github/gumtreediff/tree/TypeSet.java#L38-L40) that is not thread-safe. Maybe we could try to make this method synchronized? Could you try this fix to see if it would work with your test?

Cheers!

pouryafard75 commented 9 months ago

@jrfaller Thanks for your feedback. Your suggested solution works perfectly, and I have updated the PR. Would you recommend moving the test to core and then loading the tree from xml file? Because the test doesnt have to do anything with jdt.

jrfaller commented 9 months ago

Hi @pouryafard75 and thanks for trying the fix out!

I completely agree with you that the test case should be located inside the core package, and should be as small as possible.

I think the problem appears in such a case :

So if I am not wrong, just having one-node trees (or trees with several nodes with the same type) to diff concurrently could trigger the bug, the assert predicate would be that all combination of types should return true to ==.

WDYT?

pouryafard75 commented 9 months ago

@jrfaller Thanks for your comment. Its a bit challenging to make the test fail in case of having super small trees. Also we cannot create the Type in the setup, because that will ensure the type has been added to registry properly before any tests.

This is what I have replicated

@RepeatedTest(10)
    public void testMultiThreading() {
        Type foo1 = TypeSet.type("foo");
        Type foo2 = TypeSet.type("foo");
        assertSame(foo1,foo2);
    }

I also moved it to TestTree. but the problem is, it doesn't fail frequently. I assume the previous test due to the time it took to parse the tree and generate, had higher chance of failure.

jrfaller commented 9 months ago

I have this one :

    @Test
    public void testTypeThreading() throws InterruptedException {
        int n = 20;
        ExecutorService exec = Executors.newFixedThreadPool(n);
        List<Type> types = new ArrayList<>();

        for (int i = 0; i < n; i++) {
            exec.submit(() -> {
                types.add(TypeSet.type("foo"));
            });
        }

        exec.awaitTermination(1, java.util.concurrent.TimeUnit.SECONDS);

        for (Type t1 : types) {
            for (Type t2: types) {
                assertSame(t1, t2);
            }
        }
    }

Seems to fail almost every time on my machine, is it the case for you too?

pouryafard75 commented 9 months ago

@jrfaller Yes this works. (I was fixated to take advantage of repeated tests to reproduce the bug :grimacing:) I have updated the PR.