eclipse / xtext

Eclipse Xtext™ is a language development framework
http://www.eclipse.org/Xtext
Eclipse Public License 2.0
767 stars 321 forks source link

Performance bottleneck in QueuedBuildData when building huge number of resources #2396

Open kittaakos opened 8 years ago

kittaakos commented 8 years ago

I ran into a performance issue when I was trying to build more than 500k very simple and small resources in my workspace. My workspace contains 256 projects, each project contains 256 folders and the resources are balanced into these folders. Although I managed to perform a full clean build, I experienced some performance loss in the QueuedBuildData#queueURI(URI) method.

Although the collections in the QueuedBuildData should not contain duplicates it is currently implemented with LinkedList to comply the Queue API. To avoid duplicate resource URIs, contains is being called on these LinkedLists which is pretty expensive in my case. Actually, that is the 20% of my overall build process.

I would recommend to implement some sort of factory and inject it into the QueuedBuildData. This factory will be responsible for creating the collection instances for uris and the values of the projectNameToChangedResource field as well. By default, a LinkedList instances will be created. But, via injection, one could inject a factory that creates LinkedHashSet instances. Since the iteration order of the LinkedHashSet equals to the insertion order, it can be wrapped into an AbstractQueue and can act as a queue.

I believe this neither a bug nor a critical feature request, but it could be useful for others as well.

By the way, I am happy to contribute a patch.

kittaakos commented 8 years ago

I have the feeling what I have "spammed" above previously is false, or at least just partially true.

I did the implementation: I have created a factory, that can be injected into the QueuedBuildData and this factory was responsible for creating the collections as described spammed above. I started a profiler with both (the LinkedList and the LinkedHashSet based) implementations and I was shocked when I saw, the contains time is around 0% in both cases. So I thought, profiler lies, solution is simple. I have modified the actual QueuedBuildData implementation locally, extracted the contains logic, and measured the net access time with a Stopwatch. It turned out, the profiler was true. Access time was almost zero, since when contains was called on the collections (no matter it was a LinkedList or a LinkedHashSet) the item count was around ~1000 and contains for roughly 1000 items in a collection is really nothing. So I did some further experiments and I think I have stumbled upon a weird thing...

If there is project (with the proper Xtext nature and command) and this project contains multiple projects (with the proper Xtext nature and builder commands for these sub-projects), then each resource will be built twice. That means, if I have only three projects a, b and c and each project contains one single file, A, B and C, then a full clean build will build each file once. But if I have a parent project, with zero files but it contains the projects a, b and c, then all resources will be built twice after a full clean build. And that was the aha moment, where I understood why contains was so expensive for me; because when the parent project was built, my backing collection contained more than 500k element and that is indeed not for free in case of the LinkedList.

Here come the logs from Xtext console in case of both project layouts:

[Worker-4] Building a
[Worker-4] indexing platform:/resource/a/A.metadata
[Worker-4] Build a in 6 ms
[Worker-4] Building b
[Worker-4] indexing platform:/resource/b/B.metadata
[Worker-4] Build b in 4 ms
[Worker-4] Building c
[Worker-4] indexing platform:/resource/c/C.metadata
[Worker-4] Build c in 5 ms
[Worker-12] Building a
[Worker-12] indexing platform:/resource/a/A.metadata
[Worker-12] Build a in 3 ms
[Worker-12] Building b
[Worker-12] indexing platform:/resource/b/B.metadata
[Worker-12] Build b in 3 ms
[Worker-12] Building c
[Worker-12] indexing platform:/resource/c/C.metadata
[Worker-12] Build c in 2 ms
[Worker-12] Building parent
[Worker-12] indexing platform:/resource/parent/c/C.metadata
[Worker-12] indexing platform:/resource/parent/b/B.metadata
[Worker-12] indexing platform:/resource/parent/a/A.metadata
[Worker-12] Build parent in 6 ms

So my questions would be the followings: is this "let's build all nested resources twice" strategy is a feature or a bug? If this is a bug, is it a known issue? If this is an issue, is this really an Xtext specific one or it is rather a generic problem with the Eclipse builder?

Here, I have to note, I have modified the org.eclipse.xtext.ui.containers.WorkspaceProjectsStateHelper.initVisibleHandles(String), so I am not checking project references at all (org.eclipse.core.resources.IWorkspaceRoot.getProject(String)), for me everything is visible from everywhere. Could that be the cause of the problem?

I would appreciate any kinds of feedback on this thread. Thanks in advance!

meysholdt commented 8 years ago

Building resources twice is not a feature :) Thank you for your analyses and the update!

But if I have a parent project, with zero files but it contains the projects a, b and c, then all resources will be built twice after a full clean build

Does this only happen when the parent project have an Xtext nature?

According to your log, when Xtext builds a project it builds all resources from the directory tree without excluding the resources from inside the child projects. That does not sound like how things should be :)

kittaakos commented 8 years ago

First off, thanks for the quick response.

Does this only happen when the parent project have an Xtext nature?

Correct, when the parent project does not have the Xtext nature, each resource is being built once.

Is there a way I can help? I have already checked out everything, and maybe I can look into that. In the worst case, if I get stuck, you, professionals have to take care of it but at least I can provide failing test cases or something like that.

meysholdt commented 8 years ago

You're welcome and yes, help is greatly appreciated.

Since a builder is invoked per project, I think it's conceptually wrong for a builder to traverse into nested projects.

A fix could be to change org.eclipse.xtext.builder.impl.ToBeBuiltComputer to never collect resources from nested projects, e.g. only collect resources for which currentlyBuiltProject == nestedResource.getProject() is true.

kittaakos commented 8 years ago

Looked into the issue, the fix seems to be quite straightforward. I would apply my changes here: org.eclipse.xtext.builder.impl.ToBeBuiltComputer.isHandled(IFolder).

Before calling the IToBeBuiltComputerContribution whether the folder is rejected or not, I would check if the folder is a nested project or not. If the folder is a nested project, no contribution should be called at all, and the method should return with false indicating that the subtree should not be visited at all.

My logic to check if a folder is a nested project would be the following:

The first prototype seems to be fine according to the Xtext console, but I need to write tests.

Please let me know if you disagree with my approach, then I can adjust it.

JanKoehnlein commented 8 years ago

But isn't the error the Xtext nature on the parent project, which should not contain any Xtext resources at all?

Adding the files from a contained project via the build of the parent project could result in wrong paths for the index entries. So instead of skipping nested projects, I'd rather skip the parent.

Nested projects in Eclipse have been added late, so I am not quite sure whether it is always clear what they mean. Can you nest Java projects within a Java project in JDT? What does it mean?

kittaakos commented 8 years ago

Good point, I have checked it. These are my findings:

nestedjavaprojects

Any ideas how to proceed? I mean, of course I can workaround my original issue by simply removing the Xtext nature from the parent project. I just thought, it could be cool to support it, so one can create a project with a couple of Xtext resources and then nest another Xtext project with other resources into it... this project nesting became trendy recently, right? (just joking)

meysholdt commented 8 years ago

this project nesting became trendy recently, right? (just joking)

I agree :) Much better than Eclipse Working Sets :)

let's proceed by verifying wether a parent project's builder also receives resource change deltas for child events. I mean the deltas returned by org.eclipse.core.resources.IncrementalProjectBuilder.getDelta(IProject) in org.eclipse.xtext.builder.impl.XtextBuilder.build(int, Map, IProgressMonitor).

If my assumption is right and a parent project does not receive resource deltas for nested projects we can continue with the approach of not including nested project's resources in ToBeBuiltComputer.

I spoke with @JanKoehnlein and we agreed that looking at JDT is not very helpful because JDT only builds resources form the src folder and one would usually not put a nested project inside a source folder.

kittaakos commented 8 years ago

I had only a couple of minutes to look into that, but based on my quick debugging session, I saw that both parent and (nested) sub-project receives a resource change delta in the Xtext builder when modifying a resource (IFile) in the sub-project. So probably the initial idea will not work, but as I said, I need to make a bit deeper investigation. I will come back to this ticket in a few days.

kthoms commented 8 years ago

How about configuring a resource filter on the parent project to exclude the child project? I usually do this to avoid having double resources when importing the child project. This can be configured in the project properties. However, this is no general approach supported by the framework, but rather something to be done when having parent/child projects together in a workspace.

kthoms commented 8 years ago

See also bug#488956 to configure these filters on project creation by the wizard.

kittaakos commented 8 years ago

I can confirm, that both the parent and the child projects are being rebuilt when modifying a single resource in the child project. Please reference the failing test case below.

/*******************************************************************************
 * Copyright (c) 2016 itemis AG (http://www.itemis.eu) and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 *******************************************************************************/
package org.eclipse.xtext.builder.impl
import java.nio.file.Paths
import org.eclipse.core.internal.resources.ProjectDescription
import org.eclipse.xtext.ui.XtextProjectHelper
import org.junit.Test
import static org.eclipse.core.runtime.IPath.SEPARATOR
import static org.eclipse.xtext.junit4.ui.util.IResourcesSetupUtil.*
import static org.eclipse.xtext.builder.impl.BuilderUtil.*
import org.junit.After
import org.eclipse.xtext.util.StringInputStream
import org.eclipse.emf.common.util.URI
/**
 * Test for checking whether ancestor projects are being re-built when modifying any resources in nested sub-projects.
 */
class BuildNestedProjectsTest extends AbstractBuilderParticipantTest {

    static val PARENT_PROJECT_NAME = 'parentProject';
    static val CHILD_PROJECT_NAME = 'childProject';
    @After
    def void removedListeners() {
        builderState.removeListener(this);
    }
    @Test
    def void testModifyNestedResource() {
        // Setup parent project.
        val parentProject = createProject(PARENT_PROJECT_NAME);
        addNature(parentProject, XtextProjectHelper.NATURE_ID);
        addBuilder(parentProject, XtextProjectHelper.BUILDER_ID);
        val parentResource = createFile('''«PARENT_PROJECT_NAME»«SEPARATOR»parentResource.buildertestlanguage''', 'someContent');
        assertTrue('''Cannot access «PARENT_PROJECT_NAME».''', parentProject.accessible);
        assertTrue('Cannot access parentResource.', parentResource.accessible);
        // Setup child project inside parent project.
        val childProject = root.getProject(CHILD_PROJECT_NAME);
        childProject.create(new ProjectDescription() => [
            location = parentProject.location.append(CHILD_PROJECT_NAME);
            name = CHILD_PROJECT_NAME;
        ], null);
        childProject.open(null);
        addNature(childProject, XtextProjectHelper.NATURE_ID);
        addBuilder(childProject, XtextProjectHelper.BUILDER_ID);
        val childResource = createFile('''«CHILD_PROJECT_NAME»«SEPARATOR»childResource.buildertestlanguage''', 'someContent');
        assertTrue('Cannot access childProject.', childProject.accessible);
        assertTrue('Cannot access parentResource.', childResource.accessible);
        // Make sure both projects are visible from the Eclipse workspace.
        val projectNames = root.projects.map[name].toSet;
        assertTrue('''Project «PARENT_PROJECT_NAME» does not exist in the workspace.''',
            projectNames.contains(parentProject.name));
        assertTrue('''Project «CHILD_PROJECT_NAME» does not exist in the workspace.''',
            projectNames.contains(childProject.name));
        // Make sure child project is physically contained in parent project.
        val parentProjectPath = Paths.get(parentProject.locationURI);
        val childProjectPath = Paths.get(childProject.locationURI);
        assertEquals(
            '''«CHILD_PROJECT_NAME» project is not contained in «PARENT_PROJECT_NAME» project.''',
            parentProjectPath.resolve(CHILD_PROJECT_NAME), childProjectPath);
        waitForBuild();
        builderState.addListener(this);

        // Modify parent resource.
        parentResource.setContents(new StringInputStream('newContent'), true, true, null);
        waitForBuild();
        // Assert that only parent resource have been built.
        assertEquals(
            '''Expected exactly one re-built resource. Were: «events.map[deltas].flatten.map[uri].join(',\n')»''',
            1,
            events.map[deltas].flatten.size);
        assertEquals('Unexpected resource has been changed during the build.',
            URI.createPlatformResourceURI('''«PARENT_PROJECT_NAME»«SEPARATOR»«parentResource.name»''', true), 
            events.head.deltas.head.uri
        );
        // Clean resource description events.
        events.clear();
        assertEquals('Expected zero resource description events.', 0, events.size);
        // Modify child resource.
        childResource.setContents(new StringInputStream('newContent'), true, true, null);
        waitForBuild();
        // Assert that only child resource have been built.
        assertEquals(
            '''Expected exactly one re-built resource. Were: «events.map[deltas].flatten.map[uri].join(',\n')»''',
            1,
            events.map[deltas].flatten.size);
    }
}
cdietrich commented 6 years ago

see see a performance bottleneck here to but we dont have any nested projects or somethink like that. but the linear search of already contained might hurt

queuedbuilddata

cdietrich commented 6 years ago

Linked List issue is Fixed in 2.14. rest is still open.