Improve multi-project support in a single workspace

tortmayr commented 1 year ago

For many C/C++ based editors, it is common practice to work on several projects within the same workspace simultaneously (e.g. Eclipse CDT). Unfortunately, this use case is currently not well supported when using (vscode-)clangd.

Clangd has no explicit support for multi-root workspaces. It is possible to provide baseline multi-project support by setting up project-specific compilation databases, since clangd looks for a compile_commands.json file along the source file's directory path and uses it to determine the CDB. However, this approach has several drawbacks:

- Indexing problems occur when functions with the same signature have been defined in separate projects (see this Issue #38). The indexing problems also lead to incorrect/unexpected results when using LSP features like go to definition:
  - https://github.com/clangd/clangd/issues/1258
  - https://github.com/clangd/clangd/issues/1400
- Determining the CDB for external header files runs into trouble when there are several project roots (see the discussion in this Issue #907).

In addition, this approach causes issues with functions that should be scoped to a specific project like “finding all references” or “Renaming” (see also this issue)

Other parties like the cdt-lsp team also have encountered similar problems.

We worked on a proof of concept implementation that solves the issues mentioned above by extending vscode-clangd to support multiple clangd servers within the same workspace: For each project we start a dedicated clangd server and map all files that can be matched to a specific project to its respective clangd server instance. This ensures that files within a project are correctly scoped within the boundaries of a single project, resolving all issues mentioned above. In addition, we track the currently active project. Tracking the currently active project (or rather its clangd server) ensures that extensions that want to communicate with the clangd server without a specific file context (e.g. the memory usage view) still work as expected. The active project can either be set manually by the user, or can be tracked automatically based on the last active file editor.

This multi-project feature is completely opt-in and can be activated via VS Code preferences. The default project resolution strategy is to map each workspace folder to one project. An extension mechanism is provided to support more complex project resolution strategies.
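To make the routing idea more concrete, here is a minimal sketch of how files could be mapped to projects and how the active project could be tracked. It is not the code from our proof of concept: `ProjectRouter` and all of its members are hypothetical names, project roots are assumed to be plain POSIX directory paths (one per workspace folder), and the actual clangd client handling is left out.

```typescript
// Hypothetical sketch; none of these names come from vscode-clangd.
class ProjectRouter {
  private readonly roots: string[];
  private active: string | undefined;

  constructor(roots: string[]) {
    // Normalize by dropping trailing slashes so lengths are comparable.
    this.roots = roots.map((r) => r.replace(/\/+$/, ""));
  }

  // Longest-prefix match: nested project roots resolve to the innermost one.
  resolve(filePath: string): string | undefined {
    let best: string | undefined;
    for (const root of this.roots) {
      if (filePath.startsWith(root + "/") &&
          (best === undefined || root.length > best.length)) {
        best = root;
      }
    }
    return best;
  }

  // Automatic tracking: follow the last active editor. Files that belong
  // to no project leave the previously active project unchanged.
  onActiveEditorChanged(filePath: string): void {
    const project = this.resolve(filePath);
    if (project !== undefined) {
      this.active = project;
    }
  }

  // Manual selection by the user.
  setActiveProject(root: string): void {
    const normalized = root.replace(/\/+$/, "");
    if (this.roots.indexOf(normalized) === -1) {
      throw new Error("unknown project: " + root);
    }
    this.active = normalized;
  }

  get activeProject(): string | undefined {
    return this.active;
  }
}
```

Requests that carry a file would go to the server of `resolve(file)`, while file-less requests (such as the memory usage view) would go to the server of `activeProject`.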

While this approach solves the issues mentioned above, it also comes with additional challenges with regard to resource management. To mitigate the resource consumption, clangd server instances are spun up dynamically and disposed of once they are no longer needed (i.e. there is no longer an open editor that needs that specific clangd server instance). If a lot of code is shared between projects, though, the overall memory footprint might become larger, as each server maintains its own index of these shared source files. But this is a compromise we believe is worth making in multi-project scenarios. Moreover, there are options to mitigate this issue, e.g. by using external indexes for shared library code.
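The on-demand lifecycle can be pictured as simple reference counting per project. The following is only a sketch under that assumption, not our actual implementation: `ServerPool`, `ServerHandle`, and the `startServer` callback are hypothetical stand-ins for whatever starts and stops a clangd language client.

```typescript
// Hypothetical stand-in for a running clangd language client.
interface ServerHandle {
  dispose(): void;
}

interface PoolEntry {
  handle: ServerHandle;
  openEditors: number;
}

class ServerPool {
  private readonly servers = new Map<string, PoolEntry>();
  private readonly startServer: (project: string) => ServerHandle;

  constructor(startServer: (project: string) => ServerHandle) {
    this.startServer = startServer;
  }

  // An editor for a file in `project` was opened:
  // start the server on first use, otherwise reuse the running one.
  acquire(project: string): ServerHandle {
    let entry = this.servers.get(project);
    if (entry === undefined) {
      entry = { handle: this.startServer(project), openEditors: 0 };
      this.servers.set(project, entry);
    }
    entry.openEditors += 1;
    return entry.handle;
  }

  // An editor was closed: dispose of the server once no open editor needs it.
  release(project: string): void {
    const entry = this.servers.get(project);
    if (entry === undefined) {
      return;
    }
    entry.openEditors -= 1;
    if (entry.openEditors <= 0) {
      entry.handle.dispose();
      this.servers.delete(project);
    }
  }

  get runningProjects(): string[] {
    return Array.from(this.servers.keys());
  }
}
```

A real extension would likely also want a grace period before disposal, so that quickly closing and reopening a file does not restart clangd and re-index the project.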

For our use cases, this solution works really well and offers more complete and correct multi-project support than what can currently be achieved with vscode-clangd.

In our opinion this is a feature that could be interesting for other parties of the clangd community as well. We are happy to contribute our implementation back to the vscode-clangd project. Therefore, we would like to get your feedback whether you’d be interested in this feature! Due to the opt-in approach it could be integrated into the main vscode-clangd project without affecting the current behavior.

Please let us know what you think! Your perspective on having this feature integrated in the main vscode-clangd project, or any other thoughts on this approach, would be highly appreciated!

bencefr commented 1 year ago

Great description @tortmayr , I was just about to write up something similar...

My team is also working on an extension that manages multi-root workspaces where multiple projects exist at the same time, all of them likely with multiple build folders. We track the active build ourselves, so we know which compile_commands.json database should be active at any given time. It would be greatly beneficial if we could tell the clangd extension to change context with a single command.

However... it's not always the case that a compilation database helps, as that is generated after the fact, not while developing. When a new file is added, when code moves around, it is just silly to start a build to regenerate the compilation database. How we interact with Microsoft's C/C++ extension is via providers where we translate the compilation database to their expected configuration objects for any given file (even if it's not found in the database just yet) with fallbacks. I believe this is a bit more complex but rather flexible alternative solution.

planger commented 1 year ago

Thanks for the feedback so far! I'm very happy to see that quite a few people +1'd this proposal. So I would like to follow up and ask for your thoughts, and whether you would support / accept a contribution as outlined above @sam-mccall @hokein?

Please let us know, we'd be happy to prepare a PR! Thank you in advance!

planger commented 1 year ago

@sam-mccall @hokein Do you have any feedback on our proposal above? We'd like to get a better picture on whether this has a chance to be contributed to the main branch (our preferred option) or whether we have to maintain a fork. Thank you in advance!

PS: If there is any other feedback or thought, please feel free to share them with us!

planger commented 1 year ago

@HighCommander4 Do you have feedback or suggestions on how we should proceed with our proposal above? Thank you very much in advance!

HighCommander4 commented 1 year ago

> @HighCommander4 Do you have feedback or suggestions on how we should proceed with our proposal above?

Here are my thoughts:

- In cases where a set of projects are related (e.g. share common code or dependencies), having a single clangd instance for this set of projects seems like the better conceptual model to me. For example, it seems valuable to be able to rename a function in a shared dependency and have its uses in multiple dependent projects be updated.
  - I think we should try harder in clangd to make these use cases work better with a single instance, e.g. by fixing issues such as https://github.com/clangd/clangd/issues/1101.
- I can also imagine use cases (e.g. unrelated projects that share a workspace for reasons of editor ergonomics) where separate instances per project are the better conceptual model.
- Having the ability to opt into instance-per-project seems useful, both for use cases of the second kind, and for use cases of the first kind until clangd gets better at handling those in a single instance.
- I don't have a strong opinion on whether the instance-per-project support should live upstream in vscode-clangd, or in a fork.
  - On the one hand, given the small size and slow rate of change of vscode-clangd, presumably maintaining a fork is not a lot of effort.
  - On the other hand, having the support upstream has clear advantages, e.g. discoverability.
  - I personally have no opposition to upstreaming this support.
  - However, I'm also not the decision-maker for this. We need one of the project owners, @sam-mccall / @kadircet / @hokein, to weigh in.

planger commented 1 year ago

@HighCommander4 Thank you very much for sharing your thoughts! Very much appreciated!

On the question regarding upstreaming or forking: I can live with both approaches, but have a strong preference for upstreaming. Not only because of maintenance and visibility, which are obviously important factors, but also because forking -- unless there is a good technical reason for it (e.g. because the feature couldn't be made opt-in, which fortunately isn't the case here) -- has the effect of splitting the communities. Over time it may become unclear to community members where certain topics should be discussed, certain issues should be fixed, new features should be integrated, etc. So just as a general guideline for me, if there is no clear reason for forking, I'm definitely in favor of joining communities and combining forces.

But of course this is up to the project owners! So I'm very much looking forward to their opinions!

Thank you very much again for your great feedback and thanks to the project owners in advance for considering!

planger commented 1 year ago

@sam-mccall / @kadircet / @hokein Do you have any feedback on the topic above? Thank you!

MNASTM commented 6 months ago

Any progress on this? Multiple projects with duplicate symbols are quite a common use case on our side, and we would also be interested in a way to solve these 'go to' issues. @planger did you end up with a fork? Thanks!

planger commented 6 months ago

@MNASTM Yes (unfortunately), we created a fork under the Eclipse CDT Cloud umbrella, with the multi-project support and its own publishing pipeline.

- Repository: https://github.com/eclipse-cdt-cloud/vscode-clangd
- OpenVSX: https://open-vsx.org/extension/eclipse-cdt/vscode-clangd-cdtcloud
- VS Code Marketplace: https://marketplace.visualstudio.com/items?itemName=eclipse-cdt.vscode-clangd-cdtcloud

Of course, our goal is to upstream everything that makes sense, or even all of it, if that's welcomed by the maintainers of this repo at some point.

We are looking forward to any feedback! Feel free to open issues.

geertj commented 2 months ago

Adding my two cents. In our development process, workspaces always contain related projects, typically dependencies. Therefore the "single clangd per workspace" model is probably the better option for us. There are still issues with this approach as well. For example, it would be nice if there was a way to configure, per project, where to find compile_commands.json, and then somehow aggregate those files and pass them to the single per-workspace clangd instance.
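To illustrate what I mean by aggregating, here is a rough sketch of merging several per-project databases into one. As far as I know, nothing like this exists in clangd or vscode-clangd today; `mergeDatabases` and `absoluteFile` are made-up names, paths are assumed to be POSIX style, and "first project wins" for files listed by more than one project is an arbitrary choice for the sketch. The entry fields follow the JSON Compilation Database format.

```typescript
// One entry of a compile_commands.json file.
interface CompileCommand {
  directory: string;
  file: string;
  command?: string;
  arguments?: string[];
  output?: string;
}

// Resolve `file` against `directory` when it is relative.
// Simplification: ".." segments are not normalized.
function absoluteFile(entry: CompileCommand): string {
  if (entry.file.startsWith("/")) {
    return entry.file;
  }
  return entry.directory.replace(/\/+$/, "") + "/" + entry.file;
}

// Concatenate per-project databases. Each entry keeps its own `directory`,
// so relative paths inside the entry stay valid after merging.
function mergeDatabases(databases: CompileCommand[][]): CompileCommand[] {
  const seen = new Set<string>();
  const merged: CompileCommand[] = [];
  for (const db of databases) {
    for (const entry of db) {
      const key = absoluteFile(entry);
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(entry);
      }
    }
  }
  return merged;
}
```

The merged array could then be written out as a single compile_commands.json and clangd pointed at its folder, for example with `--compile-commands-dir`. This does not address the duplicate-symbol indexing problems discussed above.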

clangd / vscode-clangd

Improve multi-project support in a single workspace #498