Open 16bit-ykiko opened 1 month ago
Overall, we want to achieve the following effect: assuming both `b` and `c` include `a` and generate different ASTs for it, then when jumping from file `b` to file `a`, `b` is used as the context; when jumping from file `c` to file `a`, `c` is used as the context.
Possible challenges:
The answer is yes: the only thing we need to do is compute the preamble bounds ourselves. For example, assume we have the following file:
#include <string>
#include <vector>
// ... a lot of code
#include <user.h> // <= target
// ... a lot of code
The target header file is inside `user.h`. So we can build all the code before `user.h` as the preamble and cut off all code after it to improve performance. In the end, we can use the resulting AST to render LSP responses for the header file. For code completion, things are similar: build the same preamble and configure `FrontendOpts::CodeCompletionAt` like this:
auto& completion = instance->getFrontendOpts().CodeCompletionAt;
completion.FileName = filepath;
completion.Line = line;
completion.Column = column;
The `filepath` is the header file path (for clangd, it is the main file). Then it works as expected. Code completion cuts off the source file automatically, so we don't need to do anything else.
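The bound computation described above can be sketched roughly as follows. This is only a textual approximation under stated assumptions: `findHeaderBounds` and `HeaderBounds` are hypothetical names, not clang APIs, and the scan ignores comments, conditional compilation and unusual directive spellings, which a real implementation would handle through the lexer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: byte offsets into the source text.
struct HeaderBounds {
    bool found;               // false if no matching #include line exists
    std::size_t preambleEnd;  // start of the target #include line
    std::size_t truncateAt;   // just past the target #include line
};

// Hypothetical helper: locate the first `#include` line that mentions
// `header`. Text before `preambleEnd` would be built as the preamble,
// and text from `truncateAt` onward would be cut off.
HeaderBounds findHeaderBounds(const std::string& source, const std::string& header) {
    std::size_t lineStart = 0;
    while (lineStart < source.size()) {
        std::size_t lineEnd = source.find('\n', lineStart);
        bool hasNewline = lineEnd != std::string::npos;
        if (!hasNewline) lineEnd = source.size();

        std::string line = source.substr(lineStart, lineEnd - lineStart);
        std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string::npos &&
            line.compare(first, 8, "#include") == 0 &&
            line.find(header, first + 8) != std::string::npos) {
            return HeaderBounds{true, lineStart, hasNewline ? lineEnd + 1 : lineEnd};
        }
        lineStart = lineEnd + 1;
    }
    return HeaderBounds{false, 0, 0};
}
```

For the file shown earlier, calling `findHeaderBounds(source, "user.h")` would place `preambleEnd` right after the second `// ... a lot of code`-free region, i.e. at the start of the `#include <user.h>` line, and `truncateAt` right after that line.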
For every header file, unlike a TU, the server will use a `StringMap` to track all its contexts, i.e., every context has an AST, and record the currently active context. If the client sends a go-to request such as `textDocument/declaration` and the destination location is in a header file, the server will try to switch the current context of that header file to the file from which the request was emitted. In this way, we can achieve our goal.
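A minimal sketch of such a context table is shown below. It is an illustration under assumptions, not the proposed implementation: `HeaderContextTable`, `HeaderAST` and all member names are hypothetical, `std::unordered_map` stands in for `StringMap` to keep the example self-contained, and the AST is reduced to a placeholder struct.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical placeholder; a real server would own a clang AST here.
struct HeaderAST {
    std::string builtFrom;  // source file whose preamble produced this AST
};

class HeaderContextTable {
public:
    // Register a context for `header`, keyed by the including source file.
    // The first context registered becomes the active one.
    void addContext(const std::string& header, const std::string& includer) {
        Entry& entry = table_[header];
        entry.asts.emplace(includer, HeaderAST{includer});
        if (entry.active.empty()) entry.active = includer;
    }

    // Called when a go-to request issued from `from` lands in `header`.
    // Returns false if `from` is not a known context of `header`.
    bool switchContext(const std::string& header, const std::string& from) {
        auto it = table_.find(header);
        if (it == table_.end() || it->second.asts.count(from) == 0) return false;
        it->second.active = from;
        return true;
    }

    // Active context of `header`, or an empty string if none is known.
    std::string current(const std::string& header) const {
        auto it = table_.find(header);
        return it == table_.end() ? std::string() : it->second.active;
    }

    // All known contexts of `header`, sorted for stable output.
    std::vector<std::string> all(const std::string& header) const {
        std::vector<std::string> result;
        auto it = table_.find(header);
        if (it == table_.end()) return result;
        for (const auto& kv : it->second.asts) result.push_back(kv.first);
        std::sort(result.begin(), result.end());
        return result;
    }

private:
    struct Entry {
        std::unordered_map<std::string, HeaderAST> asts;
        std::string active;
    };
    std::unordered_map<std::string, Entry> table_;
};
```

With `a.h` registered under both `b.cpp` and `c.cpp`, a jump from `c.cpp` into `a.h` would call `switchContext("a.h", "c.cpp")`, after which `current("a.h")` reports `c.cpp`.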
Besides, we provide the extension requests `headerContext/current`, `headerContext/all` and `headerContext/switch` to allow users to proactively query and switch header contexts.
Users can configure extra header contexts for specific files.
`llvm/ADT/SmallVector.h` is included by nearly every source file in LLVM. It is a self-contained file, so we had better distinguish such files from others to reduce memory usage.

It's also possible that a header file is included multiple times in one source file. For a header file with a guard macro or `#pragma once`, the second inclusion generates nothing. Others, e.g. `TokenKinds.def` in LLVM, are designed to be included multiple times. So it's necessary to support switching header contexts within the same file. Luckily, clang tracks the include location of each token, which means that we can distinguish them easily.
For example:
// test.h
struct X;
// test2.h
#include "test.h"
// test.cpp
#include "test.h"
#include "test2.h"
We will get two `CXXRecordDecl`s in the final AST, and they both dump as `test.h:1:1`. But we can use `SourceManager::getIncludeLoc` to track the include stack. For the first decl, the result is `test.cpp:1:10`. For the second, the result is `test2.h:1:10`; calling it again gives `test.cpp:2:10`.
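The walk up the include stack can be modelled without clang as follows. This is a toy sketch: `FileInstance` and `includeStack` are hypothetical names, and each step of the loop only mimics what one call to `SourceManager::getIncludeLoc` would return for the corresponding file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of one inclusion of a file. The same header can appear as
// several instances when it is reached through different #include lines.
struct FileInstance {
    std::string name;        // file name, for readability only
    int includedFrom;        // index of the including instance, -1 for the main file
    std::string includeLoc;  // "file:line:col" of the #include, empty for the main file
};

// Collect include locations from instance `id` up to the main file,
// innermost first.
std::vector<std::string> includeStack(const std::vector<FileInstance>& files, int id) {
    std::vector<std::string> stack;
    while (id >= 0 && static_cast<std::size_t>(id) < files.size() &&
           files[id].includedFrom >= 0) {
        stack.push_back(files[id].includeLoc);
        id = files[id].includedFrom;
    }
    return stack;
}
```

Modelling the example above with four instances (`test.cpp`, `test.h` from `test.cpp`, `test2.h` from `test.cpp`, and `test.h` from `test2.h`), the two `struct X` declarations map to different stacks even though both dump as `test.h:1:1`.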
By the way, there is also a function called `HeaderSearch::isFileMultipleIncludeGuarded`, which can be used to determine whether a header file has a guard macro or `#pragma once`.
Issues in clangd:
What `#include` does in C/C++ is simply copy the included file's contents to its location. Only `.c`/`.cpp` files (translation units, i.e., TUs) participate in the final compilation process and occur in `compile_commands.json` with a corresponding command.

As we all know, clangd is clang-based: we need to run the clang frontend on a given source file to get an AST or code completion results. Then we can respond to LSP requests. For cpp files, it's trivial. We just need to compile them as the clang driver normally would. The only difference is that we only generate the AST, with no further step to generate LLVM IR.
But what about header files? How does clangd deal with them? clangd just regards a header file as a translation unit and generates an AST for it. Compilation commands are guessed from source files, e.g. based on file name matching. This simple approach works, but it is far from complete!
Since a header file is only part of a source file, its AST likely depends on the preceding text, so it may have different ASTs in different translation units. For example:
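A minimal illustration, with hypothetical contents for `a.h`, `b.cpp` and `c.cpp`: the header body is pasted twice into one file, each time inside a namespace standing in for one translation unit, to mimic textual inclusion. The macro `USE_WIDE_ID` and the alias `Id` are assumptions made up for this sketch.

```cpp
#include <type_traits>

namespace b_cpp {    // stands in for b.cpp
#define USE_WIDE_ID  // text that precedes the #include in b.cpp
// ---- pasted contents of a.h ----
#ifdef USE_WIDE_ID
using Id = long long;
#else
using Id = int;
#endif
// ---- end of a.h ----
#undef USE_WIDE_ID
}  // namespace b_cpp

namespace c_cpp {    // stands in for c.cpp, which defines nothing first
// ---- pasted contents of a.h ----
#ifdef USE_WIDE_ID
using Id = long long;
#else
using Id = int;
#endif
// ---- end of a.h ----
}  // namespace c_cpp

// The same header text yields different declarations in the two contexts.
static_assert(std::is_same<b_cpp::Id, long long>::value, "b.cpp sees the wide Id");
static_assert(std::is_same<c_cpp::Id, int>::value, "c.cpp sees the narrow Id");
```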
It's obvious that `a.h` has a different AST in `b.cpp` and in `c.cpp`. Currently, clangd can only get the AST in `c.cpp`, i.e., it treats `a.h` as a single TU.

Another, more extreme example is a non-self-contained header file (a file that cannot be compiled on its own). In the example above, though only one AST will be used, at least it works. Consider the following example:

clangd will emit a compilation error for `a.h`, because it cannot find the definition of `X`, which is defined in its header context, `b.cpp`.

This could be really frustrating. We should support checking, looking up and switching the context of a header file!