clice-project / clice


Improve performance and reduce memory usage #5

Open 16bit-ykiko opened 1 week ago

16bit-ykiko commented 1 week ago

Issues in clangd:

Building a preamble for large header files usually takes a lot of time, which is expected behavior. The more lines a file has, the longer the compilation time. We shouldn't expect a file with hundreds of thousands of lines to compile as quickly as one with only a few hundred. One possible optimization approach is to improve the clang frontend; however, this might be a task where the effort and return are completely mismatched, and the optimization effects could be quite limited. Our main goal is to examine the current issues in clangd and see if there are areas for further improvement.

16bit-ykiko commented 1 week ago

One significant issue is that clangd does not persistently store PCH on disk. Even with the -pch-storage=disk option passed, clangd only uses the disk to store PCH temporarily, and when clangd shuts down, all PCH caches are deleted. If we can implement persistent storage, it could significantly improve the loading time when reopening files.

This is possible because the PCH generated by the Clang driver is stored persistently; we just need to mimic how it does that. More specifically, clangd uses clang::PrecompiledPreamble::Build to build the preamble, and this is the main obstacle to persistent PCH storage. Instead, we should use a CompilerInstance to execute clang::GeneratePCHAction, which generates a persistent PCH on disk, and set PreprocessorOptions::ImplicitPCHInclude to reuse it.

A fundamental issue is how we determine whether the PCH can be reused. Imagine the following scenario: you open the editor, and the PCH is cached on disk. Then you close the editor, use Git to update the source code, and reopen the editor. Can this PCH be reused? Clearly, we need to record some additional information to check whether the PCH needs to be regenerated. The work here is very similar to what a build system does: we need to track the source files the PCH depends on and the timestamps at which these files were last saved. If none of the dependent files have changed, the PCH can be reused; otherwise, it must be regenerated. It is also worth mentioning that the situation is entirely analogous for PCM files generated for C++20 modules.

16bit-ykiko commented 1 week ago

Note that we will only build a persistent PCH for files that are opened, not for every file in compile_commands.json during indexing. PCH files take up a lot of disk space, so generating one per file would consume a large amount of space. So when a file is opened for the first time, how can we improve the user experience while waiting for the preamble to build? The answer is a more powerful index format.

Generally, we will record more information in the index file, so that read-only LSP features can be served from it without building an AST.
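To illustrate the idea, here is a minimal in-memory sketch of answering go-to-definition and find-references from index data alone. It is not clice's index format: `Location`, `SymbolEntry`, `SymbolIndex` and the plain string symbol key are all hypothetical, and a real index would be serialized to disk and hold much more information.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical source position stored in the index.
struct Location {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;
};

// Hypothetical per-symbol record: one definition plus all known references.
struct SymbolEntry {
    std::optional<Location> definition;
    std::vector<Location> references;
};

class SymbolIndex {
public:
    void addDefinition(const std::string& key, Location loc) {
        entries[key].definition = std::move(loc);
    }

    void addReference(const std::string& key, Location loc) {
        entries[key].references.push_back(std::move(loc));
    }

    // Go-to-definition answered from the index, with no AST involved.
    std::optional<Location> findDefinition(const std::string& key) const {
        auto it = entries.find(key);
        if (it == entries.end()) return std::nullopt;
        return it->second.definition;
    }

    // Find-references answered from the index; empty if the symbol is unknown.
    std::vector<Location> findReferences(const std::string& key) const {
        auto it = entries.find(key);
        if (it == entries.end()) return {};
        return it->second.references;
    }

private:
    std::unordered_map<std::string, SymbolEntry> entries;
};
```

While the preamble is still building, a server could answer such queries from the index and switch to AST-based results once they become available.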