Open andocz opened 2 years ago
It turns out there's a bug I didn't catch: if the build directory doesn't exist, the parsing fails, because RealFileSystem
requires that the chdir
target exists. To fix this, I'll have to reimplement RealFileSystem
without this check.
It turns out there's a bug I didn't catch: if the build directory doesn't exist, the parsing fails, because
RealFileSystem
requires that thechdir
target exists. To fix this, I'll have to reimplementRealFileSystem
without this check.
I do not understand this. How is it possible to have a build generated with CMake (which creates directory structures) or logged (in which case the build was executed and should have created the directories) but do not have the "build directory" (what does this refer to?) exist?
Is it meaningful to parse a file in such a context? If the directory where the build was executed (is this what "build directory" refers to?) doesn't even exist, how was the build executed? We'll end up having issues with (potentially generated?) files gone missing...
In general, I would prefer if we tried not to reimplement things from LLVM too much, unless really needed. We're way behind the upstream and all these reimplements will cause problems when we try supporting newer releases of Ubuntu even.
Perhaps we could either create the directory, or give a hard error and have the user reconsider what they messed up with their build system...
Okay, I tried giving a second read to the patch itself but I am still confused. What do we need the chdir
for, and why do we try chdir
ing to that directory? Or is that chdir
needed by Clang instead? What about VirtualFileSystem
? Can't we create a VirtualFileSystem
that always says "Yes, sir, this directory surely exists!" in case it doesn't really exist on the real device, and overlay that on top of the RFS in the VFS stack the ClangTool
uses and then make clang
believe that directory is there?
How is it possible to have a build generated with CMake (which creates directory structures) or logged (in which case the build was executed and should have created the directories) but do not have the "build directory" (what does this refer to?) exist?
It can happen if the build directory is selectively transferred between computers, like what happens on the CI where we zip up just the source files. The build directory (where CMake was invoked) exists here actually, but the compilation directory ("directory"
in compile_commands.json
) of some commands doesn't, since directories containing no source files go missing.
We'll end up having issues with (potentially generated?) files gone missing...
We can have the compilation dependencies all intact with only compilation directories containing none missing, which is what happened on the CI.
In general, I would prefer if we tried not to reimplement things from LLVM too much, unless really needed.
Oh, I didn't mean reimplementing it from scratch, just creating a wrapper FileSystem
subclass that adjust arguments before passing them on to the methods of a RealFileSystem
instance, converting relative paths to absolute ones. It would work like ProxyFileSystem
.
Or is that
chdir
needed by Clang instead?
Yes, it chdir
s to the directory of each compilation command before running the tool here. This is what will make future references to relative paths get resolved correctly.
Can't we create a
VirtualFileSystem
that always says [...]
I don't think that would work. RealFileSystem
uses llvm::sys::fs::is_directory
to verify that the work dir exists; the ClangTool
's stack isn't used at this level.
But even if it worked, I think my "redirecting wrapper class" idea is more straightforward.
Oh, there is ProxyFileSystem
now... neat. Back when I implemented reparse, that wasn't a thing yet. The setCWD would've worked though, because that's also what reparse uses:
And in this case, almost none (apart from Clang's own intrinsic headers that install with CodeCompass...) of the directories exist on the server. And the overlay is passed to ClangTool
:
The reason missing directories never caused an issue with reparse is because it doesn't set the working directory properly. It's using the default value of .
from FixedCompilationDatabase::loadFromCommandLine
. It's a good thing you mentioned reparse, because I realized it's still broken with relative paths.
DatabaseFileSystem
doesn't actually use its _currentWorkingDirectory
for anything. It passes the paths it receives as-is to getFile
. Also, setCurrentWorkingDirectory
doesn't work properly if path_
is absolute; the path_
gets appended to the work dir instead of being used in itself. These will have to be corrected.
The reparser properly supports relative paths now. I had to add a new table to the database (BuildDirectory
) for this.
I decided to scrap the FakeWorkDirFileSystem
, because to make it work properly, it would need to resolve ..
s in paths lexically, which is too messy for my liking. (I can think of symlinks as a case when this would produce wrong results.) Now if the build directory is missing, a warning is emitted and /
is used as a fallback.
For parsing to fail, a build command needs to have its directory missing and use a relative path. I don't think it's worth going out of our way to recover from this scenario given how unlikely it is. You suggested creating the build directory, but I think the parser shouldn't modify its inputs.
The PR should also handle relative paths for the file
in the compilation command source.
As part of #637, it was discover that in case the directory
is /source/
, the file
is file.cpp
and the execution directory of CodeCompass_parser
is /cc/
, then instead of /source/file.cpp
, the non-existent file /cc/file.cpp
is being parse, leading to error.
This PR makes Clang tool invocations use the working directories of each compilation command. This fixes relative paths in arguments not being handled properly (#571), as previously the tool ran in the default working directory of
.
.Callers of
SourceManager::getFile
incppparser
were changed to provide an absolute path.Additionally, inaccurate comments in
SourceManager::getFile
were corrected.