facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0
3.33k stars 194 forks source link

Fix show_headers dep file processing not showing errors. #625

Closed zjturner closed 2 weeks ago

zjturner commented 3 weeks ago

When using the show_headers dep file parsing mode (generally used with clang), a script show_headers_to_dep_file.py is invoked behind the scenes in order to convert the output of clang's --show-headers argument into a dep file. When the invocation of clang --show-headers fails for any reason, the script completely swallows the output and it does not propagate up to the user, which results in the user having no feedback about the cause of the error. Moreover, since this action runs before the action which runs the true compilation, a failure on this action terminates the build with no feedback, and no other subsequent steps run to fill in the missing pieces, so the user is just left with no information.

This is an extremely common use case, since something as simple as forgetting to list a header in the headers or exported_headers attribute of a cxx_library will trigger it.

The root cause of this bug is that the script would only parse the output of the compiler if the returncode indicated that the process succeeded. Ironically, the function to parse the output already handled the case of printing error lines back to stdout, it just wasn't called when there was an error.

Here is some sample output of buck2 running an action remotely with and without this PR:

with PR:

$ buck2 build --target-platforms=internal//platforms:linux-x64 //Client/Core:Core[objects][src/Bitset.cpp.o]
Action failed: root//Client/Core:Core (cxx_compile src/Bitset.cpp)
Remote command returned non-zero exit code 1
stdout:
stderr:
In file included from Client/Core/src/Bitset.cpp:1:
In file included from Client/Core/include/Core/Bitset.h:3:
Client/Core/include/Core/Bit.h:81:10: fatal error: 'Bit.inl' file not found
#include "Bit.inl"
         ^~~~~~~~~
.. /usr/lib/llvm-12/bin/../include/c++/v1/string
1 error generated.
Build ID: a5eaf819-1a66-419d-8688-f602a39a126c
Network: (GRPC-SESSION-ID)
Jobs completed: 5. Time elapsed: 2.6s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD FAILED
Failed to build 'root//Client/Core:Core (internal//platforms:linux-x64#48291db568f0aab5)'

without PR:

$ buck2 build --target-platforms=internal//platforms:linux-x64 //Client/Core:Core[objects][src/Bitset.cpp.o]
Action failed: root//Client/Core:Core (cxx_compile src/Bitset.cpp)
Remote command returned non-zero exit code 1
stdout:
stderr:
Build ID: a5eaf819-1a66-419d-8688-f602a39a126c
Network: (GRPC-SESSION-ID)
Jobs completed: 5. Time elapsed: 2.6s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD FAILED
Failed to build 'root//Client/Core:Core (internal//platforms:linux-x64#48291db568f0aab5)'
facebook-github-bot commented 2 weeks ago

@JakobDegen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

lmvasquezg commented 2 weeks ago

Thanks for the fix! Could we do this in a way that we're not applying the dep files parsing to the error itself? Something like if line.startswith(".") and returncode ==0. Other wise if the error has a line that starts with "." it will be wrongly altered.

zjturner commented 2 weeks ago

Thanks for the fix! Could we do this in a way that we're not applying the dep files parsing to the error itself? Something like if line.startswith(".") and returncode ==0. Other wise if the error has a line that starts with "." it will be wrongly altered.

Do you have a suggestion for how i can test this, even locally? I can certainly apply your suggested fix, but it would be best if i can actually try it

zjturner commented 2 weeks ago

Actually I don't think it matters? Because the dep file itself isn't being written unless the returncode is 0. So the way the code is written today, it's possible that the content to be written to the depfile will contain junk. But then this:

-    dep_file_utils.normalize_and_write_deps(deps, dst_path)
+    if returncode == 0:
+        dep_file_utils.normalize_and_write_deps(deps, dst_path)

will cause that junk to just get discarded and not written to any depfile.

lmvasquezg commented 2 weeks ago

The fact that you can run your cxx binary without any issues is already a good sign, I'd recommend making sure depfiles are being generated by using buck2 audit depfiles (bear in mind this is sensitive to configurations).

My comment is not about moving the if returncode == 0:, is about adding an one to avoid error line being formatted, it's text processing that's unnecessary in this case and just a call for string processing errors that would be hard to debug, let's avoid that.

zjturner commented 2 weeks ago

Actually I still think this won't work. If I change to this:

        if line.startswith(".") and returncode == 0:
            path = remove_leading_dots(line.replace(" ", ""))
            if len(path) > 0:
                deps.append(path.strip())
                continue

then the continue will never be executed, and we will just print the entire content of --show-headers output to stderr, rather than just the error message.

No matter how I restructure this code, there is not information in the output to distinguish between error lines and header output.

lmvasquezg commented 2 weeks ago

I'm not a clang expert but I'd say it's rather unlikely the dep file will be generated if compilation failed, therefore, adding if returncode ==0 will just guarantee compilation succeeded before trying to parse the depfile. If compilation didn't succeed, it's simply print out all of the error to stderr which is what we want. If you're talking about simply avoiding printing the half-processes depfile, I'd suggest you collect the output, diff it out with the deps pattern and only print the error (assuming you have an example where this happens).

zjturner commented 2 weeks ago

I think I'm not following. This code is not parsing a depfile. It's parsing output directly from the compiler that is used to generate the depfile. Here's a concrete example:

zturner@ubuntu:~$ cat foo.cpp
#include <stdio.h>
#include "bar.h"
zturner@ubuntu:~$ cat bar.h
#include "baz.h"
zturner@ubuntu:~$ clang++ -H -c foo.cpp
. /usr/include/stdio.h
.. /usr/include/x86_64-linux-gnu/bits/libc-header-start.h
... /usr/include/features.h
.... /usr/include/stdc-predef.h
.... /usr/include/x86_64-linux-gnu/sys/cdefs.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
..... /usr/include/x86_64-linux-gnu/bits/long-double.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs.h
..... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/llvm-10/lib/clang/10.0.0/include/stddef.h
.. /usr/lib/llvm-10/lib/clang/10.0.0/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/timesize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
... /usr/include/x86_64-linux-gnu/bits/time64.h
.. /usr/include/x86_64-linux-gnu/bits/types/__fpos_t.h
... /usr/include/x86_64-linux-gnu/bits/types/__mbstate_t.h
.. /usr/include/x86_64-linux-gnu/bits/types/__fpos64_t.h
.. /usr/include/x86_64-linux-gnu/bits/types/__FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/struct_FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/cookie_io_functions_t.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
. ./bar.h
In file included from foo.cpp:2:
./bar.h:1:10: fatal error: 'baz.h' file not found
#include "baz.h"
         ^~~~~~~
1 error generated.

All of this output is written to stderr, it doesn't write some to stdout and some to stderr. However, this code needs to extract the error message from the output and only display the error lines, it doesn't want to display the full output because it can be many screens long. All the user wants to see is this:

Remote command returned non-zero exit code 1
stdout:
stderr:
In file included from foo.cpp:2:
./bar.h:1:10: fatal error: 'baz.h' file not found
#include "baz.h"
         ^~~~~~~
1 error generated.
Build ID: a5eaf819-1a66-419d-8688-f602a39a126c
Network: (GRPC-SESSION-ID)
Jobs completed: 5. Time elapsed: 2.6s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD FAILED

and not this:

. /usr/include/stdio.h
.. /usr/include/x86_64-linux-gnu/bits/libc-header-start.h
... /usr/include/features.h
.... /usr/include/stdc-predef.h
.... /usr/include/x86_64-linux-gnu/sys/cdefs.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
..... /usr/include/x86_64-linux-gnu/bits/long-double.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs.h
..... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/llvm-10/lib/clang/10.0.0/include/stddef.h
.. /usr/lib/llvm-10/lib/clang/10.0.0/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/timesize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
... /usr/include/x86_64-linux-gnu/bits/time64.h
.. /usr/include/x86_64-linux-gnu/bits/types/__fpos_t.h
... /usr/include/x86_64-linux-gnu/bits/types/__mbstate_t.h
.. /usr/include/x86_64-linux-gnu/bits/types/__fpos64_t.h
.. /usr/include/x86_64-linux-gnu/bits/types/__FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/struct_FILE.h
.. /usr/include/x86_64-linux-gnu/bits/types/cookie_io_functions_t.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
. ./bar.h
In file included from foo.cpp:2:
./bar.h:1:10: fatal error: 'baz.h' file not found
#include "baz.h"
         ^~~~~~~
1 error generated.

However, in creating this toy example I did notice that one of the error lines does start with a ., and the code as I wrote it will not print that line. So instead, I think I should change it to take the first line that does not start with a dot, and then print everything after that.

lmvasquezg commented 2 weeks ago

Sound like a good plan, thanks for explaining! By depfile I meant the general output of repeated ./path/to/dep, when I implemented this there wasn't many cases internally where we'd get a half parsed output, so glad we're covering this now :)

zjturner commented 2 weeks ago

So even my above idea didn't work, because I found some cases where include lines would be printed after the error message. the best heuristic I can find is that any line which doesn't begin with 1 or more dots followed by a space is part of the error message and should be printed.

This doesn't have to be perfect because it's just printing an error message, and this is 99.9% correct (maybe 100%, depending on implementation details of clang).

I did find one quirk, which is that clang sometimes prints some of its error diagnostics in color and it was messing up the parsing because there would be ANSI color code sequences at the beginning of the string. To prevent this, I passed in -fno-color-diagnostics when constructing the cmd_args for the clang -H invocation.

Also, this entire branch is now only taken when the returncode is non-zero, so in the case that the command was successful, there's no change in behavior from before, so this only makes the error case better while leaving the success case the same.

It's probably possible to invert the logic and only try to detect error messages (rather than detecting success lines). To do that you'd probably scan for "In file included" and "N error generated." and take everything in between. But that's even more fickle and more complicated, and I think the way it is now is pretty solid.

facebook-github-bot commented 2 weeks ago

@JakobDegen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.