Ccache partial support for `-Xcompiler`

TL;DR: Ccache doesn't parse most flags passed to -Xcompiler. As a result, code coverage notes files (gcno) aren't restored on a cache hit unless a workaround is used.

Background

ESPResSo recently introduced support for code coverage of CUDA source files (e11f6fdb67e2d). When compiling a file with name feature.cu, an extra feature.gcno profile notes file is meant to be generated alongside the target feature.cu.o object file. When executing the code, a feature.gcda count data file is automatically written to disk. The execution counts are mapped to the corresponding source code line via the feature.gcno file. Thus, both the gcno and gcda files are required to generate the code coverage report.

Ccache is a compiler cache: it parses the compiler command line options and stores the output of the compiler in a database. When the same compiler invocation is detected, and the input source file(s) didn't change, the stashed output files are fetched from the database and the compilation is avoided. Ccache detects derived output files like gcno files in a best-effort manner. This is achieved by function process_option_arg(), which relies on conditionals to handle flags like "-fprofile-arcs" and loops to handle tokens split from comma-separated lists via util::Tokenizer().

Problem statement

When compiling CUDA code with code coverage information using nvcc, we need to forward the coverage flags to the host compiler via the syntax -Xcompiler=-Og -Xcompiler=--coverage -Xcompiler=-fprofile-abs-path, or via the comma-separated syntax -Xcompiler=-Og,--coverage,-fprofile-abs-path (CUDA user guide: 4.2.4. Options for Passing Specific Phase Options).

Ccache currently only supports a small subset of nvcc compiler flags (https://github.com/ccache/ccache/issues/1325), and coverage flags aren't among them. Adding support for host compiler coverage flags isn't straightforward. Of the two syntaxes outlined above, the former is relatively easy to implement, but the latter requires parsing a list of tokens to extract the relevant information. The Ccache parser doesn't use recursion, and thus cannot properly handle a comma-separated list of flags. While it has a tokenizer, it is only used to detect a single token from the list. Recursion is actually required here, because the parser is meant to return an error code and a borrowed copy of the compiler flag for each flag in the command line. Using a for loop would prevent us from generating actionable diagnostics when a token list contains a mix of supported and unsupported flags.

Introducing recursion would probably require altering the data structure used to collect flags. That data structure currently requires the entire flag to be added, so we cannot simply add a token extracted from the token list without breaking the parser, as I found out while attempting to implement a non-recursive solution. The diff is reproduced below for completeness. With it, the gcno file is cached with compiler flag -Xcompiler=-Og,--coverage,-fprofile-abs-path, but isn't cached with compiler flag -Xcompiler=--coverage,-fprofile-abs-path.

Non-working patch:

```diff
diff --git a/src/ccache/argprocessing.cpp b/src/ccache/argprocessing.cpp
index ae74e522..275b8e64 100644
--- a/src/ccache/argprocessing.cpp
+++ b/src/ccache/argprocessing.cpp
@@ -851,6 +851,25 @@ process_option_arg(const Context& ctx,
     return Statistic::none;
   }
 
+  if (util::starts_with(arg, "-Xcompiler=")) {
+    for (const auto part : util::Tokenizer(&arg[11], ",")) {
+      if (part == "-fprofile-arcs") {
+        args_info.profile_arcs = true;
+        state.common_args.push_back(args[i]);
+      }
+      if (part == "-ftest-coverage") {
+        args_info.generating_coverage = true;
+        state.common_args.push_back(args[i]);
+      }
+      if (part == "--coverage" || part == "-coverage") {
+        args_info.profile_arcs = true;
+        args_info.generating_coverage = true;
+        state.common_args.push_back(args[i]);
+      }
+      return Statistic::none;
+    }
+  }
+
   if (arg == "-fstack-usage") {
     args_info.generating_stackusage = true;
     state.common_args.push_back(args[i]);
```

While one could adapt the parser to detect -Xcompiler=--coverage and its alternative forms (-Xcompiler --coverage, --compiler-options --coverage, --compiler-options=--coverage) as a best effort and skip the comma-separated versions, that would go against the expectations of most CUDA developers and could lead to subtle build reproducibility issues.
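To make the four spellings and the comma-separated form concrete, here is a standalone C++17 sketch that flattens them into a list of host compiler flags and then checks that list for coverage flags. The helper names (split_commas, collect_host_flags, detect_coverage, CoverageInfo) are hypothetical and not part of Ccache; the flag-to-field mapping mirrors the patch above. The sketch deliberately ignores how Ccache records one status and one argument per command-line flag, which is the hard part discussed earlier.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: split "a,b,c" into {"a", "b", "c"}, skipping empties.
std::vector<std::string> split_commas(std::string_view list)
{
  std::vector<std::string> tokens;
  while (!list.empty()) {
    const auto pos = list.find(',');
    const auto token = list.substr(0, pos);
    if (!token.empty()) {
      tokens.emplace_back(token);
    }
    if (pos == std::string_view::npos) {
      break;
    }
    list.remove_prefix(pos + 1);
  }
  return tokens;
}

// Hypothetical helper: gather every flag forwarded to the host compiler via
// -Xcompiler / --compiler-options, in "=list" or separate-argument form.
std::vector<std::string>
collect_host_flags(const std::vector<std::string>& args)
{
  static constexpr std::array<std::string_view, 2> names = {
    "-Xcompiler", "--compiler-options"};
  std::vector<std::string> host_flags;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string_view arg = args[i];
    for (const auto name : names) {
      std::string_view list;
      if (arg == name) {
        if (i + 1 == args.size()) {
          break; // option given without a value: nothing to collect
        }
        list = args[++i]; // the value is the next argument
      } else if (arg.size() > name.size()
                 && arg.substr(0, name.size()) == name
                 && arg[name.size()] == '=') {
        list = arg.substr(name.size() + 1);
      } else {
        continue; // try the other spelling
      }
      for (auto& token : split_commas(list)) {
        host_flags.push_back(std::move(token));
      }
      break;
    }
  }
  return host_flags;
}

struct CoverageInfo {
  bool profile_arcs = false;
  bool generating_coverage = false;
};

// Inspect every host flag instead of stopping at the first token.
CoverageInfo detect_coverage(const std::vector<std::string>& host_flags)
{
  CoverageInfo info;
  for (const auto& flag : host_flags) {
    if (flag == "-fprofile-arcs") {
      info.profile_arcs = true;
    } else if (flag == "-ftest-coverage") {
      info.generating_coverage = true;
    } else if (flag == "--coverage" || flag == "-coverage") {
      info.profile_arcs = true;
      info.generating_coverage = true;
    }
  }
  return info;
}
```

Because the flags are flattened first, -Xcompiler=-Og,--coverage,-fprofile-abs-path and -Xcompiler=--coverage,-fprofile-abs-path yield the same coverage information in this sketch, regardless of token order.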

Reproducing the bug

Minimal working example:

#include <cuda.h>
#include <stdio.h>

__global__ void cuda_hello() {
  printf("Hello World!\n");
}

int main() {
  cuda_hello<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

Current behavior: using ccache either built from sources using commit https://github.com/ccache/ccache/commit/f03feea871099ee5d6276435b5cc56242d9b634a or built from release 4.9.1, nvcc release 12.0 (V12.0.140, build cuda_12.0.r12.0/compiler.32267302_0) and gcc 12.3.0-17ubuntu1 as host compiler:

$ ls
mwe.cu
$ nvcc -Xcompiler=-Og,--coverage,-fprofile-abs-path -c mwe.cu
$ ls
mwe.cu  mwe.gcno  mwe.o
$ rm -f mwe.gcno mwe.o
$ ccache nvcc -Xcompiler=-Og,--coverage,-fprofile-abs-path -c mwe.cu
$ ls
mwe.cu  mwe.gcno  mwe.o
$ rm -f mwe.gcno mwe.o
$ ccache nvcc -Xcompiler=-Og,--coverage,-fprofile-abs-path -c mwe.cu
$ ls
mwe.cu  mwe.o

Expected behavior: the second ccache invocation should have restored mwe.gcno from the cache alongside mwe.o.

Outlook

Adding support for -Xcompiler flags in Ccache will probably require a significant code contribution.

Workaround: trick Ccache into parsing host compiler flags by invoking nvcc like this:

ccache nvcc --forward-unknown-to-host-compiler --coverage -fprofile-abs-path -c mwe.cu
