Quuxplusone / LLVMBugzillaTest

Unhelpful error and inconsistent behavior with "clang -E -o" and "clang -S -o" when compiling CUDA code #30014

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR31041
Status NEW
Importance P normal
Reported by Justin Lebar (:jlebar) (justin.lebar@gmail.com)
Reported on 2016-11-16 19:55:52 -0800
Last modified on 2016-11-17 13:11:57 -0800
Version unspecified
Hardware PC Linux
CC hfinkel@anl.gov, llvm-bugs@lists.llvm.org, sfantao@us.ibm.com, tra@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Without -o, -E does host compilation and prints to stdout:

$ echo 'XXX __CUDA_ARCH__' | llvm-run clang++ -E -x cuda - | grep XXX
XXX __CUDA_ARCH__

(That the __CUDA_ARCH__ macro is not defined indicates that the preprocessed
source is host code, rather than device code.)

But with -o, we raise a confusing error:
$ echo | llvm-run clang++ -E -x cuda - -o -
clang-4.0: error: cannot specify -o when generating multiple output files

The same thing happens with -S, except that -S outputs *device* assembly:

$ echo | llvm-run clang++ -S -x cuda -
$ cat -- '--cuda-nvptx64-nvidia-cuda-sm_20.s'
//
// Generated by LLVM NVPTX Back-End
//
[...]

With -o, we raise the same confusing error:
$ echo | llvm-run clang++ -S -x cuda - -o -
clang-4.0: error: cannot specify -o when generating multiple output files

I can see two ways to rationalize our behavior:

1) Require --cuda-device-only or --cuda-host-only with -E and -S (with and
without -o), and improve the error message to mention these flags.

2) Default -E and -S to either host or device and make things work the same
with -o.

I kind of lean towards (1), but maybe that would break people?
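
For reference, a rough sketch of what the single-side invocations from option (1) could look like. The helper name `cuda_pp_cmd` is made up for illustration, and it only prints the command line rather than running clang. I'm assuming a plain `clang++` on PATH instead of the `llvm-run` wrapper used above, and that restricting the compilation to one side leaves a single output so that -o is accepted:

```shell
# Hypothetical helper: print a clang++ command line that preprocesses CUDA
# source from stdin for one side only and writes the result to stdout.
# $1 must be "host" or "device"; anything else is rejected.
cuda_pp_cmd() {
  case "$1" in
    host|device)
      printf 'clang++ -E -x cuda --cuda-%s-only - -o -\n' "$1"
      ;;
    *)
      echo "usage: cuda_pp_cmd host|device" >&2
      return 1
      ;;
  esac
}

cuda_pp_cmd host
cuda_pp_cmd device
```

With the device variant, __CUDA_ARCH__ should be defined in the preprocessed output, which is a quick way to tell the two sides apart.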
Quuxplusone commented 7 years ago
I think that depends on the behavior you want for CUDA. If you do:

$ echo | llvm-run clang++ -E -x cuda - -o -

I think a user would expect both host and device code to be generated. However,
in CUDA that only happens when you reach the injection phase. Based on that,
and to keep the same behavior, I think one should do 1), i.e. emit a better
diagnostic that suggests the options --cuda-host-only and --cuda-device-only.

For OpenMP, this is not much of a problem given that human readable files are
bundled, and can be used seamlessly in separate compilation. You can leverage
that feature for CUDA too if that is interesting to have.

Thanks,
Samuel
Quuxplusone commented 7 years ago
The compiler *does* produce multiple outputs for -E, which you can verify with
-### or by observing the preprocessor output itself.

You've correctly inferred that host-side preprocessing happened, but device-side
preprocessing happened as well. The reason you don't see a second XXX is that
you've given a pipe as the input, and everything you echoed was consumed by the
host compilation. That will cause trouble whenever the input can't be consumed
more than once.

The reasoning behind the current behavior:
An explicitly specified -o FOO implies that the output will be stored in a file
named exactly FOO. In the case of CUDA, which does more than one compilation
under the hood, -o may be ambiguous, depending on where in the pipeline you
specify it. I.e. -o FOO for the final object is OK. -o FOO for assembler or
preprocessor output is not, because you will get different output on the host
and device sides and one would clobber the other.

If -o is not specified, the driver is free to generate whatever names it wants,
and thus we're not constrained to one explicitly named output.
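
To illustrate the clobbering, here is a toy model in plain shell. `fake_job` and `clobber_demo` are hypothetical names; they only imitate two compilation jobs that are both told to write to the same explicitly named file and are not how the driver is implemented:

```shell
# Hypothetical stand-in for one compilation job: write "<side> output" to file $2.
fake_job() {
  printf '%s output\n' "$1" > "$2"
}

# Run a "host" job and a "device" job against the same explicit output name,
# then print what is left in that file.
clobber_demo() {
  out=$(mktemp)
  fake_job host "$out"
  fake_job device "$out"   # truncates the file and overwrites the host output
  cat "$out"
  rm -f "$out"
}

clobber_demo
```

Only the output of whichever job ran last survives, so rejecting -o in that situation is safer than silently dropping one side.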
Quuxplusone commented 7 years ago

> The reason you don't see second XXX is that you've given the pipe as an input and everything you've echoed got consumed by host compilation.

Ah, okay. I've verified this is right.

> If -o is not specified, driver is free to generate whatever name it wants and thus we're not constrained by one-explicitly-named-output.

Okay, and with -S without -o, we do generate both host and device assembly files. I was seeing only one for the same reason we were only getting one with -E: I was piping the input.

I would still like to improve the error message here, because I've now had users ask me about this on two separate occasions. But I guess the behavior makes sense.