boostorg / wave

Boost.org wave module
http://boost.org/libs/wave

Implement __has_include #96

Closed jefftrull closed 4 years ago

jefftrull commented 4 years ago

C++17 introduces __has_include, which is useful for feature testing. It seems possible to implement it without too much trouble, and at least one Boost project listed its absence as a reason to not use Wave.
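As a minimal illustration of the feature-testing use case (a sketch, assuming a compiler that already supports `__has_include`; the `DEMO_*` macros, the `demo_*` constants, and the deliberately missing header name are made up for this example):

```cpp
// Probe for a standard header before including it.
#if __has_include(<optional>)
#  include <optional>
#  define DEMO_HAVE_OPTIONAL 1
#else
#  define DEMO_HAVE_OPTIONAL 0
#endif

// Probe for a header that is assumed not to exist on any include path.
#if __has_include(<demo_no_such_header_0f3a.h>)
#  define DEMO_HAVE_MISSING 1
#else
#  define DEMO_HAVE_MISSING 0
#endif

// Expose the results as constants so ordinary code can branch on them.
constexpr bool demo_have_optional = DEMO_HAVE_OPTIONAL != 0;
constexpr bool demo_have_missing  = DEMO_HAVE_MISSING != 0;
```

A preprocessor that lacks `__has_include` cannot evaluate either `#if` line, which is the gap this issue is about.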

hkaiser commented 4 years ago

FWIW, this should be easy enough as the required underlying functionality is already available here: https://github.com/boostorg/wave/blob/develop/include/boost/wave/util/cpp_include_paths.hpp#L350

jefftrull commented 4 years ago

So, would you call __has_include a "function-like macro" or is it similar to __VA_OPT__ where because it is not usable in a general context (i.e. outside of #if or #elif) it doesn't count? I'm thinking the latter...

hkaiser commented 4 years ago

__has_include is special. It is both a function-like and an object-like macro, and it can be used in #if expressions only (see for instance https://gcc.gnu.org/onlinedocs/gcc-10.1.0/cpp/_005f_005fhas_005finclude.html).
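The dual nature described here is what makes the usual portability guard work: the bare name can be tested with `#ifdef`, and only then is the function-like form evaluated. A small sketch (the `DEMO_*` and `demo_*` names are hypothetical):

```cpp
// Object-like use: the bare name is treated as a defined macro.
#ifdef __has_include
#  define DEMO_HAS_INCLUDE_SUPPORTED 1
   // Function-like use: only reached when the operator is known to exist.
#  if __has_include(<cstdint>)
#    include <cstdint>
#    define DEMO_HAVE_CSTDINT 1
#  endif
#else
#  define DEMO_HAS_INCLUDE_SUPPORTED 0
#endif

#ifndef DEMO_HAVE_CSTDINT
#  define DEMO_HAVE_CSTDINT 0
#endif

constexpr bool demo_has_include_supported = DEMO_HAS_INCLUDE_SUPPORTED != 0;
constexpr bool demo_have_cstdint = DEMO_HAVE_CSTDINT != 0;
```

The nesting matters: writing both tests on one `#if` line would still make a preprocessor without `__has_include` try to parse the function-like form.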

jefftrull commented 4 years ago

Oh boy :) So do we fire the expanding-macro hooks then?

hkaiser commented 4 years ago

@jefftrull This is tricky. From what I read in the Standard (http://eel.is/c++draft/cpp.cond#nt:has-include-expression), there are two forms of the function-like invocation:

__has_include ( header-name )
__has_include ( header-name-tokens )

it also says:

  3. The second form of has-include-expression is considered only if the first form does not match, in which case the preprocessing tokens are processed just as in normal text.
  4. The header or source file identified by the parenthesized preprocessing token sequence in each contained has-include-expression is searched for as if that preprocessing token sequence were the pp-tokens in a #include directive, except that no further macro expansion is performed. If such a directive would not satisfy the syntactic requirements of a #include directive, the program is ill-formed. The has-include-expression evaluates to 1 if the search for the source file succeeds, and to 0 if the search fails.

I would read 3. such that only the two (common) forms of valid #include statement syntaxes have to be supported here:

h-preprocessing-token:
    any preprocessing-token other than >
h-pp-tokens:
    h-preprocessing-token
    h-pp-tokens h-preprocessing-token
header-name-tokens:
    string-literal
    < h-pp-tokens >

It explicitly says that no further macro expansion is performed. There might be no room for invoking any of the hooks after all (we don't invoke any hooks for the defined() operator either).

For all of the above I'd love to hear other opinions, however.
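The "evaluates to 1 if the search succeeds, 0 if it fails" rule quoted above can be modelled roughly as follows. This is only a sketch: none of these names come from Boost.Wave, the search order is an assumption (the real order is implementation-defined), and the `exists` callback stands in for the file system so the logic can be exercised without touching disk.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical search configuration; not Wave's data structure.
struct demo_search_paths {
    std::string current_dir;               // directory of the including file
    std::vector<std::string> user_dirs;    // tried only for "quoted" names
    std::vector<std::string> system_dirs;  // tried for both forms
};

// Returns 1 if the named file would be found and 0 otherwise.
inline int demo_has_include(const std::string& name, bool quoted,
                            const demo_search_paths& paths,
                            const std::function<bool(const std::string&)>& exists)
{
    auto probe = [&](const std::string& dir) { return exists(dir + "/" + name); };
    if (quoted) {
        if (probe(paths.current_dir)) return 1;
        for (const auto& dir : paths.user_dirs)
            if (probe(dir)) return 1;
    }
    // A quoted name that was not found falls back to the angle-bracket search.
    for (const auto& dir : paths.system_dirs)
        if (probe(dir)) return 1;
    return 0;
}

// Small self-check with a fake file system containing two files.
inline void demo_has_include_check()
{
    demo_search_paths paths{"src", {"inc"}, {"/usr/include"}};
    auto exists = [](const std::string& file) {
        return file == "src/local.h" || file == "/usr/include/stdio.h";
    };
    assert(demo_has_include("local.h", true, paths, exists) == 1);
    assert(demo_has_include("local.h", false, paths, exists) == 0);
    assert(demo_has_include("stdio.h", true, paths, exists) == 1);
}
```

No macro expansion or hook invocation appears in this model; it covers only the lookup step that happens once the header name is known.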

jefftrull commented 4 years ago

I personally feel like it's most similar to defined() and should be treated the same way.
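For reference, the property of defined() that this comparison leans on is that its operand is looked up by name and never macro-expanded. A short sketch (all `DEMO_*` names are invented for the example):

```cpp
// DEMO_ALIAS is defined; the name it expands to, DEMO_TARGET, is not.
#define DEMO_ALIAS DEMO_TARGET

// If the operand were expanded first, this would test DEMO_TARGET and fail.
#if defined(DEMO_ALIAS)
#  define DEMO_ALIAS_SEEN 1
#else
#  define DEMO_ALIAS_SEEN 0
#endif

#if defined(DEMO_TARGET)
#  define DEMO_TARGET_SEEN 1
#else
#  define DEMO_TARGET_SEEN 0
#endif

constexpr int demo_alias_seen  = DEMO_ALIAS_SEEN;   // 1: DEMO_ALIAS itself is defined
constexpr int demo_target_seen = DEMO_TARGET_SEEN;  // 0: DEMO_TARGET was never defined
```

Whether `__has_include` can be treated the same way depends on how its second form is read, which is what the rest of the thread works out.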

jefftrull commented 4 years ago

It looks like - if I read this correctly - we have to perform expansion, once, on the tokens inside the parentheses of __has_include, in a similar manner to the code here if the expression is not a plain quoted or angle-bracketed include expression - does that sound right?

hkaiser commented 4 years ago

It looks like - if I read this correctly - we have to perform expansion, once, on the tokens inside the parentheses of __has_include, in a similar manner to the code here if the expression is not a plain quoted or angle-bracketed include expression - does that sound right?

I'm not sure. That's what I was trying to discuss above:

I would read 3. such that only the two (common) forms of valid #include statement syntaxes have to be supported here:

h-preprocessing-token:
    any preprocessing-token other than >
h-pp-tokens:
    h-preprocessing-token
    h-pp-tokens h-preprocessing-token
header-name-tokens:
    string-literal
    < h-pp-tokens >

This sounds to me as if __has_include() supports only quoted strings and filenames in angle brackets. How does clang handle this?

jefftrull commented 4 years ago

I read this a bit differently - I think the first case (header-name) contains the normal double quoted or angle bracketed options, while the second part (header-name-tokens) is either a string literal or < h-pp-tokens >, which can be just about anything enclosed in angle brackets that can evaluate to a valid filename - I assume that's what the code here is doing for #include. In other words, a "computed include" is allowed here, which is why we may have to perform macro expansion...?

I'll try to create an example.

hkaiser commented 4 years ago

Any #include statement can have three forms: #include "string-literal" new-line, #include <h-pp-tokens> new-line, or #include pp-tokens new-line. The decision is made based on whether it's something surrounded by "", or by <>. The third option is a fallback if neither of the first two applies (see http://eel.is/c++draft/cpp.include). The pp-tokens following the #include are macro-expanded only for the third option.

The specification for __has_include( something ) says that something can be either a string-literal (i.e. something surrounded by "") or a sequence of pp-tokens surrounded by <>, where > is not allowed in between the <>:

has-include-expression:
    __has_include ( header-name )
    __has_include ( header-name-tokens )

which is confusing to me, as both the header-name (http://eel.is/c++draft/lex.header#nt:header-name) and the header-name-tokens (http://eel.is/c++draft/cpp.cond#nt:header-name-tokens) are defined the same way.

In any case, I think that none of the definitions above even hint at the third option (namely some sequence of pp-tokens that needs to be macro preprocessed). This is also supported by (see above, highlighted by me):

The header or source file identified by the parenthesized preprocessing token sequence in each contained has-include-expression is searched for as if that preprocessing token sequence were the pp-tokens in a #include directive, except that no further macro expansion is performed.

However, I admit that all of this is not conclusive (as usual when you start reading the standard) and I might be wrong. Let's find out what clang and gcc do and go for the same.
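The three #include forms listed in this comment can be seen side by side in a small sketch (the `DEMO_HEADER` macro and the `demo_*` functions are hypothetical; the quoted form relies on the fallback to the angle-bracket search, and assumes no local file named `cstring` exists):

```cpp
#include <cassert>

// Quoted form: searched locally first, then as if written with <>.
#include "cstring"

// Angle-bracket form: the header name is taken literally.
#include <cstddef>

// Fallback form: the tokens after #include are macro-expanded first.
#define DEMO_HEADER <string>
#include DEMO_HEADER

// Each helper depends on one of the headers pulled in above.
inline std::size_t demo_length(const char* text) { return std::strlen(text); }
inline std::string demo_name() { return std::string("wave"); }

inline void demo_include_forms_check()
{
    assert(demo_length("wave") == 4);
    assert(demo_name() == "wave");
}
```

Only the last form involves macro expansion, which is why the open question for `__has_include` is whether it has an equivalent of that fallback.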

jefftrull commented 4 years ago

Here's an example that works in both gcc and clang:

#define FOO(X, Y) <X ## Y>

#if __has_include( FOO(io,stream) )
char const * result = "do";
#else
char const * result = "do not";
#endif

#include FOO(io,stream)

int main()
{
    std::cout << "we " << result << " have iostream\n";
}

jefftrull commented 4 years ago

As for the Standard, my reading is that

__has_include ( header-name ) incorporates the two standard forms (quotes and brackets) of file inclusion, while

__has_include ( header-name-tokens ) gives an additional form usable for "computed includes".
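Under this reading, the first step is deciding which form the argument takes. A rough sketch of that decision follows; it works on raw text rather than on preprocessing tokens, so it is an illustration only, and `demo_form` and `demo_classify` are invented names, not part of Wave.

```cpp
#include <cassert>
#include <string>

enum class demo_form { quoted, angled, computed };

// Classifies the text between the parentheses of __has_include( ... ).
inline demo_form demo_classify(const std::string& arg)
{
    const auto first = arg.find_first_not_of(" \t");
    if (first == std::string::npos) return demo_form::computed;
    const auto last = arg.find_last_not_of(" \t");
    if (last - first >= 2) {  // need a delimiter pair plus at least one character
        // "name": the next quote after the opening one must be the closing one.
        if (arg[first] == '"' && arg.find('"', first + 1) == last)
            return demo_form::quoted;
        // <name>: the first '>' after '<' must be the closing one.
        if (arg[first] == '<' && arg.find('>', first + 1) == last)
            return demo_form::angled;
    }
    // Anything else would need macro expansion before it names a header.
    return demo_form::computed;
}

inline void demo_classify_check()
{
    assert(demo_classify("<iostream>") == demo_form::angled);
    assert(demo_classify("\"config.h\"") == demo_form::quoted);
    assert(demo_classify("FOO(io,stream)") == demo_form::computed);
}
```

Note that an argument such as `<BAR>` classifies as `angled` even when `BAR` is a macro, because it already has the shape of a header name.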

jefftrull commented 4 years ago

I've found an interesting inconsistency between gcc and clang:

#define BAR iostream
#define FOO <BAR>

// clang says <BAR> is not found
// gcc says it is
#if __has_include( <BAR> )
char const * result = "do";
#else
char const * result = "do not";
#endif

// both clang and gcc are happy with this:
#include FOO

int main()
{
    std::cout << "we " << result << " have iostream\n";
}

jefftrull commented 4 years ago

Actually it looks like Clang is correct. As the standard says:

The second form of has-include-expression is considered only if the first form does not match.

So because <BAR> looks like a regular include filename, it searches for it and does not try to expand the macro.

Here's a really interesting gcc bugzilla thread for the problem (it was fixed in 10.1, which I don't have on my system).
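The distinction can be isolated in a short sketch (the `DEMO_*` macros and `demo_*` constants are made up; it assumes no file named `DEMO_BAR` is on the include path):

```cpp
#define DEMO_BAR iostream
#define DEMO_FOO <DEMO_BAR>

// First form: <DEMO_BAR> already looks like a header name, so a preprocessor
// following the rule quoted above searches for a file literally called
// DEMO_BAR and does not expand the macro.
#if __has_include(<DEMO_BAR>)
#  define DEMO_DIRECT 1
#else
#  define DEMO_DIRECT 0
#endif

// Second form: DEMO_FOO is not a header name, so its tokens are processed as
// normal text, which expands them to <iostream> before the search.
#if __has_include(DEMO_FOO)
#  define DEMO_COMPUTED 1
#else
#  define DEMO_COMPUTED 0
#endif

constexpr int demo_direct   = DEMO_DIRECT;
constexpr int demo_computed = DEMO_COMPUTED;
```

Per the observations in this thread, `demo_direct` is the value on which Clang and older GCC releases disagree, while the computed form is found by both.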

jefftrull commented 4 years ago

This line (from Bugzilla) seems particularly relevant:

The proper fix, IMO, would be to apply the same special tokenization rules to the argument of __has_include that are used for the argument of #include. This is not exactly the same as "don't macro-expand the argument."

jefftrull commented 4 years ago

Here's an interesting puzzle: The gcc manual says that includes get special treatment in quoting, e.g. #include "x\n\\y" specifies a filename containing three backslashes. Currently AFAICT this is handled at the lexer level by using different tokens (QHEADER and HHEADER). Would we need to introduce a new token for __has_include ?

hkaiser commented 4 years ago

Here's an interesting puzzle: The gcc manual says that includes get special treatment in quoting, e.g. #include "x\n\\y" specifies a filename containing three backslashes. Currently AFAICT this is handled at the lexer level by using different tokens (QHEADER and HHEADER). Would we need to introduce a new token for __has_include ?

Whatever we do, we should have consistent behavior for #include and __has_include.