boostorg / wave

Boost.org wave module
http://boost.org/libs/wave
21 stars 49 forks source link

Tricky line number tracking #94

Closed jefftrull closed 3 years ago

jefftrull commented 4 years ago

In a recent Twitter post, @ericniebler asks

what should be the token sequence emitted by the preprocessor for this program:

#define FOO(X) __LINE__ BAR
#define BAR(X) __LINE__ 
FOO(X)
(Y)

GCC says: "3 3" but clang and msvc say "3 4". Who's right?

Sadly, Wave reports "3 1".

jefftrull commented 4 years ago

Some good discussion on the GCC bug tracker here

hkaiser commented 4 years ago

Some investigation shows that wave uses the position information of the __LINE__ token where it was seen during parsing (see: https://github.com/boostorg/wave/blob/develop/include/boost/wave/util/cpp_macromap.hpp#L1613, curr_token represents the original __LINE__). This works well as long as __LINE__ itself is not used as part of the replacement list for a macro.

This has to be changed such that any __LINE__ token resolves to the position where the macro containing the __LINE__ is expanded from.

hkaiser commented 4 years ago

Also, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1911.htm has a nicely convoluted test case:

  #line 500
  #define MAC(a,b) a,b,__LINE__

  #line 1000
 int j[] = { __LINE__, __\
LINE\
__,

  #line 2000
  #line __\
LINE\
__
  __LINE__,

  #line 3000
  #line \
\
\
__LINE__\
\
\
/**/
  __LINE__,

  MAC(__LINE__,__LINE__),

  MAC(
    __LINE__,
    __LINE__
  ),  __LINE__,

  M\
A\
C\
\
(\
\
__\
LINE\
__,\
__LINE__\
\
),
  __LINE__,

  M\
A\
C\
(
__\
LINE\
__,
__LINE__
),
  __LINE__,
  999 };

The new wording specifying the expected behavior for multi-line macros containing __LINE__ can be found here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2322.htm

jefftrull commented 4 years ago

Is it possible we have the same issue with __FILE__ and __INCLUDE_LEVEL__?

jefftrull commented 4 years ago

OK so I tried a simple testcase:

#define theline __LINE__

// padding

static int foo = theline ;

This actually produces 5, the correct answer. Is it possible the situation is more complex? Also (to answer my own question) we seem to do the right thing - at least for simple cases - with the other two macros.

hkaiser commented 4 years ago

Is it possible we have the same issue with FILE and __INCLUDE_LEVEL__?

Hmmm, could be - if they are used in macros that are defined in a different file from where the macro is used...

jefftrull commented 4 years ago

Looking at this some more... I think it must be a tricky thing with rescanning. I find that it's actually quite hard to make Wave do the wrong thing. Maybe this only happens because the first macro FOO evaluates to the name of a second, function-like macro. If BAR is not a function-like macro it does the right thing. For example:

#define FOO(X) __LINE__ BAR
#define BAR __LINE__ 
FOO(X)
  (Y)

produces

3 3
(Y)
jefftrull commented 4 years ago

Just spent some quality time with gdb comparing the behavior of the passing (BAR is object-like) vs failing (BAR is function-like) cases. What seems to be happening is that the context's idea of the current line number, main_pos.line, is briefly set to 1 by this line and restored to either 4 or 5 by this line. This is true in both cases.

The big difference between the cases is that in the object-like macro case the line number is sampled (that is, __LINE__ is expanded) before the line is changed, and in the function-like macro it is sampled in the brief period when it is 1.

Does this suggest anything?

hkaiser commented 4 years ago

Does this suggest anything?

Not right away :/

The whole macro replacement process is intrinsically recursive, while act_token and act_pos are relatively global and 'on top' of the recursion. I wouldn't be surprised if the code handled those incorrectly .

jefftrull commented 4 years ago

I've reduced the failure a little more. This also produces the wrong answer:

#define FOO something BAR
#define BAR(X) __LINE__ 
#pragma wave trace(enable)
FOO(Y)
#pragma wave trace(disable)

The critical features appear to be:

  1. BAR is a function-type macro
  2. There is a token preceding BAR in the definition of FOO

It's the presence of that token (something, here) with a location of line 1 that confuses Wave long enough to report __LINE___ as 1. It comes off the pending queue and briefly sets the global position accordingly.

jefftrull commented 4 years ago

I've been studying the code for a while and had something of an epiphany today. The place where the act_pos value is set briefly to the "wrong" value (the location of the actual __LINE__ instead of its expansion point) is somewhat unusual in that both it and act_token are updated at the same time. act_token is set at many points, but act_pos in only two. And if I remove the "problem" location, leaving only one, this case behaves correctly and so do all the unit tests.

In short, how would you feel about deleting this line?

jefftrull commented 4 years ago

I may have spoken too soon. The new output from Wave, given this change, is 3 2 which is closer but still wrong.

hkaiser commented 4 years ago

In short, how would you feel about deleting this line?

Do it! ;-)