banach-space / clang-tutor

A collection of out-of-tree Clang plugins for teaching and learning
The Unlicense
692 stars 62 forks source link

fix check against main translation unit location in CodeStyleChecker #7

Closed ivafanas closed 3 years ago

ivafanas commented 3 years ago

Seems like expansion location ("presumed" location) should be used instead of spelling location for the check against main translation unit in CodeStyleChecker for macro expansion scenario.

I'm not 100% sure, could you please confirm.

banach-space commented 3 years ago

Thank you for submitting this!

From a quick glimpse this looks OK (and familiar!) This is a fairly deep part of Clang, so I would like to double-check. Let me get back to you over the weekend.

In the meantime, I updated my Github Actions to run on PR. I'm not sure how to trigger it for this branch. Perhaps you need to rebase?

banach-space commented 3 years ago

In general, this change makes sense to me. However, I'm not sure why it affects CodeStyleCheckerMacro.cpp? For CodeStyleCheckerMacro.cpp, the presumed and spelling locations should lead to the same FileID, shouldn't they? So it shouldn't matter which location is used. Any ideas?

ivafanas commented 3 years ago

I belive that reason is in Preprocessor::CreateString function which is called during macro expansion. https://clang.llvm.org/doxygen/Preprocessor_8cpp_source.html#l00494

It uses in-memory buffer to generate source code after macro expansion here (line 491):

SourceLocation Loc = ScratchBuf->getToken(Str.data(), Str.size(), DestPtr);

Implementation of ScratchBuf creation is here: https://clang.llvm.org/doxygen/ScratchBuffer_8cpp_source.html#l00067

Variable Loc is used as SpellingLoc argument for the final token location (Preprocessor.cpp lines 494-495):

    Loc = SourceMgr.createExpansionLoc(Loc, ExpansionLocStart,
                                       ExpansionLocEnd, Str.size());

See SourceManager::createExpansionLoc interface:

SourceLocation
SourceManager::createExpansionLoc(SourceLocation SpellingLoc,
                                  SourceLocation ExpansionLocStart,
                                  SourceLocation ExpansionLocEnd,
                                  unsigned TokLength,
                                  bool ExpansionIsTokenRange,
                                  int LoadedID,
                                  unsigned LoadedOffset)

So, token spelling location is an in-memory "scratch space" and expansion location is in the main translation unit.

I'm a novice in clang and have no idea is this behavior correct or not. Going to compile clang from sources to re-check this investigation on practice.

ivafanas commented 3 years ago

Grep by "scratch space" over clang codebase and implementation of SourceManager::isInMainFile (specifically SourceManager::getDecomposedExpansionLocSlowCase) shows that it is well-known behavior for clang engineers.

banach-space commented 3 years ago

Thank you for chasing this! And I really appreciate your detailed and comprehensive explanation. This is all very interesting stuff.

I was aware of the difference between the spelling and the expansion location, but didn't realize that there is also PresumedLoc. Initially, I thought that

expansion location == presumed location

Only after you replied and I started scanning Clang, I realized these are completely different. There is an incredibly helpful explanation in Create a working compiler with the LLVM framework, Part 2 (search for presumed location).

IIUC, PresumedLoc corresponds to the GNU line directive. If there's no line directive then indeed

 presumed location == expansion location

And as you explained above, a SourceLocation that corresponds to a macro spelling location lives in a scratch buffer. SM.getFileID(Decl->getLocation()) returns a FileID that corresponds to that buffer rather than the actual source file. That's why CodeStyleCheckerMacro.cpp was effectively failing.

Is this consistent with your understanding? Your patch basically makes sure that for macros the plugin uses presumed rather than spelling locations.

Would you mind adding a comment that explains why isInMainFile is used? Be as generous or sparse as you see fit - I just think our future selves will be thankful for any indication that it's there for a good reason.

ivafanas commented 3 years ago

Hi,

I've re-read explanation from IBM paper several times and think that this explanation is a simplification for reader understanding:

/// The expansion location is the line in the source code where the macro
/// was expanded (the return statement), the spelling location is the
/// location in the source where the macro was originally defined,
/// and the presumed location is where the line directive states that
/// the line is 17, or any other line.
  1. CodeStyleCheckerMacro.cpp test doesn't have line gcc directive so presumed location should not come into play.
  2. We've observed spelling location equal to "scratch space". Comment notes that spelling location should be in the main translation unit for the test.

I suppose that comment is a simplification and preprocessor behavior is a well-known issue for compiler engineers.

Comment on SourceManager::isInMainFile is added to CodeStyleChecker.h.

Ivan.