clang-format misformats struct pointer member refs after objc method calls

Quuxplusone commented 3 years ago


Bugzilla Link	PR47907
Status	NEW
Importance	P enhancement
Reported by	Elly Fong-Jones (ellyjones@chromium.org)
Reported on	2020-10-19 14:26:11 -0700
Last modified on	2020-10-21 13:15:30 -0700
Version	unspecified
Hardware	PC All
CC	bhamiltoncx@gmail.com, djasper@google.com, klimek@google.com, llvm-bugs@lists.llvm.org, nicolasweber@gmx.de
Fixed by commit(s)
Attachments	`test.mm` (42 bytes, text/plain)
Blocks
Blocked by
See also

Created attachment 24078
repro case

When formatting the following code snippet:

  int main() {
    foo([obj method], a->b);
  }

clang-format incorrectly inserts spaces around the arrow:

$ cat test.mm
int main() {
  foo([obj method], a->b);
}

$ clang-format --version
clang-format version 11.0.0 (https://github.com/llvm/llvm-project
eb85e90350e93a64279139e7eca9ca40c8fbf5eb)
$ clang-format test.mm
int main() {
  foo([obj method], a -> b);
}

The test file is attached as well.

Quuxplusone commented 3 years ago

Attached test.mm (42 bytes, text/plain): repro case

Quuxplusone commented 3 years ago

Passing -debug and comparing to int main() { foo(obj, a->b); } we get

M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=43 Name=identifier L=13 PPK=2 FakeLParens= FakeRParens=2 II=0xa73768 Text='b'

here but

M=0 C=1 T=LambdaArrow S=1 F=0 B=0 BK=0 P=150 Name=arrow L=22 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='->'

in the simpler thing.

[...], a->b does look a bit like a lambda.

Looks like TT_LambdaArrow is assigned in UnwrappedLineParser::tryToParseLambda() in clang/lib/Format/UnwrappedLineParser.cpp.

That function basically skips a long list of tokens until it hits a {. Maybe it should keep track of if it's currently in a [...], a <...>, a (....), and if it sees unexpected tokens outside these parens, bail out.

There's also

case tok::arrow:
  // This might or might not actually be a lambda arrow (this could be an
  // ObjC method invocation followed by a dereferencing arrow). We might
  // reset this back to TT_Unknown in TokenAnnotator.
  FormatTok->setType(TT_LambdaArrow);
  SeenArrow = true;
  nextToken();
  break;

which was added in https://reviews.llvm.org/D57923 but in this case we should be able to not set the wrong token type in the first place.

Ben, would you be interested in looking at this?

Quuxplusone commented 3 years ago

Yeah, my workaround in D57923 was clearly too specific and we need to re-do the parser for lambda arrows entirely.

Quuxplusone / LLVMBugzillaTest

clang-format misformats struct pointer member refs after objc method calls #46876