Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

DOS line endings (\r\n) result in incorrect output with clang-format #34495

Closed Quuxplusone closed 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR35522
Status RESOLVED FIXED
Importance P release blocker
Reported by Erik Lacharite (erik.lacharite@gmail.com)
Reported on 2017-12-04 13:21:25 -0800
Last modified on 2017-12-07 00:51:45 -0800
Version 5.0
Hardware PC Linux
CC djasper@google.com, klimek@google.com, llvm-bugs@lists.llvm.org, sammccall@google.com
Fixed by commit(s)
Attachments reproducable-line-ending-bug-for-LLVM.zip (18612 bytes, application/x-zip-compressed)
Blocks
Blocked by
See also
Created attachment 19518
Sample files and output

clang-format produces an incorrect output if DOS line endings (\r\n) are used.
See screenshot in attached in .zip to preview the output.

#### UNIX LINE ENDINGS RESULTS #####

#define YYDPRINTF(Args)          \
  do {                           \
    if (yydebug) YYFPRINTF Args; \
  } while (YYID(0))

#### DOS LINE ENDINGS RESULTS ####

#define YYDPRINTF(Args)          \
  \
do {                             \

    if (yydebug) YYFPRINTF Args; \
  \
}                             \
  while (YYID(0))

###########################

To reproduce:
1. Download both attached sample files in .zip
2. Run each with "clang-format -i -style=google <file>"

may relate to:
Bug 33124 - ARCMT does not handle \r\n correctly
https://bugs.llvm.org/show_bug.cgi?id=33124
Quuxplusone commented 6 years ago

Attached reproducable-line-ending-bug-for-LLVM.zip (18612 bytes, application/x-zip-compressed): Sample files and output

Quuxplusone commented 6 years ago
Seems to work correctly running on linux, so maybe we have some platform-
dependent code.

$ hexdump -C ~/test.c
00000000  23 64 65 66 69 6e 65 20  59 59 44 50 52 49 4e 54  |#define YYDPRINT|
00000010  46 28 41 72 67 73 29 20  20 20 20 20 20 20 20 20  |F(Args)         |
00000020  20 5c 0d 0a 20 20 64 6f  20 7b 20 20 20 20 20 20  | \..  do {      |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 5c 0d 0a  20 20 20 20 69 66 20 28  |     \..    if (|
00000050  79 79 64 65 62 75 67 29  20 59 59 46 50 52 49 4e  |yydebug) YYFPRIN|
00000060  54 46 20 41 72 67 73 3b  20 5c 0d 0a 20 20 7d 20  |TF Args; \..  } |
00000070  77 68 69 6c 65 20 28 59  59 49 44 28 30 29 29 0d  |while (YYID(0)).|
00000080  0a 0d 0a                                          |...|
00000083
$ clang-format ~/test.c | hexdump -C
00000000  23 64 65 66 69 6e 65 20  59 59 44 50 52 49 4e 54  |#define YYDPRINT|
00000010  46 28 41 72 67 73 29 20  20 20 20 20 20 20 20 20  |F(Args)         |
00000020  20 5c 0d 0a 20 20 64 6f  20 7b 20 20 20 20 20 20  | \..  do {      |
00000030  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 5c 0d 0a  20 20 20 20 69 66 20 28  |     \..    if (|
00000050  79 79 64 65 62 75 67 29  20 59 59 46 50 52 49 4e  |yydebug) YYFPRIN|
00000060  54 46 20 41 72 67 73 3b  20 5c 0d 0a 20 20 7d 20  |TF Args; \..  } |
00000070  77 68 69 6c 65 20 28 59  59 49 44 28 30 29 29 0d  |while (YYID(0)).|
00000080  0a                                                |.|
00000081

I think I have a windows VM around, will try there.
Quuxplusone commented 6 years ago
Actually, your screenshot shows what seems like correct behavior to me.
Why do you say the output is incorrect?

clang-format respects the line-endings of the input file, and produces output
with the same line-ending style. Therefore if you run it over a DOS file,
you'll get a DOS file as output.

I'm pretty sure this is the right default:
 - clang-format behaves the same way on different platforms
 - clang-format never messes up your platform's line endings
but if you want clang-format to normalize line endings, a patch to make this a
customizable style attribute seems reasonable.
Quuxplusone commented 6 years ago
This issue is reproducible on Linux with DOS line endings (no need for a
Windows VM, see below). Just to clear, the screenshot was not  a before and
after, rather outlining the different output between DOS and UNIX line endings.
Running clang-format over the DOS file results in additional backslashes '\'.

[Desktop]$ clang-format -style=google scriptparser-sample-\(DOS_CRLF\).cpp

#define YYDPRINTF(Args)          \
  \
do {                             \
    if (yydebug) YYFPRINTF Args; \
  \
}                             \
  while (YYID(0))

[Desktop]$ clang-format -style=google scriptparser-sample-\(UNIX_LF\).cpp

#define YYDPRINTF(Args)          \
  do {                           \
    if (yydebug) YYFPRINTF Args; \
  } while (YYID(0))
[lumerical@2018B-LIN-MAST Desktop]$
Quuxplusone commented 6 years ago
Sorry, for some reason I fixated on the line endings and missed the extra
backslashes.
I can't reproduce this at HEAD, but I can indeed with the 5.0 release. So this
will be fixed in the next major release (6.0, in a few months).
I can ask about getting this into a dot release, but I think we've just missed
the window for 5.0.1.
Quuxplusone commented 6 years ago

Is there any chance that the fix is already in the dot release? Regardless I think this issue can be resolved. Thanks for your help!

Quuxplusone commented 6 years ago

Sadly not. The fix was https://reviews.llvm.org/rC316910

The 5.0.1 RC2 source has the old/broken version: http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_501/rc2/lib/Format/FormatTokenLexer.cpp

Given another dot release isn't that likely, there's indeed not much to do here but wait. Thanks for reporting!