Quuxplusone / LLVMBugzillaTest


C18 standard conformance non-compliance. #41195

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42226
Status CONFIRMED
Importance P enhancement
Reported by Juan Vargas (escanorexpress@gmail.com)
Reported on 2019-06-11 01:38:42 -0700
Last modified on 2019-06-11 23:06:23 -0700
Version 6.0
Hardware PC Linux
CC blitzrakete@gmail.com, dblaikie@gmail.com, dgregor@apple.com, erik.pilkington@gmail.com, hstong@ca.ibm.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, t.p.northover@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
This report is more of a technical point than anything else. Both the C90 and
C18 standards have a conformance chapter (chapter 4 in C18), whose paragraph 1
states the following:

"In this International Standard, “shall” is to be interpreted as a requirement
on an implementation or on a program; conversely, “shall not” is to be
interpreted as a prohibition"

The technical point is this: in section 5.1.1.2 of C18 (C90 has a similar
section), translation phase 2 states the following:

"Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source
lines. Only the last backslash on any physical source line shall be eligible
for being part of such a splice. A source file that is not empty shall end in a
new-line character, which shall not be immediately preceded by a backslash
character before any such splicing takes place."

Consider the following program:

simple_case.c
"
int main(){ return 0; }
\
"

clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The clang version above compiles it without any diagnostic, whereas the gcc
version above issues the following warning:

"c.c:2:1: warning: backslash-newline at end of file
 \

"

If instead we have:

elaborate_case.c
"
int main(){ return 0; }
\

"

then they both compile fine without warning.

Question: the standard says "A source file that is not empty shall end
in a new-line character, which shall not be immediately preceded by a backslash
character before any such splicing takes place." Given that, are we not
required to treat such a text file (elaborate_case.c) as not being a "source
file"? That is, it should not compile.
Quuxplusone commented 5 years ago

It would still be a source file officially (just by virtue of ending up as input to the compiler, I think) but the C++ standard would call it "ill-formed". C uses less sophisticated terminology.

Either way, we should probably diagnose it to be helpful, but that is not a strict requirement (that "shall" is a requirement on the user, not the compiler). We would probably make it a warning by default (like GCC) rather than an error, because it's pretty benign.

Quuxplusone commented 5 years ago
I would suggest actually inspecting the file to determine whether there is a
newline character at the end of the file that is not preceded by a backslash.

The following version of elaborate_case.c reproduces the behaviour you observed.

> od -A x -t x1 <elaborate_case.c
000000 69 6e 74 20 6d 61 69 6e 28 29 7b 20 72 65 74 75
000010 72 6e 20 30 3b 20 7d 0a 5c 0a 0a
00001b
Return:  0x00:0

Notice that there is a newline (0x0A) character not preceded by a backslash
(0x5C) at the end of the file.
Quuxplusone commented 5 years ago
I can reproduce this. Here's what I see: a source file ending in:

  * <backslash> produces an "expected unqualified-id" error (acceptable, though an "unexpected backslash" or similar diagnostic would be better);
  * <backslash> <newline> produces no diagnostic (should produce a warning or error, as this bug suggests);
  * <newline> produces no diagnostic (good);
  * <anything other than backslash or newline> produces the -Wnewline-eof warning "no newline at end of file".

The second case is an accepts-invalid bug.
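
The four endings listed above can be told apart by looking at the last two bytes of the file. The following is a rough sketch in C; the names tail_kind and classify_tail are hypothetical and do not correspond to anything in Clang's lexer. It assumes "\n" line endings and adds a fifth category for the empty file, on which the standard places no requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical categories for how a source file ends. */
enum tail_kind {
    TAIL_EMPTY,             /* empty file: the "shall" does not apply      */
    TAIL_BACKSLASH,         /* last byte is a backslash                    */
    TAIL_BACKSLASH_NEWLINE, /* backslash immediately before final new-line */
    TAIL_NEWLINE,           /* final new-line not preceded by a backslash  */
    TAIL_OTHER              /* last byte is neither backslash nor new-line */
};

/* Hypothetical helper: classifies the end of a raw (unspliced) buffer. */
enum tail_kind classify_tail(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len == 0)
        return TAIL_EMPTY;
    if (buf[len - 1] == '\\')
        return TAIL_BACKSLASH;
    if (buf[len - 1] != '\n')
        return TAIL_OTHER;
    if (len >= 2 && buf[len - 2] == '\\')
        return TAIL_BACKSLASH_NEWLINE;
    return TAIL_NEWLINE;
}
```

With this sketch, a file whose last bytes are 5c 0a falls into TAIL_BACKSLASH_NEWLINE (the case this bug is about), while one ending in 5c 0a 0a falls into TAIL_NEWLINE, because its final new-line is preceded by another new-line rather than by a backslash.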
Quuxplusone commented 5 years ago
@Tim: Per the standard,

Quote 1: "A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any
such splicing takes place."

By construction of the standard, i.e. by the statement above, a source file
must have that property. If it does not, it is not a source file, as a source
file shall have that property.

Second point, that "shall" is in fact a requirement on the "implementation",
which chapter 5 of the standard defines as:

"An implementation translates C source files..."

Furthermore, the clause I cited above (Quote 1), which we are not complying
with, falls under "The precedence among the syntax rules of translation is
specified by the following phases:"; hence it definitely is a "shall" that the
compiler/implementation must enforce if we are to follow the standard.

As for making it a benign warning, that makes sense as a matter of pragmatism.
However, if we are to ignore it in the implementation, I wonder why it is
a requirement in the standard. If we are not enforcing it, why not remove it
or relax the condition?

@hstong: I just checked:

00000000  69 6e 74 20 6d 61 69 6e  28 29 7b 20 72 65 74 75  |int main(){ retu|
00000010  72 6e 20 30 3b 7d 0a 5c  0a                       |rn 0;}.\.|
00000019

The above produces the warning in gcc, and compiles fine in clang, as I
described earlier.

00000000  69 6e 74 20 6d 61 69 6e  28 29 7b 20 72 65 74 75  |int main(){ retu|
00000010  72 6e 20 30 3b 7d 0a 5c  0a 0a                    |rn 0;}.\..|
0000001a

Compiles fine on both.

@Richard: thank you for checking.