carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/
Other
32.34k stars 1.48k forks source link

Revisit the set of disallowed characters in a file type indicator #2140

Closed zygoloid closed 2 years ago

zygoloid commented 2 years ago

1360 changed multiline string literals from using """ to using ''' as delimiters, but it left this rule alone:

A file type indicator is any sequence of non-whitespace characters other than " or #.

This doesn't seem like an appropriate restriction any more. Maybe one of these options would be more useful:

1) Don't have any restriction. 2) Disallow ' and #, so that for example #'''# doesn't run onto the next line, in case it was intended to be some kind of raw character literal. 3) Disallow ' and #, but only at the start of the file type indicator. 4) Disallow ' throughout and # at the start. 5) Disallow anything that looks like the terminator (''' plus N hashes) from appearing within the string literal after the first character and before the first newline.

Some examples:

// Disallowed by status quo, allowed by all the above
var cpp20: String = '''c++ format="IndentWidth: 4"
''';
// Allowed by (1), (3), (5)
var haskell_prime: String = '''haskell'
x = 0
''';
// Allowed by (1), (3), (4), (5)
var c_sharp: String = '''c#
var n = 0;
''';
// Allowed by (1) and (3).
var looks_like_single_line_string = '''not_the_content''';
''';
// Allowed by (1) only.
var too_many_single_quotes = ''''
''';
// Allowed by (1) and (5).
var too_many_quotes_but_raw = #'''''
'''#;
// Allowed by (1) only.
var not_a_raw_character_literal = #'''#
'''#;
// Allowed by (1) and (5).
var still_not_a_raw_character_literal = #'''x'#
'''#;
chandlerc commented 2 years ago

I think I prefer #2. I think failing to switch from " to ' was an oversight and certainly not my intent.

I don't see any real advantages of having the less restrictive options here, and the simplicity of the rule in addition to the ability to catch a wide variety of possible typos or syntax confusions seems reasonable to motivate the restruction.

zygoloid commented 2 years ago

I'm slightly motivated to allow c# in particular, and slightly motivated to allow an arbitrary string to be represented somehow (maybe with escaping?), but not enough to argue for a change at this time. I think it's OK to wait and see how these filetype indicators are used in practice and if these restrictions are problematic. In the mean time, #2 seems like the right continuation of our previous decisions.

For reference, in GitHub-Flavored Markdown, an info string following a ``` fence cannot contain ` characters at all, but an info string following a ~~~ fence can contain any characters (other than newline), and "```c#" is valid in GFM to start a C#-formatted code block.

chandlerc commented 2 years ago

I'm a bit split on c# vs. csharp. The existence of spelled-out aliases in most existing syntax highlighting systems I'm aware of makes me suspect that the restriction is fine, but as you say, time will tell. I'm pretty happy just starting restrictive and seeing how it goes.

zygoloid commented 2 years ago

The leads have converged on option (2) for the time being. We'll keep an eye open for this decision causing issues (such as the c# issue) in practice.